Regex: Include word1, exclude word2

Added by Dirk Diggler over 4 years ago

Hello,

I'd like to create a timer that matches "word1" but excludes any of those matches that also contain "word2".
How can I do this?

DocMAX


Replies (13)

RE: Regex: Include word1, exclude word2 - Added by Paraic McDonagh over 4 years ago

Here you go:

^(?!.*word2).*word1.*$

Example:

^(?!.*Volcano).*Horizon.*$

This will record all episodes of Horizon that don't contain the word Volcano.
So 'Horizon: How the Universe began' will be recorded BUT
'Horizon: The anatomy of a Volcano' will not.
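
For anyone who wants to try the pattern outside tvheadend first, here is a minimal sketch in Python. It assumes Python's re module treats this particular lookahead the same way PCRE does, which should hold for a simple pattern like this one; the helper name would_record is made up for the illustration.

```python
import re

# Pattern from the post above: match titles containing "Horizon"
# unless "Volcano" appears anywhere in the same string.
PATTERN = re.compile(r"^(?!.*Volcano).*Horizon.*$")

def would_record(title: str) -> bool:
    """Return True if the title matches the include/exclude pattern."""
    return PATTERN.search(title) is not None

titles = [
    "Horizon: How the Universe began",
    "Horizon: The anatomy of a Volcano",
    "Volcano Live",
]
for t in titles:
    print(t, "->", would_record(t))
```

The negative lookahead (?!.*Volcano) is checked at the start of the string, so it rejects the title wherever "Volcano" appears, before or after "Horizon".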

RE: Regex: Include word1, exclude word2 - Added by Wim K about 4 years ago

Hi,

I've been trying for hours now, but I still can't find the answer to a similar question.
I want to record every broadcast starting with

Formule 1

but not ending with

(Herh)

or

(Samenvatting)

It looks to me like none of the online regex testers use the same regex flavour as tvheadend.

I don't think it's related to the problem above, but I also have this issue:
https://tvheadend.org/boards/5/topics/22495

Wim

RE: Regex: Include word1, exclude word2 - Added by Em Smith about 4 years ago

Something similar works for me. There are different flavours of regular expression, so perhaps your version of tvheadend was not built with PCRE (Perl Compatible Regular Expressions), which supports these advanced constructs (search for "pcresyntax man page" for the full syntax).

Go to the "About" tab, click "Toggle Details", and scroll down; you should see "pcre yes" or maybe "pcre2 yes", and further down "libpcre [some numbers]".

Does your tvheadend have that?

RE: Regex: Include word1, exclude word2 - Added by Wim K about 4 years ago

Em Smith wrote:

Does your tvheadend have that?

Thanks for the help! No, my tvheadend doesn't have any of those.
I guess I have to add those to
AUTOBUILD_CONFIGURE_EXTRA=--

What exactly should I add?

Wim

RE: Regex: Include word1, exclude word2 - Added by Wim K about 4 years ago

BTW, I'm on Build: 4.2.3 (2017-09-08T20:57:44+0200).
Is this supported only on 4.3? That's not a stable build yet.

RE: Regex: Include word1, exclude word2 - Added by Em Smith about 4 years ago

Sorry, I don't know the version number. I can tell you it was implemented in May this year, but although your build is later than that, it looks to me like it went into the 4.3 branch and not 4.2. I don't know if it will be in a later 4.2 update.

git log -Spcre  origin/release/4.2 -- configure

vs
git log -Spcre  origin/master  -- configure
    DVR: add PCRE2 support
    DVR: Add PCRE support

The easiest way to check is with configure:

./configure --help | grep pcre
  --disable-pcre                 Disable pcre
  --enable-pcre                  Enable pcre
  --disable-pcre2                Disable pcre2
  --enable-pcre2                 Enable pcre2

It should automatically pick up pcre if it's on your box. I'm not sure which OS you're on, but on a Debian/Ubuntu-derived system:

dpkg -l libpcre* | grep ^ii

should come up with a list of packages, and a "-dev" package should be one of them.

And this should show some files ending in .h:

echo /usr/include/pcre*

If you want the change, you can possibly just "git cherry-pick" the two commits. They look self contained to me.

In the meantime, you can always try a more old-fashioned regex; something like this might work:

Formule 1.*[^(][^H][^e][^r][^h][^)]$
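
A quick way to see what the character-class workaround does and does not catch is to run it next to a lookahead version. This is only a sketch in Python: the class pattern uses the "Formule 1" spelling from the question, while the lookahead pattern, the helper names and the sample titles are made up here and do not come from this thread. The lookahead version would need a PCRE-enabled build.

```python
import re

# Workaround from the post above, with the "Formule 1" spelling from the question.
CLASS_RX = re.compile(r"Formule 1.*[^(][^H][^e][^r][^h][^)]$")

# Hypothetical lookahead version (an illustration only, needs PCRE support):
# starts with "Formule 1" and does not end with "(Herh)" or "(Samenvatting)".
LOOKAHEAD_RX = re.compile(r"^Formule 1(?!.*\((?:Herh|Samenvatting)\)$).*$")

def class_ok(title: str) -> bool:
    """True if the character-class workaround accepts the title."""
    return CLASS_RX.search(title) is not None

def lookahead_ok(title: str) -> bool:
    """True if the lookahead version accepts the title."""
    return LOOKAHEAD_RX.search(title) is not None

titles = [
    "Formule 1: GP van Monaco",
    "Formule 1: GP van Monaco (Herh)",
    "Formule 1: GP van Monaco (Samenvatting)",
    "Formule 1: Race (live)",
]
for t in titles:
    print(f"{t!r}: class={class_ok(t)} lookahead={lookahead_ok(t)}")
```

The class version looks at the last six characters one position at a time, so it rejects a title as soon as any one of them lines up with "(Herh)". That is why it also drops "(Samenvatting)" and "(live)" here: both end in ")". It works as a stopgap, but it can throw away more than intended.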

RE: Regex: Include word1, exclude word2 - Added by Wim K about 4 years ago

Thanks for the great answers!

I'm on Raspbian, by the way.

./configure --help | grep pcre
Shows nothing.

dpkg -l libpcre* | grep ^ii:
ii libpcre16-3:armhf 2:8.38-3.1+0~20160728135918.4+jessie~1.gbpa65687 armhf Perl 5 Compatible Regular Expression Library - 16 bit runtime files
ii libpcre3:armhf 2:8.38-3.1+0~20160728135918.4+jessie~1.gbpa65687 armhf Perl 5 Compatible Regular Expression Library - runtime files
ii libpcre3-dev:armhf 2:8.38-3.1+0~20160728135918.4+jessie~1.gbpa65687 armhf Perl 5 Compatible Regular Expression Library - development files
ii libpcre32-3:armhf 2:8.38-3.1+0~20160728135918.4+jessie~1.gbpa65687 armhf Perl 5 Compatible Regular Expression Library - 32 bit runtime files
ii libpcrecpp0:armhf 2:8.38-3.1+0~20160728135918.4+jessie~1.gbpa65687 armhf Perl 5 Compatible Regular Expression Library - C++ runtime files

echo /usr/include/pcre*
/usr/include/pcrecpparg.h /usr/include/pcrecpp.h /usr/include/pcre.h /usr/include/pcreposix.h /usr/include/pcre_scanner.h /usr/include/pcre_stringpiece.h

Tried this: Formule 1.*[^(][^H][^e][^r][^h][^)]$
Works great! That will do for now, but I will try the cherry-pick.

You solved my problem. Thanks for all the help!

Wim

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

It's been a while since my last post. This is relevant again because my provider changed the EPG data, so I now need to use a regex together with the fulltext option to search the description.
In the past I built tvheadend from source, but I found that very difficult, so at the moment I install it with sudo apt-get install tvheadend. I guess that's why "About", "Toggle Details" shows neither "pcre yes" nor "pcre2 yes" nor "libpcre [some numbers]".

dpkg -l libpcre* | grep ^ii:
ii libpcre16-3:armhf 2:8.39-12 armhf Old Perl 5 Compatible Regular Expression Library - 16 bit runtime files
ii libpcre2-16-0:armhf 10.32-5 armhf New Perl Compatible Regular Expression Library - 16 bit runtime files
ii libpcre2-8-0:armhf 10.32-5 armhf New Perl Compatible Regular Expression Library- 8 bit runtime files
ii libpcre2-posix0:armhf 10.32-5 armhf New Perl Compatible Regular Expression Library - posix-compatible runtime files
ii libpcre3:armhf 2:8.39-12 armhf Old Perl 5 Compatible Regular Expression Library - runtime files
ii libpcre3-dev:armhf 2:8.39-12 armhf Old Perl 5 Compatible Regular Expression Library - development files
ii libpcre32-3:armhf 2:8.39-12 armhf Old Perl 5 Compatible Regular Expression Library - 32 bit runtime files
ii libpcrecpp0v5:armhf 2:8.39-12 armhf Old Perl 5 Compatible Regular Expression Library - C++ runtime files

echo /usr/include/pcre*:
/usr/include/pcrecpparg.h /usr/include/pcrecpp.h /usr/include/pcre.h /usr/include/pcreposix.h /usr/include/pcre_scanner.h /usr/include/pcre_stringpiece.h

Having installed tvheadend with apt-get install, is there an easy way to use regex in tvheadend (on my Pi 4)?

Greetings,

Wim
PS: Using HTS Tvheadend 4.2.8-34~g24a2f59e9

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

I tried a simple regex, but as soon as I click apply and save, the regex definition is gone and the field is blank. That doesn't happen with other regular expressions such as Formule 1.*[(][^H][^e][^r][^h][^)]$, only with negative lookahead.

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

Okay, I read up on how to build and now know what to do to include pcre.

But is it necessary? Do the packages in the apt repository not "contain support" for pcre?

If I have to build, do I have to uninstall the current installation first? Everything else works and I don't want to break anything.

Wim

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

Okay, managed to install Tvheadend 4.3-1964~g637844055 with PCRE2 as seen in details.

Now I'm trying to record "Bundesliga" but not the broadcasts with "Samenvatting" in the description (extra text).
The regex was checked in a regex tester and is accepted by tvheadend: ^(?=.*?\bBundesliga\b)((?!Samenvatting).)*$

But all broadcasts with "Samenvatting" are still being scheduled.

What am I doing wrong?

Please help.

Wim
EDIT: fulltext option is checked

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

With the fulltext option enabled, ^(?!.*Verslag).*$ or ^(?!Verslag).*$ still records shows with the description "Verslag...."
What am I doing wrong?

RE: Regex: Include word1, exclude word2 - Added by Wim K 5 months ago

I just read this:
https://tvheadend.org/issues/4380#note-4

If this is true: "The fulltext matches for 'title', 'subtitle', 'summary', 'description' separately (it means if the description (example) is 'ok' for the regex match, it succeeds)."
then negative lookahead never works unless title, subtitle, summary and description are exactly the same, which is probably never.
Is there no way to improve this?

Or am I seeing this wrong? The suggestion in the second post of this thread would be wrong too.
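
To make the quoted behaviour concrete, here is a small Python sketch. It assumes the semantics are exactly as the linked note describes (each field is tested on its own and one match is enough) and that empty fields are skipped; tvheadend's real code may differ. The function names, the sample event and the "combined" variant are hypothetical and only there for contrast.

```python
import re

FIELDS = ("title", "subtitle", "summary", "description")

def fulltext_match(pattern: str, event: dict) -> bool:
    """Assumed behaviour from the quoted note: test each field on its own;
    a single matching field is enough. Empty or missing fields are skipped."""
    rx = re.compile(pattern)
    return any(rx.search(event[f]) for f in FIELDS if event.get(f))

def combined_match(pattern: str, event: dict) -> bool:
    """Hypothetical alternative (not something tvheadend is known to do):
    join all fields into one string and match once."""
    text = " ".join(event[f] for f in FIELDS if event.get(f))
    return re.search(pattern, text) is not None

event = {
    "title": "Bundesliga",
    "description": "Verslag van de wedstrijd",
}
pattern = r"^(?!.*Verslag).*$"

print(fulltext_match(pattern, event))   # True: the title alone has no "Verslag"
print(combined_match(pattern, event))   # False: the joined text contains "Verslag"
```

Under that assumption, a pure exclusion pattern only rejects an event when every non-empty field contains the excluded word, which would explain why the "Verslag" broadcasts are still scheduled. The Horizon example earlier in the thread only deals with words in the title, so it may still behave as described when fulltext is off, assuming only the title is tested in that case.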
