Jim Hague [Thu, 21 Dec 2017 10:47:22 +0000 (10:47 +0000)]
eit: rework EIT scraper test script and add POSIX matching (#4801)
Add support for new_title and new_summary test fields, and make adding new fields easier in the future.
Rework regex handling to carry regexp engine type info with the regex. If the PyPI package 'regex' is available, use it and set its POSIX flag when evaluating POSIX regexes. This doesn't restrict the regex to POSIX-only expressions, but does do POSIX-style leftmost-longest matching, which is the significant behavioural difference between PCRE and POSIX expressions.
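A minimal sketch of the engine selection described above, not the test script's actual code. The function name search() and the 'pcre'/'posix' engine labels are made up for illustration; regex.POSIX is the flag provided by the PyPI 'regex' package.

```python
import re

try:
    import regex  # optional third-party PyPI package
    HAVE_REGEX = True
except ImportError:
    regex = None
    HAVE_REGEX = False


def search(pattern, text, engine):
    """Return the matched text, or None if there is no match.

    `engine` is a hypothetical label carried with the pattern:
    'posix' asks for leftmost-longest matching, anything else
    uses the stdlib `re` module (PCRE-like, first alternative wins).
    """
    if engine == "posix" and HAVE_REGEX:
        # POSIX flag: leftmost-longest matching, but the pattern is
        # not restricted to POSIX-only syntax.
        m = regex.search(pattern, text, flags=regex.POSIX)
    else:
        # Fallback when 'regex' is not installed: plain `re`.
        m = re.search(pattern, text)
    return m.group(0) if m else None
```

With 'regex' installed, search("Mr|Mrs", "Mrs", "posix") picks the longer alternative "Mrs", while search("Mr|Mrs", "Mrs", "pcre") stops at "Mr". Without 'regex', both calls behave like the second.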
Jim Hague [Sun, 17 Dec 2017 00:48:23 +0000 (00:48 +0000)]
eit: add title and summary scrapers (#4801)
Since this change adds a summary scraper, remove the recently added
summary update from the second match subgroup and instead build the
match from each scraper by concatenating all matching subgroups. This
lets us pick multiple items from the input.
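As a rough illustration of building the result by concatenating subgroups (Python's re module stands in for the TVH regex interface here; the function name and the sample pattern are invented):

```python
import re


def scrape(pattern, text):
    """Concatenate every subgroup that took part in the match.

    Returns None if the pattern does not match at all.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    # Groups that did not participate in the match are None; skip them.
    return "".join(g for g in m.groups() if g is not None)
```

For instance, scrape(r"^(Episode )[a-z]+ ([0-9]+)", "Episode number 12 of 20") picks two separate items from the input and joins them into "Episode 12".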
Jim Hague [Sat, 16 Dec 2017 20:59:16 +0000 (20:59 +0000)]
eit: if PCRE/PCRE2 in use, regexes can be marked for Posix engine execution only (#4795)
If fancier regex engines are available, we need to be able to mark regexes
that should only ever be executed by the Posix engine, to make sure that
they will always work as expected.
If PCRE or PCRE2 is available, look for regexes specific to those. These
have the same name, but are under a map named "pcre" or "pcre2". If they
are not found, fall back to the top level Posix regexes, but make sure
these are executed by the Posix engine.
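The lookup order can be sketched as follows. This is an illustration only: the configuration is modelled as a plain dict, and select_regex() and the sample pattern name are hypothetical.

```python
def select_regex(config, name, available_engine):
    """Pick the pattern called `name` and the engine that must run it.

    `available_engine` is 'pcre2', 'pcre' or None (Posix only).
    Returns a (pattern, engine) tuple, or None if `name` is not defined.
    """
    if available_engine in ("pcre", "pcre2"):
        # Engine-specific patterns live under a map named after the engine.
        engine_map = config.get(available_engine, {})
        if name in engine_map:
            return engine_map[name], available_engine
    if name in config:
        # Fall back to the top-level pattern, pinned to the Posix engine.
        return config[name], "posix"
    return None


# Invented sample configuration.
CONFIG = {
    "season_num": "[Ss]eries ([0-9]+)",
    "pcre2": {"season_num": r"(?i)series (\d+)"},
}
```

Here select_regex(CONFIG, "season_num", "pcre2") returns the PCRE2-specific pattern, while the same call with "pcre" or None returns the top-level pattern marked for Posix execution.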
Jim Hague [Thu, 14 Dec 2017 14:15:11 +0000 (14:15 +0000)]
eit: extend generic regex handling for subpatterns and use in scraper regex (#4795)
Currently scraper regex usage is hardwired to Posix. Using PCRE/PCRE2 if
available would give more flexibility and potentially save repetition in
patterns, e.g.
(?:[.][.][.][:.]*[.:]|[0-9]+/[0-9]+[.])? ([^:]*):
would require multiple Posix patterns, each duplicating the captured
subpattern.
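To show what the pattern above captures, here it is run through Python's re module, used only as a stand-in for a PCRE-style engine. The input strings are invented.

```python
import re

# The optional prefix is a non-capturing group, so the title is always
# subgroup 1 whichever prefix alternative (if any) matched.
PATTERN = re.compile(r"(?:[.][.][.][:.]*[.:]|[0-9]+/[0-9]+[.])? ([^:]*):")


def title_of(text):
    """Return the captured subpattern, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None
```

title_of("1/6. The Episode: Summary text") gives "The Episode" and title_of("...: Continued Story: more") gives "Continued Story", both from the same capture group.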
So add regex_match_substring() and regex_match_substring_length() to
the TVH regex interface. Also add a flags parameter to regex_compile(),
so caseless matching can be optionally requested, rather than hardwired
as at present (EIT scraper regex does not use caseless).
One small change to EIT scraper processing. If the match does not fit
into the buffer, it will be ignored, rather than (as at present)
truncated. This is slightly simpler to implement with PCRE2. I am not
convinced truncation is useful - or, for that matter, that trimming space
from the right hand end of match in the EIT scraper is necessary or
necessarily desirable, but I've left that in.
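The changed buffer handling, sketched in Python rather than C. The function name, the NUL-terminator accounting and the order of the size check and the trim are assumptions, not taken from the TVH source.

```python
def copy_match(match_text, buf_size):
    """Return the match if it fits a buffer of `buf_size` bytes, else None.

    Assumes a C-style buffer that also needs room for a NUL terminator.
    """
    encoded = match_text.encode("utf-8")
    if len(encoded) + 1 > buf_size:
        # Too long: ignore the match entirely instead of truncating it.
        return None
    # Trailing whitespace is still trimmed from the right-hand end.
    return match_text.rstrip()
```

So copy_match("Title  ", 16) returns "Title", while copy_match("A very long title", 8) returns None rather than a truncated string.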
Bernd Kuhls [Mon, 18 Dec 2017 19:52:06 +0000 (20:52 +0100)]
sbuf: fix uclibc compilation error
Fixes build error
tvheadend-e06ffd87beff16103c47d6fa542df2374fca6fd3/src/sbuf.h:77:1:
error: unknown type name 'ssize_t'; did you mean 'size_t'?
ssize_t sbuf_read(sbuf_t *sb, int fd);
pablozg [Tue, 19 Dec 2017 11:18:07 +0000 (12:18 +0100)]
DVR: add new features
The autorec name now defaults to the EPG title.
A new button to show / hide the skipped recordings in the webui.
A new button to mark an upcoming recording as completed, to avoid recording it again.
E.Smith [Tue, 21 Nov 2017 11:03:13 +0000 (11:03 +0000)]
xmltv: Use epggrab_module_int_t instead of ext_t. (#3753).
The epggrab_module_ext_t derives from the epggrab_module_int_t
so we should really use the epggrab_module_int_t to make it
clearer that the fields are in the base class.
E.Smith [Mon, 20 Nov 2017 22:36:43 +0000 (22:36 +0000)]
xmltv: Optionally disable mapping category to genre. (#3753).
Allow user to disable mapping from xmltv to genre. The mapping
is imprecise and often has numerous categories not mapped.
By not mapping to genres, some GUIs can pass through the
category instead.
Jim Hague [Wed, 13 Dec 2017 21:37:00 +0000 (21:37 +0000)]
eit: Add optional 2nd match subexpression for subtitle (#4791)
If the regex for the subtitle contains a second subexpression, and a
match is made, use the first subexpression for the subtitle and replace
the summary with the second subexpression.
For example, a UK Freeview subtitle regex might choose, when matching a
summary 'Subtitle: Text', to set the subtitle to 'Subtitle' and set the
summary to 'Text' to avoid repetition of the subtitle.
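A small sketch of the behaviour described, using Python's re module in place of the scraper's regex engine. split_subtitle() and the sample pattern are illustrative only and are not the 'uk' scraper's configuration.

```python
import re


def split_subtitle(pattern, summary):
    """Return (subtitle, summary) after applying a subtitle regex.

    If the pattern has a second subexpression and it matched, that
    text replaces the summary; otherwise the summary is unchanged.
    """
    m = re.search(pattern, summary)
    if m is None:
        return None, summary
    subtitle = m.group(1)
    if m.re.groups >= 2 and m.group(2) is not None:
        summary = m.group(2)
    return subtitle, summary
```

With the pattern r"^([^:]+): (.*)$", the summary 'Subtitle: Text' yields the pair ('Subtitle', 'Text'). With only one subexpression, r"^([^:]+): ", the summary is left as 'Subtitle: Text'.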
Update the scraper test script to support a test field 'new_summary'. As
the 'uk' scraper does not include any second subexpressions, do not
update the test data for now.
Jim Hague [Tue, 12 Dec 2017 21:08:29 +0000 (21:08 +0000)]
eit: Allow empty match subexpressions (#4787)
If a scrape regex includes a subexpression matching the null string (),
this match is currently treated as if the regex did not match.
Amend this to return an empty string as the match; this is plainly what
the regex author wanted.
As an example of why this might be wanted, consider the UK Freeview
extraction of a subtitle from the summary. A user might wish to specify
the subtitle is left blank if no obvious subtitle is present in the
summary.
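The distinction between "no match" and "matched the empty string" can be illustrated in Python. This is a sketch with an invented pattern and function name, not the scraper code.

```python
import re


def scrape_first_group(pattern, text):
    """Return the first subexpression that took part in the match.

    An empty string is a valid result; None means the regex did not
    match (or no subexpression participated).
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    for group in m.groups():
        if group is not None:
            return group
    return None


# Second alternative is an empty subexpression: "leave subtitle blank".
SUBTITLE = r"^([^:]+): |^()"
```

scrape_first_group(SUBTITLE, "Episode One: text") returns "Episode One", while scrape_first_group(SUBTITLE, "Plain text") returns "" rather than None, so the caller can tell that the regex author asked for a blank subtitle.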
E.Smith [Tue, 5 Dec 2017 16:43:52 +0000 (16:43 +0000)]
dvr: Add autorec for new-only. (#1167).
Previously we had "all", "new/unknown", and "repeat", but
no ability to only record episodes marked as "new". So we
rename DVR_AUTOREC_BTYPE_NEW to DVR_AUTOREC_BTYPE_NEW_OR_UNKNOWN
to retain backward compatibility with existing autorec
rules and add new semantics for DVR_AUTOREC_BTYPE_NEW.
We don't update htsp since it currently does not send the
broadcast type field.
Also alter DVR_AUTOREC_BTYPE_NEW_OR_UNKNOWN since
previously we never checked 'new' but instead checked 'repeat'.
However, SD has a previously-shown for all programmes (even first
showings) which causes us to mark programmes as repeat.
It is difficult to fix the repeat logic without breaking existing
behaviour since in the US a programme can be a premiere but have
a previously-shown of the previous day due to timezone differences
on the coasts. Similarly, programmes can be premiere outside the US
but have a previously shown date from the US or from a different channel.
For that reason we now check 'new' instead of 'repeat'.
Real example: Programme is shown on channel A at 9pm and on A+1 timeshift
channel at 10pm. Both are marked as "new" in the paper/OTA tv guide. However,
the programme was actually first shown three years ago on a premium
channel, so it's actually also a repeat since it has been shown before.
So the programme is both a new episode and a repeat episode.
Similarly, one of my TV channels insists Roger Moore Bond films from the
1970s are "new" even though most people would consider them a repeat,
but since it's the first time that particular channel has aired it
they use the "new" tag.
E.Smith [Tue, 5 Dec 2017 20:11:10 +0000 (20:11 +0000)]
ui: Allow filtering/autorec from EPG by category. (#4777).
If we have categories on the server (from xmltv) then
we create a second toolbar on the EPG and add filters for
filtering by category. These are then included in the
autorec rule created from the EPG.
We use a second toolbar since the primary toolbar is
too cramped to fit more search drop-down boxes.
E.Smith [Tue, 5 Dec 2017 20:09:30 +0000 (20:09 +0000)]
dvr: Fix autorec if it has a category but event has no category. (#4777).
Although we use a drop-down list for autorec categories, if the
user has no categories enabled (such as OTA) and creates an autorec
with a category then it would match all events.
Now we fix it so that events without a category can never match an
autorec with categories.
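The intended rule, sketched in Python. The function name is hypothetical, and treating any overlap between the two category sets as a match is an assumption; only the "no event category means no match" part comes from this change.

```python
def autorec_category_matches(autorec_categories, event_categories):
    """Decide whether an event passes an autorec's category filter."""
    wanted = {c for c in autorec_categories if c}
    if not wanted:
        # The autorec has no category filter, so categories are ignored.
        return True
    have = set(event_categories or ())
    if not have:
        # The fix: an event without categories never matches an
        # autorec that specifies categories.
        return False
    # Assumption: any shared category counts as a match.
    return bool(wanted & have)
```

An OTA event with no categories therefore fails autorec_category_matches(["Sports"], []), where previously such an autorec would have matched all events.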