Bug #5174

Subtitle scraping is wrong (subtitle doubled) if the text contains an apostrophe (')

Added by g siviero about 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: EPG - Grabbers
Target version: -
Start date: 2018-08-13
Due date:
% Done: 0%
Estimated time:
Found in version: 4.3-1292
Affected Versions:
Description

In some circumstances the scraped subtitle is doubled. This seems to happen when the subtitle is enclosed in single quotes ('), although that could be an unrelated coincidence.

issue:

Aug 13 10:23:42  tvheadend[1453]: tbl-eit: svc='DMAX', ch='DMAX', eid=21815, tbl=52, running=0, start=2018-08-25;01:55:00(+0200), stop=2018-08-25;02:45:00(+0200), ebc=0x123ed28
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: eit:  dtag 4D dlen 225
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 69 74 61 0E 41 20 6D 61 72 69 20 65 73 74 72 65 ita.A mari estre
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 6D 69 CE 53 74 2E 31 20 45 70 2E 31 20 27 49 6E mi.St.1 Ep.1 'In
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 20 6D 61 72 65 20 61 70 65 72 74 6F 27 20 2D 20  mare aperto' -
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 4C 61 20 70 65 73 63 61 20 69 6E 20 6D 61 72 65 La pesca in mare
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 20 61 70 65 72 74 6F 20 65 27 20 75 6E 6F 20 64  aperto e' uno d
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 65 69 20 6C 61 76 6F 72 69 20 70 69 75 27 20 64 ei lavori piu' d
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 69 66 66 69 63 69 6C 69 20 69 6E 20 47 72 61 6E ifficili in Gran
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 20 42 72 65 74 61 67 6E 61 2E 20 4C 61 20 76 69  Bretagna. La vi
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 74 61 20 61 20 62 6F 72 64 6F 20 6E 6F 6E 20 65 ta a bordo non e
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 27 20 63 6F 6D 70 6C 69 63 61 74 61 20 73 6F 6C ' complicata sol
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 6F 20 70 65 72 20 6C 65 20 6D 61 74 72 69 63 6F o per le matrico
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 6C 65 3A 20 74 75 74 74 61 20 6C 61 20 70 72 65 le: tutta la pre
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 73 73 69 6F 6E 65 20 72 69 63 61 64 65 20 73 75 ssione ricade su
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 6C 6C 6F 20 73 6B 69 70 70 65 72 20 50 68 69 6C llo skipper Phil
Aug 13 10:23:42  tvheadend[1453]: tbl-eit: 2E 

Season extraction (ok with modified code)

Aug 13 10:23:42  tvheadend[1453]: epggrab:   pattern "\[?St\.([0-9]+)\]?" matches '1' from 'St.1 Ep.1 'In mare aperto' - La pesca in mare aperto e' uno dei lavori piu' difficili in Gran Bretagna. La vita a bordo non e' complicata solo per le matricole: tutta la pressione ricade sullo skipper Phil.'
Aug 13 10:23:42  tvheadend[1453]: tbl-eit:   extract season number 1 using eit

Episode extraction (ok with modified code)

Aug 13 10:23:42  tvheadend[1453]: epggrab:   pattern " ?[Ee]p\.? ?([0-9]+)" matches '1' from 'St.1 Ep.1 'In mare aperto' - La pesca in mare aperto e' uno dei lavori piu' difficili in Gran Bretagna. La vita a bordo non e' complicata solo per le matricole: tutta la pressione ricade sullo skipper Phil.'
Aug 13 10:23:42  tvheadend[1453]: tbl-eit:   extract episode number 1 using eit

Subtitle extraction

Aug 13 10:23:42  tvheadend[1453]: epggrab:   pattern "Ep[.] ?[0-9]+[A-Za-z]? -? ?'(([^']*(' [^A-Z0-9-])?('[^ '])?)+)'" matches 'In mare apertoIn mare aperto' from 'St.1 Ep.1 'In mare aperto' - La pesca in mare aperto e' uno dei lavori piu' difficili in Gran Bretagna. La vita a bordo non e' complicata solo per le matricole: tutta la pressione ricade sullo skipper Phil.'
Aug 13 10:23:42  tvheadend[1453]: tbl-eit:   scrape subtitle 'In mare apertoIn mare aperto' from 'St.1 Ep.1 'In mare aperto' - La pesca in mare aperto e' uno dei lavori piu' difficili in Gran Bretagna. La vita a bordo non e' complicata solo per le matricole: tutta la pressione ricade sullo skipper Phil.' using eit

The issue seems to come from lines 141-146 of src/epggrab/module/eitpatternlist.c, where every capture group from 2 onward is appended to the result:

      for (matchno = 2; ; ++matchno) {
        if (regex_match_substring(&p->compiled, matchno, matchbuf, sizeof(matchbuf)))
          break;
        size_t len = strlen(buf);
        strlcat(buf, matchbuf, size_buf - len);
      }
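The effect of the nested groups can be checked outside tvheadend. The following is a minimal sketch that uses POSIX `regex.h` instead of tvheadend's own regex wrapper; `scrape_groups` and `MAX_GROUPS` are hypothetical names introduced only for this illustration. In the subtitle pattern, group 1 already spans the whole quoted text and groups 2 to 4 are nested inside it, so appending any of them to group 1 can only repeat text that group 1 already holds. What a given regex engine reports for group 2 may differ from the engine used in the log above, so no output is claimed for the concatenated case.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Subtitle pattern copied from the log above. */
static const char *SUBTITLE_PATTERN =
    "Ep[.] ?[0-9]+[A-Za-z]? -? ?'(([^']*(' [^A-Z0-9-])?('[^ '])?)+)'";

#define MAX_GROUPS 8 /* hypothetical limit, enough for this pattern */

/*
 * Hypothetical helper: match `text` against `pattern` and concatenate
 * capture groups first_group..last_group into `out`.
 * Returns 0 on a match, -1 if the pattern is invalid or does not match.
 */
static int scrape_groups(const char *pattern, const char *text,
                         size_t first_group, size_t last_group,
                         char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (last_group >= MAX_GROUPS)
        last_group = MAX_GROUPS - 1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, MAX_GROUPS, m, 0) != 0) {
        regfree(&re);
        return -1;
    }
    for (size_t g = first_group; g <= last_group; ++g) {
        if (m[g].rm_so < 0)
            continue; /* group did not take part in the match */
        size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
        if (len > out_size - 1 - used)
            len = out_size - 1 - used; /* truncate instead of overflowing */
        memcpy(out + used, text + m[g].rm_so, len);
        used += len;
        out[used] = '\0';
    }
    regfree(&re);
    return 0;
}
```

Calling `scrape_groups(SUBTITLE_PATTERN, text, 1, 1, buf, sizeof(buf))` on the event text from the log copies only group 1, which is the expected subtitle `In mare aperto`. Passing `1, 4` instead mirrors the concatenation of all groups and shows how the nested groups add repeated text.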
