Bug #2644

htsmsg_xml_deserialize fails when <!DOCTYPE> exists in xmltv.xml

Added by dhead 666 over 3 years ago. Updated over 3 years ago.

Status: Fixed
Start date: 2015-01-27
Priority: Normal
Due date:
Assignee: Adam Sutton
% Done:


Category: EPG - Grabbers
Target version: -
Found in version: git-b98e688f5792a9fb3906491cd51e3b5c62294cd1
Affected Versions:


While testing a network tuner (VBox), I found this issue with the auto-generated xmltv.xml, which includes <!DOCTYPE>; see the attached file.

I'm sending the xmltv.xml through the xmltv.sock with "cat xmltv.xml | socat - UNIX-CONNECT:/home/hts/.hts/tvheadend/epggrab/xmltv.sock"

Log output

tvheadend[4025]: xmltv: htsmsg_xml_deserialize error Unknown syntatic element: <!DOCTYPE tv
tvheadend[4025]: xmltv: failed to read data

xmltv.xml (567 KB) dhead 666, 2015-01-27 02:09

Associated revisions

Revision 3c0a2798
Added by Jaroslav Kysela over 3 years ago

xml parser: skip UTF-8 BOM header, fixes #2644


#1 Updated by dhead 666 over 3 years ago

I'm not sure whether to call this resolved, but the issue is probably caused by the file starting with the following hidden characters:

#2 Updated by Jaroslav Kysela over 3 years ago

It looks like UTF-8 BOM: http://unicode.org/faq/utf_bom.html

#3 Updated by Jaroslav Kysela over 3 years ago

Does this help?

diff --git a/src/htsmsg_xml.c b/src/htsmsg_xml.c
index d1ba7d5..81ff53c 100644
--- a/src/htsmsg_xml.c
+++ b/src/htsmsg_xml.c
@@ -833,6 +833,10 @@ htsmsg_xml_deserialize(char *src, char *errbuf, size_t errbufsize)
   xp.xp_encoding = XML_ENCODING_UTF8;

+  /* check for UTF-8 BOM */
+  if(src[0] == 0xef && src[1] == 0xbb && src[2] == 0xbf)
+    memmove(src, src + 3, strlen(src) - 2);
   if((src = htsmsg_parse_prolog(&xp, src)) == NULL)
     goto err;
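
For illustration only, the same BOM check can be sketched as a standalone helper. This is not the committed code: the names has_utf8_bom and strip_utf8_bom are hypothetical. The patch above compares plain char values against constants such as 0xef, which can fail to match on platforms where char is signed, so this sketch compares through unsigned char instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the NUL-terminated string starts with
 * the UTF-8 BOM (bytes EF BB BF), else 0. Bytes are compared as unsigned
 * char so the result does not depend on the signedness of plain char. */
static int has_utf8_bom(const char *buf)
{
  const unsigned char *p = (const unsigned char *)buf;
  /* The && chain stops at the first mismatch, including a NUL terminator,
   * so strings shorter than three bytes are never read past their end. */
  return p[0] == 0xef && p[1] == 0xbb && p[2] == 0xbf;
}

/* Hypothetical helper: removes a leading BOM in place and returns src. */
static char *strip_utf8_bom(char *src)
{
  if (has_utf8_bom(src))
    memmove(src, src + 3, strlen(src + 3) + 1); /* +1 copies the NUL */
  return src;
}
```

Calling strip_utf8_bom on the buffer before the prolog is parsed would leave a BOM-free string unchanged and shift a BOM-prefixed one left by three bytes.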

#4 Updated by dhead 666 over 3 years ago

The patch works perfectly :)

#5 Updated by Jaroslav Kysela over 3 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100
