Package: python-html5lib
Version: 0.90-2
Severity: normal

The lxml tree builder raises a ValueError when parsing a string that contains control characters:

import html5lib
html5lib.parse('foo\bfoo', treebuilder='lxml')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 38, in parse
     return p.parse(doc, encoding=encoding)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 211, in parse
     parseMeta=parseMeta, useChardet=useChardet)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 111, in _parse
     self.mainLoop()
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 174, in mainLoop
     self.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 572, in processCharacters
     self.parser.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 611, in processCharacters
     self.parser.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 652, in processCharacters
     self.parser.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 711, in processCharacters
     self.parser.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 804, in processCharacters
     self.parser.phase.processCharacters(token)
   File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 948, in processCharacters
     self.tree.insertText(token["data"])
   File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/_base.py", line 288, in insertText
     parent.insertText(data)
   File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree_lxml.py", line 225, in insertText
     builder.Element.insertText(self, data, insertBefore)
   File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree.py", line 114, in insertText
     self._element.text += data
   File "lxml.etree.pyx", line 904, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:37110)
   File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:16855)
   File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22060)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
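
A possible caller-side workaround, as a sketch only: drop the characters that XML 1.0 does not allow before handing the string to the parser. This is not html5lib's or lxml's own fix. The helper name strip_xml_illegal is hypothetical, the character ranges are taken from the XML 1.0 "Char" production, and the code assumes Python 3 rather than the Python 2.7 used above.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text, replacement=""):
    """Return text with XML-incompatible characters replaced.

    Hypothetical helper for illustration; not part of html5lib or lxml.
    """
    return _XML_ILLEGAL.sub(replacement, text)


# The backspace character from the report above is removed.
cleaned = strip_xml_illegal("foo\bfoo")
print(repr(cleaned))
```

The cleaned string could then be passed to html5lib.parse(..., treebuilder='lxml') in place of the original input. Note that this changes the document text, so it may not be acceptable where the control characters carry meaning.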


-- System Information:
Debian Release: wheezy/sid
   APT prefers unstable
   APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)

Kernel: Linux 3.3.0-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-html5lib depends on:
ii  python          2.7.2-10
ii  python-support  1.0.14

Versions of packages python-html5lib suggests:
ii  python-beautifulsoup  <none>
ii  python-chardet        2.0.1-2
ii  python-genshi         <none>
ii  python-lxml           2.3.2-1

--
Jakub Wilk


