Package: python-html5lib
Version: 0.90-2
Severity: normal
lxml builder raises an exception when parsing a string with control characters:
import html5lib
html5lib.parse('foo\bfoo', treebuilder='lxml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 38, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 211, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 111, in _parse
    self.mainLoop()
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 174, in mainLoop
    self.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 572, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 611, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 652, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 711, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 804, in processCharacters
    self.parser.phase.processCharacters(token)
  File "/usr/lib/pymodules/python2.7/html5lib/html5parser.py", line 948, in processCharacters
    self.tree.insertText(token["data"])
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/_base.py", line 288, in insertText
    parent.insertText(data)
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree_lxml.py", line 225, in insertText
    builder.Element.insertText(self, data, insertBefore)
  File "/usr/lib/pymodules/python2.7/html5lib/treebuilders/etree.py", line 114, in insertText
    self._element.text += data
  File "lxml.etree.pyx", line 904, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:37110)
  File "apihelpers.pxi", line 721, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:16855)
  File "apihelpers.pxi", line 1366, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22060)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'experimental')
Architecture: i386 (x86_64)

Kernel: Linux 3.3.0-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-html5lib depends on:
ii  python          2.7.2-10
ii  python-support  1.0.14

Versions of packages python-html5lib suggests:
ii  python-beautifulsoup  <none>
ii  python-chardet        2.0.1-2
ii  python-genshi         <none>
ii  python-lxml           2.3.2-1

-- 
Jakub Wilk
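
A possible caller-side workaround, until the tree builder handles this itself, is to drop the offending characters before the text reaches lxml. The following is only a minimal sketch: strip_xml_incompatible is a hypothetical helper, not part of html5lib or lxml, and it assumes that silently removing C0 control characters (everything below U+0020 except tab, line feed and carriage return) is acceptable for the input at hand.

```python
import re

# C0 control characters that XML 1.0 does not allow in character data:
# everything below U+0020 except tab (U+0009), line feed (U+000A)
# and carriage return (U+000D).
_XML_INCOMPATIBLE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def strip_xml_incompatible(text, replacement=''):
    """Return text with XML-incompatible control characters replaced.

    Hypothetical helper for illustration; by default the characters
    are simply removed.
    """
    return _XML_INCOMPATIBLE.sub(replacement, text)


if __name__ == '__main__':
    cleaned = strip_xml_incompatible('foo\bfoo')
    print(repr(cleaned))
    # With html5lib and lxml installed, the cleaned string could then be
    # parsed as in the report:
    #     html5lib.parse(cleaned, treebuilder='lxml')
```

Passing a replacement such as u'\ufffd' instead of the empty string keeps a visible marker where a character was removed, which may be preferable when the input must not change silently.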