Hi. It seems the problem still happens with v0.99 (from a package prepared for experimental, pending upload):

$ python
Python 2.7.5+ (default, Sep 17 2013, 17:31:54)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import html5lib
>>> html5lib.parse('foo\bfoo', treebuilder='lxml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 28, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 224, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 93, in _parse
    self.mainLoop()
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 183, in mainLoop
    new_token = phase.processCharacters(new_token)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 991, in processCharacters
    self.tree.insertText(token["data"])
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/_base.py", line 320, in insertText
    parent.insertText(data)
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/etree_lxml.py", line 240, in insertText
    builder.Element.insertText(self, data, insertBefore)
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/etree.py", line 108, in insertText
    self._element.text += data
  File "lxml.etree.pyx", line 921, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:41264)
  File "apihelpers.pxi", line 652, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:18755)
  File "apihelpers.pxi", line 1335, in lxml.etree._utf8 (src/lxml/lxml.etree.c:24545)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
>>>

olivier@inf-8660:~/svn/svn.debian.org/python-modules/packages/build-area$ dpkg -l python-html5lib
Souhait=inconnU/Installé/suppRimé/Purgé/H=à garder
| État=Non/Installé/fichier-Config/dépaqUeté/échec-conFig/H=semi-installé/W=attend-traitement-déclenchements
|/ Err?=(aucune)/besoin Réinstallation (État,Err: majuscule=mauvais)
||/ Nom                                         Version                    Architecture               Description
+++-===========================================-==========================-==========================-===========================================================================================
ii  python-html5lib                             0.99-1                     all                        HTML parser/tokenizer based on the WHATWG HTML5 specification

Are you sure this is a bug? Would you mind checking with upstream and/or forwarding the issue there?

Best regards,
-- 
Olivier BERGER (OpenPGP: 4096R/7C5BB6A5 : http://weusepgp.info)
http://www.olivierberger.com/weblog/