[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference
New submission from Liu DongMiao:

HTMLParser (Python 2.6.2) cannot deal with a mixture of arbitrary data and character references. In lines 365-373, replaceEntities(s) returns the result of unichr() as a unicode object, which cannot be mixed with arbitrary data held in a str. A possible fix: replace unichr(c) with unichr(c).encode('utf-8').

--
components: Library (Lib)
files: chinese.py
messages: 91128
nosy: liudongm...@gmail.com
severity: normal
status: open
title: HTMLParser cannot deal with mixture of arbitrary data and character reference
type: compile error
versions: Python 2.6
Added file: http://bugs.python.org/file14613/chinese.py

___
Python tracker <http://bugs.python.org/issue6611>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
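The proposed fix can be illustrated with a small standalone sketch. It is written for Python 3, where the byte string type is bytes rather than the Python 2 str discussed in the report, and it is not the code from HTMLParser. The helper name replace_charrefs and the choice of UTF-8 as the default encoding are assumptions made for the example. The idea is the same as in the report: encode each decoded character reference before joining it with the surrounding byte data, so that both sides have the same type.

```python
import re

# Matches numeric character references such as &#25991; or &#x6587; in bytes.
_CHARREF = re.compile(rb"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_charrefs(data: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: expand numeric character references in a byte
    string, encoding each resulting character with the given encoding."""

    def _sub(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            # Encode right away so the result can be mixed with raw bytes.
            return chr(code).encode(encoding)
        except (ValueError, OverflowError):
            # Invalid or unencodable code point: leave the reference as-is.
            return match.group(0)

    return _CHARREF.sub(_sub, data)


# UTF-8 bytes for U+4E2D followed by a reference to U+6587.
mixed = "\u4e2d".encode("utf-8") + b"&#25991;"
result = replace_charrefs(mixed)
```

This only works when the caller knows the encoding of the surrounding bytes. If the data were in another encoding, the same call would produce a byte string with two encodings mixed together.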
[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference
Liu DongMiao added the comment:

I don't think this should be treated as a bug. Since we don't know the encoding of the str, we can't handle str and unicode together. In my example the str is in UTF-8, so I need to convert the unicode to a UTF-8 str myself. I will take bones' suggestion.

--
status: open -> closed
type: compile error -> behavior

___
Python tracker <http://bugs.python.org/issue6611>
___