[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

2009-07-31 Thread Liu DongMiao

New submission from Liu DongMiao:

HTMLParser (Python 2.6.2) cannot deal with a mixture of arbitrary data and
character references.

In lines 365-373, replaceEntities(s) returns unichr(charref) as a unicode
object, which cannot be mixed with arbitrary non-ASCII data held in a str.

A possible fix: replace unichr(c) with unichr(c).encode('utf-8').
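
The proposed change can be sketched outside the library as a standalone
function. This is only an illustration, not the actual HTMLParser code: the
name replace_charrefs_utf8 is hypothetical, only numeric references are
handled, and the input is assumed to be a UTF-8 byte string. The unichr alias
is there so the sketch also runs on Python 3, where bytes plays the role of
the Python 2 str.

```python
import re
import sys

# Python 2's unichr is spelled chr on Python 3; alias it so the sketch
# runs on both (an assumption made for illustration only).
if sys.version_info[0] >= 3:
    unichr = chr

# Matches numeric character references such as &#25991; or &#x6587;
CHARREF = re.compile(br"&#([xX][0-9a-fA-F]+|[0-9]+);")


def replace_charrefs_utf8(data):
    """Replace numeric character references inside a UTF-8 byte string.

    Hypothetical helper: `data` is a byte string (str on Python 2,
    bytes on Python 3) and the result is a byte string as well.
    """
    def repl(match):
        ref = match.group(1)
        if ref[:1] in (b"x", b"X"):
            code = int(ref[1:], 16)   # hexadecimal reference
        else:
            code = int(ref, 10)       # decimal reference
        # The proposed fix: encode the character so it can be joined
        # with the surrounding byte data.
        return unichr(code).encode("utf-8")

    return CHARREF.sub(repl, data)


# UTF-8 bytes for one character followed by a character reference.
mixed = b"\xe4\xb8\xad&#25991;"
result = replace_charrefs_utf8(mixed)
```

Hard-coding 'utf-8' only works when the surrounding data really is UTF-8;
for any other encoding the result would be a byte string in mixed encodings.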

--
components: Library (Lib)
files: chinese.py
messages: 91128
nosy: liudongm...@gmail.com
severity: normal
status: open
title: HTMLParser cannot deal with mixture of arbitrary data and character reference
type: compile error
versions: Python 2.6
Added file: http://bugs.python.org/file14613/chinese.py

___
Python tracker 
<http://bugs.python.org/issue6611>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

2009-08-01 Thread Liu DongMiao

Liu DongMiao added the comment:

I think this should not be considered a bug.

Since we don't know the encoding of a str, we cannot deal with str and
unicode together.

In my example, the str is in UTF-8, so I need to convert the unicode to a
str in UTF-8.

I will take bones' suggestion.
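
One way to avoid the mixture, assuming the caller knows the input is UTF-8,
is to decode it to unicode before feeding the parser, so that the data and
the replaced character references are the same type. The following is a
minimal sketch: the AttrCollector class and the sample markup are made up
for illustration, and the import fallback is there only so it runs on both
Python 2 and Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2
except ImportError:
    from html.parser import HTMLParser     # Python 3


class AttrCollector(HTMLParser):
    """Hypothetical parser that records every attribute value it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            self.values.append(value)


# Attribute value mixing raw UTF-8 bytes with a character reference.
raw = b'<a title="\xe4\xb8\xad&#25991;">link</a>'

parser = AttrCollector()
# Decode first: the caller knows the encoding, the parser does not.
parser.feed(raw.decode("utf-8"))
parser.close()

# Encode again only at the boundary where bytes are needed.
encoded = [value.encode("utf-8") for value in parser.values]
```

This keeps the encoding decision with the caller, which matches the point
above that the parser cannot know the encoding of a str.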

--
status: open -> closed
type: compile error -> behavior
