[issue13612] xml.etree.ElementTree says unknown encoding of a regular encoding

Terry J. Reedy Mon, 20 May 2013 12:53:10 -0700

Terry J. Reedy added the comment:

3.3 shifted the wide-build problem to all builds ;-). I now get


  File "C:\Python\mypy\tem.py", line 4, in <module>
    xmlet.fromstring(s)
  File "C:...33\lib\xml\etree\ElementTree.py", line 1356, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: unknown encoding: line 1, column 30

I do not understand the 'unknown encoding' part. Replacing 'GBK' with a truly 
unknown encoding such as 'xyz' changes the last line to

LookupError: unknown encoding: xyz

so the lookup of 'GBK' itself must have succeeded.

I get the same two messages if I add a 'b' prefix to make s a bytes object, 
which it logically should be (and was in 2.7). (I presume .fromstring 'encodes' 
unicode input to bytes with the ascii or latin-1 encoder and then decodes it 
back to unicode according to the declared encoding.)

With s so prefixed, s.decode(encoding="GBK") works and returns the original 
unicode version of s, so Python does know "GBK". It is also in the list of 
official IANA charset names.
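
The codec-level checks described above can be sketched without ElementTree at 
all. This is only an illustration: the sample text is a hypothetical stand-in, 
since the original test string is not shown in this message, and the variable 
names are mine. It shows only that the codec registry resolves 'GBK' and 
rejects 'xyz'; it says nothing about what the XML parser does internally.

```python
import codecs

# Hypothetical sample text; the string used in tem.py is not shown here.
text = "\u4e2d\u6587"
data = text.encode("gbk")  # bytes, as s would be with a 'b' prefix

# Looking up 'GBK' succeeds; the registry reports the normalized codec name.
gbk_name = codecs.lookup("GBK").name

# Looking up a truly unknown name raises LookupError.
try:
    codecs.lookup("xyz")
    xyz_known = True
    xyz_message = ""
except LookupError as exc:
    xyz_known = False
    xyz_message = str(exc)

# Decoding the bytes with the GBK codec returns the original text.
round_trip = data.decode(encoding="GBK")

print(gbk_name, xyz_known, xyz_message, round_trip == text)
```

If these checks pass while ElementTree still reports 'unknown encoding', the 
failure is presumably somewhere between the parser and the codec, not in the 
codec lookup itself.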

I don't know unicode internals well enough to understand Amaury's comment. 
However, it almost reads to me as if this is a unicode bug, not an ET bug.

----------
versions:  -Python 3.2

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13612>
_______________________________________