M.-A. Lemburg wrote:
> On 2007-11-09 14:10, Walter Dörwald wrote:
>> Martin v. Löwis wrote:
>>>>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
>>>>> codecs to do the encoding. There's no need to create a magical
>>>>> mystery codec to pick out which though.
>>>> So the code is good, if it is inside an XML parser, and it's bad if it
>>>> is inside a codec?
>>> Exactly so. This functionality just *isn't* a codec - there is no
>>> encoding. Instead, it is an algorithm for *detecting* an encoding.
>> And what do you do once you've detected the encoding? You decode the
>> input, so why not combine both into an XML decoder?
>
> FWIW: I'm +1 on adding such a codec.
>
> It makes working with XML data a lot easier: you simply don't have to
> bother with the encoding of the XML data anymore and can just let the
> codec figure out the details. The XML parser can then work directly
> on the Unicode data.

Exactly. I have a version of sgmlop lying around that does that.

> Whether it needs to be in C or not is another question (I would have
> done this in Python since performance is not really an issue), but since
> the code is already written, why not use it ?

Servus,
   Walter
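
For readers following the thread, here is a minimal Python sketch of the "detect, then decode" idea being debated. It is not the C codec under discussion, nor the sgmlop variant mentioned above. The names detect_xml_encoding and decode_xml are hypothetical, and the sketch assumes the input either starts with a byte order mark or is ASCII-compatible up to the end of the XML declaration. UTF-16/UTF-32 input without a BOM is deliberately not handled.

```python
import codecs
import re

# Byte order marks, checked longest first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of the input, e.g. <?xml version="1.0" encoding="iso-8859-1"?>
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte string (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # XML defaults to UTF-8 when there is neither a BOM nor a declaration.
    return "utf-8"


def decode_xml(data: bytes) -> str:
    """Detect the encoding, then decode: the two steps combined."""
    return data.decode(detect_xml_encoding(data))


if __name__ == "__main__":
    raw = '<?xml version="1.0" encoding="iso-8859-1"?><a>D\u00f6rwald</a>'
    print(decode_xml(raw.encode("iso-8859-1")))
```

Whether these two steps live inside a parser or behind a codec name is exactly the design question in the thread; the sketch only shows that the detection step and the decoding step compose naturally.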