2013/6/17 Guido van Rossum <gu...@python.org>:
> On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benja...@python.org> wrote:
>> 2013/6/17 Guido van Rossum <gu...@python.org>:
>>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benja...@python.org> wrote:
>>> What exactly does the parser handle better than the io module? Is it
>>> just the coding cookies? I suppose that works as long as the file is
>>> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
>>> would fail pretty badly if it was UTF-16 (and yes, that's an
>>> abominable encoding for other reasons :-).
>>
>> The coding cookie is the main one. In fact, if you can't parse that,
>> you don't really know what encoding to open the file with at all.
>> There are also small things like BOM handling (you have to use the
>> utf-8-sig encoding with TextIO to get it removed) and defaulting to
>> UTF-8 (which the io module doesn't do), which are better left to the
>> parser.
>
> Maybe there are some lessons here that the TextIO module could learn?
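To make the parser's rules concrete, here is a simplified sketch of sniffing a source file's encoding from raw bytes: a UTF-8 BOM first, then a coding cookie on line 1 or 2, then the UTF-8 default. The function name `sniff_source_encoding` is hypothetical; the cookie regex is adapted from PEP 263, and CPython's own version of this logic (with more edge cases) is `tokenize.detect_encoding`. Because the cookie is matched as ASCII bytes, the sketch, like the parser, only works for ASCII-superset encodings; a UTF-16 file never matches.

```python
import codecs
import re

# Cookie and blank-line patterns adapted from PEP 263 / the tokenize module.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
BLANK_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")


def sniff_source_encoding(data: bytes) -> str:
    """Guess the encoding of Python source bytes (hypothetical helper).

    Order of checks: UTF-8 BOM, then a coding cookie on line 1 or 2,
    then the Python 3 default of UTF-8.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom:
        data = data[len(codecs.BOM_UTF8):]
    for lineno, line in enumerate(data.splitlines()[:2], start=1):
        match = COOKIE_RE.match(line)
        if match:
            # codecs.lookup raises LookupError for unknown names;
            # the real parser reports a SyntaxError instead.
            name = codecs.lookup(match.group(1).decode("ascii")).name
            if has_bom and name != "utf-8":
                raise SyntaxError(f"BOM conflicts with coding cookie {name!r}")
            return "utf-8-sig" if has_bom else name
        # A cookie on line 2 only counts if line 1 is blank or a comment.
        if lineno == 1 and not BLANK_RE.match(line):
            break
    # No cookie: fall back to UTF-8, keeping the BOM-stripping variant if needed.
    return "utf-8-sig" if has_bom else "utf-8"
```

Decoding is then `data.decode(sniff_source_encoding(data))`; returning `utf-8-sig` when a BOM is present is what makes the BOM disappear from the decoded text.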
UTF-8 by default would be great, but that ship has sailed. Reading Python
coding cookies is outside the purview of TextIOWrapper. However, it would
be good to have a function in the stdlib to read a Python source file to
Unicode; I've definitely implemented that several times.

--
Regards,
Benjamin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
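A minimal sketch of the kind of stdlib helper described in the reply, assuming Python 3's `tokenize` module: `tokenize.detect_encoding` applies the parser's BOM, cookie, and UTF-8-default rules to the first two lines, and `io.TextIOWrapper` then does the decoding. The name `read_python_source` is hypothetical, not an existing API; for a file on disk, `tokenize.open(filename)` does the equivalent.

```python
import io
import tokenize


def read_python_source(data: bytes) -> str:
    """Decode the raw bytes of a Python source file to str.

    Hypothetical helper: encoding detection (UTF-8 BOM, PEP 263 coding
    cookie, UTF-8 default) is delegated to tokenize.detect_encoding.
    """
    buffer = io.BytesIO(data)
    # detect_encoding reads at most two lines through readline and
    # returns "utf-8-sig" when a UTF-8 BOM is present.
    encoding, _ = tokenize.detect_encoding(buffer.readline)
    buffer.seek(0)
    # With utf-8-sig the BOM is stripped; the default newline=None
    # translates \r\n and \r to \n.
    with io.TextIOWrapper(buffer, encoding=encoding) as text:
        return text.read()
```

Keeping detection and decoding as separate steps mirrors the split discussed in the thread: TextIOWrapper stays a generic text layer, and only the Python-specific cookie rules live in `tokenize`.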