It may be possible to implement parsing the coding cookie as a Python codec :-)
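
The detection itself is already exposed in the stdlib: tokenize.detect_encoding() implements the PEP 263 rules (coding cookie, UTF-8 BOM, UTF-8 default), and tokenize.open() wraps it to return an already-decoded text stream. A rough sketch of reusing it, for illustration only:

    import tokenize

    with open("myfile.py", "rb") as fp:
        # Same rules the parser applies: coding cookie, UTF-8 BOM
        # (reported as "utf-8-sig"), and the UTF-8 default.
        encoding, first_lines = tokenize.detect_encoding(fp.readline)
    print(encoding)

A codec-based approach would presumably just wrap that detection in a codecs search function.
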
Victor

2013/6/18 Guido van Rossum <gu...@python.org>:
> On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benja...@python.org> wrote:
>> 2013/6/17 Guido van Rossum <gu...@python.org>:
>>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benja...@python.org> wrote:
>>>> 2013/6/17 Greg Ewing <greg.ew...@canterbury.ac.nz>:
>>>>> Guido van Rossum wrote:
>>>>>>
>>>>>> No. Executing a file containing those exact characters produces a
>>>>>> string containing only '\n', and exec/eval is meant to behave the same
>>>>>> way. The string may not have originated from a file, so the universal
>>>>>> newlines behavior of the io module is irrelevant here -- the parser
>>>>>> must implement its own equivalent processing, and it does.
>>>>>
>>>>> I'm still not convinced that this is necessary or desirable
>>>>> behaviour. I can understand the parser doing this as a
>>>>> workaround before we had universal newlines, but now that
>>>>> we do, I'd expect any Python string to already have newlines
>>>>> converted to their canonical representation, and that any CRs
>>>>> it contains are meant to be there. The parser shouldn't need
>>>>> to do newline translation a second time.
>>>>
>>>> It used to be that way until 2.7. People like to do things like
>>>>
>>>>     with open("myfile.py", "rb") as fp:
>>>>         exec fp.read() in ns
>>>>
>>>> which used to fail with CRLF newlines because binary mode doesn't
>>>> translate them. I think this is actually the correct way to execute
>>>> Python sources, because the parser then handles the somewhat
>>>> complicated process of decoding Python source for you.
>>>
>>> What exactly does the parser handle better than the io module? Is it
>>> just the coding cookies? I suppose that works as long as the file is
>>> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
>>> would fail pretty badly if it was UTF-16 (and yes, that's an
>>> abominable encoding for other reasons :-).
>>
>> The coding cookie is the main one. In fact, if you can't parse that,
>> you don't really know what encoding to open the file with at all.
>> There are also small things like BOM handling (you have to use the
>> utf-8-sig encoding with TextIO to get it removed) and defaulting to
>> UTF-8 (which the io module doesn't do), which are better left to the
>> parser.
>
> Maybe there are some lessons here that the TextIO module could learn?
>
> --
> --Guido van Rossum (python.org/~guido)
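
For reference, the Python 3 spelling of the pattern Benjamin quotes would be roughly the following (a sketch only): compile() accepts the raw bytes and applies the coding cookie / BOM / UTF-8-default rules itself, so no prior decoding is needed.

    ns = {}
    with open("myfile.py", "rb") as fp:
        source = fp.read()
    # compile() decodes the bytes using the source's own encoding
    # declaration before parsing, mirroring what the Python 2
    # "exec ... in ns" form relied on.
    exec(compile(source, "myfile.py", "exec"), ns)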