Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
Hi,

On Saturday 09 January 2010 13:45:58, you wrote:
> > Note: I implemented the BOM check in TextIOWrapper; so it's already
> > usable for any file-like object.
>
> Yes, but the implementation is limited to just BOM checking
> and thus only supports UTF-8-SIG, UTF-16 and UTF-32.

Sure, but

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:12:28, MRAB wrote:
> What about listing the possible encodings? It would try each in turn
> until it found one where the BOM matched or had no BOM:
>
>     my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
>
> or is that taking it too far?

Yes, you'
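
A minimal sketch of how the quoted "try each in turn" idea could behave, assuming the candidate list is split on `|` and that a candidate without a BOM (such as plain UTF-8) always matches. The helper name `pick_from_candidates` and the BOM table are hypothetical; `open()` does not accept such a specification.

```python
import codecs

# BOMs for the codecs that define one, keyed by canonical codec name.
_BOMS_BY_CODEC = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_from_candidates(head, spec):
    """Return the first candidate in `spec` that is compatible with `head`.

    `head` holds the first bytes of the file; `spec` is a string such as
    'UTF-8-sig|UTF-16|UTF-8' (hypothetical syntax from the thread).
    """
    for name in spec.split("|"):
        canonical = codecs.lookup(name).name
        boms = _BOMS_BY_CODEC.get(canonical)
        if boms is None:
            # This codec has no BOM, so it acts as the fallback.
            return name
        if head.startswith(boms):
            return name
    raise ValueError("no candidate encoding matched the start of the data")
```

Order matters in such a list: the UTF-32 little-endian BOM begins with the UTF-16 little-endian BOM, so a list that names UTF-16 before UTF-32 would misidentify UTF-32-LE data.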

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 01:47:38, you wrote:
> One concern I have with this implementation encoding="BOM" is that if
> there is no BOM it assumes UTF-8.

If no BOM is found, it falls back to the current heuristic: os.device_encoding() or the system locale.

> (...) Hence, it might be that someon
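
The fallback heuristic mentioned above can be sketched roughly as follows. The function name `default_text_encoding` is hypothetical, and the use of `locale.getpreferredencoding(False)` as "the system locale" is an assumption about what the message refers to.

```python
import locale
import os


def default_text_encoding(fd):
    """Guess a text encoding for file descriptor `fd` when no BOM is found.

    os.device_encoding() returns an encoding only when `fd` is connected
    to a terminal, and None otherwise, so the locale's preferred encoding
    is the usual result for regular files.
    """
    return os.device_encoding(fd) or locale.getpreferredencoding(False)
```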

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread M.-A. Lemburg
Victor Stinner wrote:
> (2) Check for a BOM while reading or detect it before?
>
> Everybody agrees that checking BOM is an interesting option and should not be
> limited to open().
>
> Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file
> name or a binary file-like obje
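
To make the proposal concrete, here is a rough sketch of what a function of that shape might look like. It is only an illustration: `guess_file_encoding` below is a hypothetical stand-alone helper, not a function of the `codecs` module, and returning `None` when no BOM is present is an assumption.

```python
import codecs
import io

# Longer BOMs come first because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(source):
    """Return a codec name based on the BOM of `source`, or None.

    `source` is either a file name or a seekable binary file-like object;
    the position of a file-like object is restored after peeking.
    """
    if isinstance(source, (str, bytes)):
        with open(source, "rb") as stream:
            head = stream.read(4)
    else:
        position = source.tell()
        head = source.read(4)
        source.seek(position)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

A caller could then write `open(path, encoding=guess_file_encoding(path) or "utf-8")`, choosing its own fallback explicitly.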

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Victor Stinner
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote:
> While I would support combining BOM detection in the case where a file
> is opened for reading and no encoding is specified, I see two problems:
> a) if a seek operation is performed before having looked at the BOM,
>    no determinat

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Thread Walter Dörwald
On 09.01.10 01:47, Glenn Linderman wrote:
> On approximately 1/8/2010 3:59 PM, came the following characters from
> the keyboard of Victor Stinner:
>> Hi,
>>
>> Thanks for all the answers! I will try to sum up all ideas here.
>
> One concern I have with this implementation encoding="BOM" is that

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Lennart Regebro
It seems to me that when opening a file, the following is the only flow that makes sense in the typical case:

    if encoding is not None:
        use encoding
    elif file has BOM:
        use BOM
    else:
        use system default

And hence an encoding='BOM' isn't needed there. Although I'm trying to
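
The three-step flow above could be written out roughly like this. It is a sketch under assumptions: the function name `choose_encoding` is hypothetical, the caller is assumed to pass the first few bytes of the file, and "system default" is taken to mean `locale.getpreferredencoding(False)`.

```python
import codecs
import locale

# Longer BOMs come first because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def choose_encoding(head, encoding=None):
    """Pick an encoding: explicit argument, then BOM, then system default."""
    if encoding is not None:
        # An explicit encoding always wins.
        return encoding
    for bom, name in _BOMS:
        if head.startswith(bom):
            # The codec named here consumes the BOM while decoding.
            return name
    return locale.getpreferredencoding(False)
```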

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Glenn Linderman
On approximately 1/8/2010 5:12 PM, came the following characters from the keyboard of MRAB: Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One co

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Martin v. Löwis
>>> Antoine would like to check BOM by default, because both options
>>> (system locale vs checking for BOM) is the same thing.
>>>
>> To be clear, I am not saying it is the same thing. What I think is
>> that it would be a mistake to use a mildly unreliable heuristic by
>> default (the locale +

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread MRAB
Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes U

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Glenn Linderman
On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes UTF-8. That is probably

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Michael Foord
On 09/01/2010 00:09, Antoine Pitrou wrote: Hello Victor, Victor Stinner haypocalc.com> writes: (1) Change default open() behaviour or make it optional? [...] Antoine would like to check BOM by default, because both options (system locale vs checking for BOM) is the same thing

Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Antoine Pitrou
Hello Victor,

Victor Stinner haypocalc.com> writes:
>
> (1) Change default open() behaviour or make it optional?
> [...]
>
> Antoine would like to check BOM by default, because both options (system
> locale vs checking for BOM) is the same thing.

To be clear, I am not saying it is the same t

[Python-Dev] Quick sum up about open() + BOM

2010-01-08 Thread Victor Stinner
Hi,

Thanks for all the answers! I will try to sum up all ideas here.

(1) Change default open() behaviour or make it optional?

Guido would like to add an option and keep open() unchanged. He wrote that checking for BOM and using system locale are too different to be the same option (encod