Re: [Python-Dev] Quick sum up about open() + BOM

2010-01-09 Victor Stinner
Hi,

On Saturday 09 January 2010 13:45:58, you wrote:
> > Note: I implemented the BOM check in TextIOWrapper, so it's already
> > usable for any file-like object.
>
> Yes, but the implementation is limited to just BOM checking
> and thus only supports UTF-8-SIG, UTF-16 and UTF-32.

Sure, but
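
To make concrete what "just BOM checking" covers, here is a minimal sketch of a BOM sniffer over the first bytes of a stream. The name `detect_bom` is hypothetical and this is not the TextIOWrapper patch discussed above; it only uses the real `codecs.BOM_*` constants. The 4-byte UTF-32 marks are tested first because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).

```python
import codecs
from typing import Optional

# Longest marks first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(head: bytes) -> Optional[str]:
    """Return a codec name for the BOM at the start of `head`, or None.

    The returned codecs ("utf-8-sig", "utf-16", "utf-32") consume the BOM
    themselves when decoding, so the caller does not need to strip it.
    """
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None
```

Only these three encoding families can be recognised this way; a BOM-less file yields None and some other rule has to choose. There is also an inherent ambiguity: a UTF-16-LE file whose first character is U+0000 looks exactly like a UTF-32-LE BOM.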

2010-01-09 Victor Stinner
On Saturday 09 January 2010 02:12:28, MRAB wrote:
> What about listing the possible encodings? It would try each in turn
> until it found one where the BOM matched or had no BOM:
>
> my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
>
> or is that taking it too far?

Yes, you'
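
MRAB's idea can be sketched outside open() as a small helper that walks the `|`-separated list. This is only an illustration: open() does not accept such a list, `pick_encoding` is a hypothetical name, and it assumes that only utf-8-sig, utf-16 and utf-32 count as "BOM" codecs, while any other codec is accepted as soon as it is reached.

```python
import codecs

# Codecs that announce themselves with a BOM, keyed by normalized codec name.
_BOMS_BY_CODEC = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(head: bytes, candidates: str) -> str:
    """Try each '|'-separated codec in order (hypothetical helper).

    A BOM-based codec is chosen only if `head` starts with one of its BOMs;
    any other codec is chosen as soon as it is reached. List UTF-32 before
    UTF-16, since the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    """
    for name in candidates.split("|"):
        codec = codecs.lookup(name.strip()).name  # e.g. "UTF-8-sig" -> "utf-8-sig"
        boms = _BOMS_BY_CODEC.get(codec)
        if boms is None or head.startswith(boms):
            return codec
    raise LookupError(f"no candidate in {candidates!r} matches the data")
```

With MRAB's list, `data.decode(pick_encoding(data, "UTF-8-sig|UTF-16|UTF-8"))` decodes BOM-marked UTF-8 or UTF-16 and treats everything else as plain UTF-8.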

2010-01-09 Victor Stinner
On Saturday 09 January 2010 01:47:38, you wrote:
> One concern I have with this implementation encoding="BOM" is that if
> there is no BOM it assumes UTF-8.

If no BOM is found, it falls back to the current heuristic: os.device_encoding() or the system locale.

> (...) Hence, it might be that someon
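
The fallback described here (no BOM: device encoding if any, otherwise the locale) can be approximated as follows. `fallback_encoding` is a hypothetical name; `os.device_encoding()` and `locale.getpreferredencoding(False)` are real stdlib calls, but this only approximates what open() does internally, which has changed across Python versions (for example with UTF-8 mode).

```python
import locale
import os
from typing import Optional


def fallback_encoding(fd: Optional[int] = None) -> str:
    """Encoding to use when no BOM was found (approximation of the heuristic).

    os.device_encoding() only returns a value when `fd` is a terminal;
    for regular files and pipes it returns None and the locale's
    preferred encoding is used instead.
    """
    if fd is not None:
        encoding = os.device_encoding(fd)
        if encoding is not None:
            return encoding
    # do_setlocale=False: report the current locale setting without changing it.
    return locale.getpreferredencoding(False)
```

For an ordinary file on disk this means the result depends on the user's locale, not on a fixed UTF-8 default.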

2010-01-09 M.-A. Lemburg
Victor Stinner wrote:
> (2) Check for a BOM while reading or detect it before?
>
> Everybody agrees that checking for a BOM is an interesting option and should not be
> limited to open().
>
> Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file
> name or a binary file-like obje
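
There is no `codecs.guess_file_encoding()` in the standard library; the following is a minimal sketch of the proposed signature, assuming the function only looks for a BOM and returns None otherwise. It accepts a path or a seekable binary file-like object and, for the latter, restores the original position so the caller can continue reading where it was.

```python
import codecs
import os
from typing import BinaryIO, Optional, Union

_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),  # must precede BOM_UTF16_LE
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(file: Union[str, os.PathLike, BinaryIO]) -> Optional[str]:
    """Hypothetical sketch of the proposed codecs.guess_file_encoding()."""
    if isinstance(file, (str, os.PathLike)):
        with open(file, "rb") as f:
            head = f.read(4)
    else:
        pos = file.tell()
        file.seek(0)  # a BOM is only meaningful at the very start
        head = file.read(4)
        file.seek(pos)  # leave the stream where the caller had it
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None
```

Because detection happens before any decoding, the result can be passed to open(), io.TextIOWrapper or codecs.getreader() alike, which is the "not limited to open()" property mentioned above.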

2010-01-09 Victor Stinner
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote:
> While I would support combining BOM detection in the case where a file
> is opened for reading and no encoding is specified, I see two problems:
> a) if a seek operation is performed before having looked at the BOM,
> no determinat
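
One way to sidestep problem (a) is to decide the encoding eagerly, at open time, before any seek() can happen. A minimal sketch, assuming a buffered binary stream: `io.BufferedReader.peek()` returns buffered bytes without moving the stream position. `open_text_with_bom` and its `fallback` parameter are hypothetical names chosen for illustration.

```python
import codecs
import io
import locale
from typing import Optional

_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),  # must precede BOM_UTF16_LE
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def open_text_with_bom(
    raw: io.BufferedReader, fallback: Optional[str] = None
) -> io.TextIOWrapper:
    """Wrap `raw` in a TextIOWrapper whose encoding is fixed at open time."""
    # peek() does not advance the stream. It may return more than 4 bytes,
    # or fewer on pipes/sockets if a single raw read comes back short.
    head = raw.peek(4)[:4]
    encoding = next((enc for bom, enc in _BOMS if head.startswith(bom)), None)
    if encoding is None:
        encoding = fallback or locale.getpreferredencoding(False)
    return io.TextIOWrapper(raw, encoding=encoding)
```

Since the decision is made before the wrapper is returned, later seek() calls on it no longer affect which encoding is used.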

2010-01-09 Walter Dörwald
On 09.01.10 01:47, Glenn Linderman wrote:
> On approximately 1/8/2010 3:59 PM, came the following characters from
> the keyboard of Victor Stinner:
>> Hi,
>>
>> Thanks for all the answers! I will try to sum up all ideas here.
>
> One concern I have with this implementation encoding="BOM" is that

2010-01-08 Lennart Regebro
It seems to me that, for the typical case of opening a file, the following is the only flow that makes sense:

    if encoding is not None:
        use encoding
    elif file has BOM:
        use BOM
    else:
        use system default

And hence an encoding='BOM' isn't needed there. Although I'm trying to
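
The three-step flow above translates almost line for line into Python. This is a sketch only: `resolve_encoding` is a hypothetical name, and "system default" is approximated here by `locale.getpreferredencoding(False)`.

```python
import codecs
import locale
from typing import Optional

_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),  # must precede BOM_UTF16_LE
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def resolve_encoding(encoding: Optional[str], head: bytes) -> str:
    """Explicit argument first, then a BOM in `head`, then the system default."""
    if encoding is not None:
        return encoding  # an explicit encoding always wins, BOM or not
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # "system default": approximated by the locale's preferred encoding
    return locale.getpreferredencoding(False)
```

With this ordering no special encoding='BOM' value is needed: leaving `encoding` as None is what enables detection.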

2010-01-08 Glenn Linderman
On approximately 1/8/2010 5:12 PM, came the following characters from the keyboard of MRAB: Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One co

2010-01-08 Martin v. Löwis
>>> Antoine would like to check BOM by default, because both options
>>> (system locale vs checking for BOM) are the same thing.
>>>
>> To be clear, I am not saying it is the same thing. What I think is
>> that it would be a mistake to use a mildly unreliable heuristic by
>> default (the locale +

2010-01-08 MRAB
Glenn Linderman wrote: On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes U

2010-01-08 Glenn Linderman
On approximately 1/8/2010 3:59 PM, came the following characters from the keyboard of Victor Stinner: Hi, Thanks for all the answers! I will try to sum up all ideas here. One concern I have with this implementation encoding="BOM" is that if there is no BOM it assumes UTF-8. That is probably

2010-01-08 Michael Foord
On 09/01/2010 00:09, Antoine Pitrou wrote: Hello Victor, Victor Stinner writes: (1) Change default open() behaviour or make it optional? [...] Antoine would like to check BOM by default, because both options (system locale vs checking for BOM) are the same thing

2010-01-08 Antoine Pitrou
Hello Victor,

Victor Stinner writes:
>
> (1) Change default open() behaviour or make it optional?
> [...]
>
> Antoine would like to check BOM by default, because both options (system
> locale vs checking for BOM) are the same thing.

To be clear, I am not saying it is the same t