Hi,
On Saturday 09 January 2010 13:45:58, you wrote:
> > Note: I implemented the BOM check in TextIOWrapper; so it's already
> > usable for any file-like object.
>
> Yes, but the implementation is limited to just BOM checking
> and thus only supports UTF-8-SIG, UTF-16 and UTF-32.
Sure, but
On Saturday 09 January 2010 02:12:28, MRAB wrote:
> What about listing the possible encodings? It would try each in turn
> until it found one where the BOM matched or had no BOM:
>
> my_file = open(filename, 'r', encoding='UTF-8-sig|UTF-16|UTF-8')
>
> or is that taking it too far?
Yes, you'
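
For illustration only, here is a minimal sketch of how the candidate-list idea quoted above might behave. The function name pick_encoding and the _BOMS table are hypothetical and not part of any proposed API; the sketch assumes candidates are separated by "|" and that a name with no known BOM always matches.

```python
import codecs

# BOMs for the codec names used in the example (assumption: any name
# missing from this table is treated as "has no BOM").
_BOMS = {
    "utf-8-sig": (codecs.BOM_UTF8,),
    "utf-16": (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE),
    "utf-32": (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE),
}


def pick_encoding(head, candidates):
    """Return the first candidate whose BOM starts `head`, or that has no BOM.

    `head` is the first few bytes of the file; `candidates` is a string
    such as 'UTF-8-sig|UTF-16|UTF-8'. Returns None if nothing matches.
    """
    for name in candidates.split("|"):
        name = name.strip()
        boms = _BOMS.get(name.lower())
        if boms is None:
            # No BOM defined for this codec: accept it unconditionally.
            return name
        if head.startswith(boms):
            return name
    return None
```

Order matters in such a scheme: the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM, so listing UTF-16 before UTF-32 would misdetect UTF-32-LE files.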
On Saturday 09 January 2010 01:47:38, you wrote:
> One concern I have with this implementation encoding="BOM" is that if
> there is no BOM it assumes UTF-8.
If no BOM is found, it falls back to the current heuristic: os.device_encoding()
or the system locale.
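
As a rough sketch of that fallback (not the actual TextIOWrapper code), the heuristic could look like the following. The helper name fallback_encoding is hypothetical; os.device_encoding() and locale.getpreferredencoding() are the standard library calls.

```python
import locale
import os


def fallback_encoding(fd=None):
    """Return an encoding name to use when no BOM was found."""
    encoding = None
    if fd is not None:
        try:
            # Encoding of the terminal attached to fd; None if fd is not a tty.
            encoding = os.device_encoding(fd)
        except OSError:
            encoding = None
    # Otherwise use the preferred encoding of the current locale.
    return encoding or locale.getpreferredencoding(False)
```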
> (...) Hence, it might be that someon
Victor Stinner wrote:
> (2) Check for a BOM while reading or detect it before?
>
> Everybody agrees that checking for a BOM is an interesting option and should
> not be limited to open().
>
> Marc-Andre proposed a codecs.guess_file_encoding() function accepting a file
> name or a binary file-like obje
On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote:
> While I would support combining BOM detection in the case where a file
> is opened for reading and no encoding is specified, I see two problems:
> a) if a seek operation is performed before having looked at the BOM,
>no determinat
On 09.01.10 01:47, Glenn Linderman wrote:
> On approximately 1/8/2010 3:59 PM, came the following characters from
> the keyboard of Victor Stinner:
>> Hi,
>>
>> Thanks for all the answers! I will try to sum up all ideas here.
>
> One concern I have with this implementation encoding="BOM" is that
It seems to me that, for the typical case of opening a file, the only
flow that makes sense is the following:
    if encoding is not None:
        use encoding
    elif file has BOM:
        use BOM
    else:
        use system default
And hence an encoding='BOM' isn't needed there. Although I'm trying to
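
The three-step flow above (explicit encoding, then BOM, then system default) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: encoding_for and open_text are hypothetical names, the BOM table covers only UTF-8, UTF-16 and UTF-32, and "system default" is taken to mean locale.getpreferredencoding(False).

```python
import codecs
import locale

# Longest BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOM_CODECS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def encoding_for(head, encoding=None):
    """Choose an encoding from the first bytes of a file."""
    if encoding is not None:
        return encoding                      # use encoding
    for bom, name in _BOM_CODECS:
        if head.startswith(bom):
            return name                      # use BOM
    return locale.getpreferredencoding(False)  # use system default


def open_text(path, encoding=None):
    """Open `path` for reading text, following the flow above."""
    with open(path, "rb") as raw:
        head = raw.read(4)  # the longest BOM handled here is 4 bytes
    return open(path, "r", encoding=encoding_for(head, encoding))
```

The "utf-8-sig", "utf-16" and "utf-32" codecs consume the BOM themselves when decoding, so the file can simply be reopened from the start.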
On approximately 1/8/2010 5:12 PM, came the following characters from
the keyboard of MRAB:
Glenn Linderman wrote:
On approximately 1/8/2010 3:59 PM, came the following characters from
the keyboard of Victor Stinner:
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
One co
>>> Antoine would like to check BOM by default, because both options
>>> (system locale vs checking for BOM) are the same thing.
>>>
>> To be clear, I am not saying it is the same thing. What I think is
>> that it would be a mistake to use a mildly unreliable heuristic by
>> default (the locale +
Glenn Linderman wrote:
On approximately 1/8/2010 3:59 PM, came the following characters from
the keyboard of Victor Stinner:
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
One concern I have with this implementation encoding="BOM" is that if
there is no BOM it assumes U
On approximately 1/8/2010 3:59 PM, came the following characters from
the keyboard of Victor Stinner:
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
One concern I have with this implementation encoding="BOM" is that if
there is no BOM it assumes UTF-8. That is probably
On 09/01/2010 00:09, Antoine Pitrou wrote:
Hello Victor,
Victor Stinner (haypocalc.com) writes:
(1) Change default open() behaviour or make it optional?
[...]
Antoine would like to check BOM by default, because both options (system
locale vs checking for BOM) are the same thing
Hello Victor,
Victor Stinner (haypocalc.com) writes:
>
> (1) Change default open() behaviour or make it optional?
>
[...]
>
> Antoine would like to check BOM by default, because both options (system
> locale vs checking for BOM) are the same thing.
To be clear, I am not saying it is the same t
Hi,
Thanks for all the answers! I will try to sum up all ideas here.
(1) Change default open() behaviour or make it optional?
Guido would like to add an option and keep open() unchanged. He wrote that
checking for a BOM and using the system locale are too different to be the same
option (encod