Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Greg Ewing
Victor Stinner wrote: Users don't use stdin and stdout as regular files, they are more used as pipes to pass data between programs with the Unix pipe in a shell like "producer | consumer". Sometimes stdout is redirected to a file, but I consider that it is expected to behave as a pipe and the reg
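A minimal sketch of the "producer | consumer" case, assuming a UTF-8 pipeline. The filter stage wraps its binary input and output in io.TextIOWrapper with errors="surrogateescape", so bytes that are not valid UTF-8 pass through unchanged instead of raising. The function name passthrough_upper is hypothetical, and io.BytesIO stands in for sys.stdin.buffer and sys.stdout.buffer.

```python
import io


def passthrough_upper(raw_in: io.BufferedIOBase, raw_out: io.BufferedIOBase) -> None:
    """Hypothetical filter stage: upper-case each line of a UTF-8 byte stream.

    Undecodable bytes are decoded to lone surrogates (U+DC80..U+DCFF) by
    surrogateescape and re-encoded to the same bytes on output.
    """
    # newline="" disables newline translation so the bytes round-trip exactly.
    text_in = io.TextIOWrapper(raw_in, encoding="utf-8",
                               errors="surrogateescape", newline="")
    text_out = io.TextIOWrapper(raw_out, encoding="utf-8",
                                errors="surrogateescape", newline="")
    for line in text_in:
        text_out.write(line.upper())  # surrogates have no case, so they are kept as-is
    text_out.flush()
    # Detach so garbage-collecting the wrappers doesn't close the caller's streams.
    text_in.detach()
    text_out.detach()
```

In a real pipe stage the call would be passthrough_upper(sys.stdin.buffer, sys.stdout.buffer).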

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Chris Barker - NOAA Federal
I’m a bit confused: File names and the like are one thing, and the CONTENTS of files is quite another. I get that there is theoretically a “default” encoding for the contents of text files, but that is SO likely to be wrong as to be ignorable. open() already defaults to utf-8. Which is a fine de
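On the contents side, a hedged note: in the Python versions discussed in this thread, open() without an encoding argument uses locale.getpreferredencoding(False), so the default follows the locale rather than being fixed to UTF-8. A sketch of pinning it explicitly; write_utf8 and read_utf8 are hypothetical helper names.

```python
import locale


def write_utf8(path: str, text: str) -> None:
    # An explicit encoding makes the on-disk bytes independent of the locale.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def read_utf8(path: str) -> str:
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()


# What open() would have picked if encoding= were omitted (locale-dependent).
print("locale default:", locale.getpreferredencoding(False))
```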

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Glenn Linderman
On 12/7/2017 5:45 PM, Jonathan Goble wrote: On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote: If it were to be changed, one could add a text-mode option in 3.7, say "t" in the mode string, and a PendingDeprecationWarning for open calls without th

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Jonathan Goble
On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote: > If it were to be changed, one could add a text-mode option in 3.7, say "t" > in the mode string, and a PendingDeprecationWarning for open calls without > the specification of either t or b in the mode string. > "t" is already supported in o

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Glenn Linderman
On 12/7/2017 4:48 PM, Victor Stinner wrote: Ok, now comes the real question, open(). For open(), I used the example of a code snippet *writing* the content of a directory (os.listdir) into a text file. Another example is to read filenames from a text file but pass through undecodable bytes tha
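A minimal sketch of the os.listdir example, assuming a UTF-8 text file. Names are written and read back with errors="surrogateescape", so a filename containing undecodable bytes survives the round trip instead of raising UnicodeEncodeError on write. save_names and load_names are hypothetical helpers; the sample name is built by hand so the example doesn't depend on what is actually on disk.

```python
from typing import List


def save_names(path: str, names: List[str]) -> None:
    # On POSIX, os.listdir() with a str path returns names in which undecodable
    # bytes appear as lone surrogates; surrogateescape turns them back into
    # the original bytes when writing.
    with open(path, "w", encoding="utf-8", errors="surrogateescape",
              newline="\n") as f:
        for name in names:
            f.write(name + "\n")


def load_names(path: str) -> List[str]:
    # Reading with the same handler restores the lone surrogates, so the
    # names can be passed back to os functions unchanged.
    with open(path, encoding="utf-8", errors="surrogateescape",
              newline="\n") as f:
        return [line.rstrip("\n") for line in f]


# A name whose raw bytes (b"caf\xe9") are Latin-1, not UTF-8.
odd_name = b"caf\xe9".decode("utf-8", "surrogateescape")
```

With the default strict handler, the write of odd_name would fail instead.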

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
2017-12-08 0:26 GMT+01:00 Guido van Rossum: > You will quickly get decoding errors, and that is INADA's point. (Unless you > use encoding='Latin-1'.) His worry is that the surrogateescape error handler > makes it so that you won't get decoding errors, and then the failure mode is > much harder to
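To make the concern concrete, here is a sketch that decodes the fixed 8-byte PNG signature as if a binary file had been opened without "b". Strict UTF-8 fails immediately, while surrogateescape and Latin-1 both "succeed" and silently produce text. decode_all is a hypothetical helper.

```python
from typing import Dict, Tuple, Union

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

Result = Union[str, UnicodeDecodeError]


def decode_all(data: bytes) -> Dict[Tuple[str, str], Result]:
    results: Dict[Tuple[str, str], Result] = {}
    for encoding, errors in (("utf-8", "strict"),
                             ("utf-8", "surrogateescape"),
                             ("latin-1", "strict")):
        try:
            results[(encoding, errors)] = data.decode(encoding, errors)
        except UnicodeDecodeError as exc:
            # Only strict UTF-8 reports the mistake.
            results[(encoding, errors)] = exc
    return results


for key, value in decode_all(PNG_SIGNATURE).items():
    print(key, repr(value))
```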

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Guido van Rossum
On Thu, Dec 7, 2017 at 3:02 PM, Victor Stinner wrote: > 2017-12-06 5:07 GMT+01:00 INADA Naoki: > > And opening a binary file without the "b" option is a very common mistake among new > > developers. If the default error handler is surrogateescape, they lose a chance > > to notice their bug. > > To come back

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
2017-12-06 5:07 GMT+01:00 INADA Naoki: > And opening a binary file without the "b" option is a very common mistake among new > developers. If the default error handler is surrogateescape, they lose a chance > to notice their bug. To come back to your original point, I didn't know that it was a common mistake t

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
While I'm not strongly convinced that open()'s error handler must be changed to surrogateescape, first I would like to make sure that it's really a very bad idea before changing it :-) 2017-12-07 7:49 GMT+01:00 INADA Naoki: > I just came up with a crazy idea; changing the default error handler of open

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
> I care only about builtin open()'s behavior. > PEP 538 doesn't change the default error handler of open(). > > I think PEP 538 and PEP 540 should behave almost identically, except for changing the locale or not. So I need a very strong reason if PEP 540 changes the default error handler of open(). > I just ca

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 08:20, Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou: >> One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already > implemented this exact check for his PEP 538 which is already >

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 01:59, Jakub Wilk wrote: > * Nick Coghlan, 2017-12-06, 16:15: >> The one that's relevant to default locale detection is just the string >> that "setlocale(LC_CTYPE, NULL)" returns. > > POSIX doesn't require any particular return value for setlocale() calls. > It's only guara
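A Python-level sketch of the check Nick describes, keyed off the exact string "C" as PEP 538 does. Passing None to locale.setlocale() queries the current setting without changing it, mirroring setlocale(LC_CTYPE, NULL) in C. As Jakub notes, POSIX doesn't pin down this return value, so treat this as an illustration only: CPython's real detection is written in C, and is_c_locale is a hypothetical name.

```python
import locale


def is_c_locale() -> bool:
    # Query-only: a None locale argument returns the current LC_CTYPE name.
    current = locale.setlocale(locale.LC_CTYPE, None)
    # Match "C" exactly; whether "POSIX" is reported back as "C" depends
    # on the platform's C library.
    return current == "C"
```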

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Thu, 7 Dec 2017 00:22:52 +0100 Victor Stinner wrote: > 2017-12-06 23:36 GMT+01:00 Antoine Pitrou: > > Other than that, +1 on the PEP. > > Naoki doesn't seem to be comfortable with the usage of the > surrogateescape error handler by default for open(). Are you ok with > that? If yes, would y

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:36 GMT+01:00 Antoine Pitrou: > Other than that, +1 on the PEP. Naoki doesn't seem to be comfortable with the usage of the surrogateescape error handler by default for open(). Are you ok with that? If yes, would you mind explaining why? :-) Victor

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 23:20:41 +0100 Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou: > > One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already > implemented this exact check for his PEP 538 which is

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:07 GMT+01:00 Antoine Pitrou: > One question: how do you plan to test for the POSIX locale? I'm not sure. I will probably rely on Nick for that ;-) Nick already implemented this exact check for his PEP 538 which is already implemented in Python 3.7. I already implemented the PEP 540

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 01:49:41 +0100 Victor Stinner wrote: > Hi, > > I knew that I had to rewrite my PEP 540, but I was too lazy. Since > Guido explicitly requested a shorter PEP, here you have! > > https://www.python.org/dev/peps/pep-0540/ > > Trust me, it's the same PEP, but focused on the most

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Greg Ewing
Victor Stinner wrote: Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with surrogateescape, or backslashreplace for stderr, or surrogatepass for fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But the PEP title would be too long, no? :-) Relaxed UTF-8 Mode? UTF8-Yeah-I'm-F
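Each handler named in that joke title does something different with the same lone surrogate. A sketch encoding "\udcff" (what surrogateescape produces when decoding the byte 0xFF) to UTF-8 under each handler; encode_with is a hypothetical helper.

```python
from typing import Union

LONE = "\udcff"  # surrogateescape's stand-in for the undecodable byte 0xFF


def encode_with(text: str, errors: str) -> Union[bytes, UnicodeEncodeError]:
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


for handler in ("strict", "surrogateescape", "backslashreplace", "surrogatepass"):
    # strict: raises; surrogateescape: restores the original byte;
    # backslashreplace: writes an ASCII escape; surrogatepass: encodes the
    # surrogate itself as 3 bytes (not valid UTF-8, but reversible).
    print(f"{handler:16} {encode_with(LONE, handler)!r}")
```

backslashreplace never fails and always produces ASCII, which is why it suits stderr: an error message cannot itself raise an encoding error.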

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Brett Cannon
On Wed, 6 Dec 2017 at 06:10 INADA Naoki wrote: > >> And I have one worrying point. > >> With UTF-8 mode, open()'s default encoding/error handler is > >> UTF-8/surrogateescape. > > > > The Strict UTF-8 Mode is for you if you prioritize correctness over > usability. > > Yes, but as I said, I care

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Jakub Wilk
* Nick Coghlan, 2017-12-06, 16:15: Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X). For those of us with little to no BSD/MacOS experience, ca

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
>> And I have one worrying point. >> With UTF-8 mode, open()'s default encoding/error handler is >> UTF-8/surrogateescape. > > The Strict UTF-8 Mode is for you if you prioritize correctness over usability. Yes, but as I said, I care about inexperienced developers who don't know what UTF-8 mode

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 6 December 2017 at 20:38, Victor Stinner wrote: > Nick: >> So if PEP 540 is going to implicitly trigger switching encodings, it >> needs to specify whether it's going to look for the C locale or the >> POSIX locale (I'd suggest C locale, since that's the actual default >> that causes problems).

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Nick: > So if PEP 540 is going to implicitly trigger switching encodings, it > needs to specify whether it's going to look for the C locale or the > POSIX locale (I'd suggest C locale, since that's the actual default > that causes problems). I'm thinking of the test already used by check_force_asc

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Hi Naoki, 2017-12-06 5:07 GMT+01:00 INADA Naoki: > Oh, revised version is really short! > > And I have one worrying point. > With UTF-8 mode, open()'s default encoding/error handler is > UTF-8/surrogateescape. The Strict UTF-8 Mode is for you if you prioritize correctness over usability. In the

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 16:18, Glenn Linderman wrote: > "b" mostly matters on Windows, correct? And Windows doesn't use C or POSIX > locale, correct? And if these are correct, then is this an issue? And if so, > why? In Python 3, "b" matters everywhere, since it controls whether the stream gets wra
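The point that "b" matters on every platform can be checked directly. Binary mode hands back the buffered byte stream, while text mode ("r", which is the same as "rt") wraps that same kind of buffer in io.TextIOWrapper, which is where the encoding and error handler apply. stream_layers is a hypothetical helper.

```python
from typing import Tuple


def stream_layers(path: str) -> Tuple[type, type, type]:
    with open(path, "rb") as f:
        binary_type = type(f)          # bytes only, no decoding layer
    with open(path, "rt", encoding="utf-8") as f:
        text_type = type(f)            # the decoding layer
        underlying = type(f.buffer)    # the byte stream it wraps
    return binary_type, text_type, underlying
```

On CPython this returns (io.BufferedReader, io.TextIOWrapper, io.BufferedReader) regardless of operating system.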

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Glenn Linderman
On 12/5/2017 8:07 PM, INADA Naoki wrote: Oh, revised version is really short! And I have one worrying point. With UTF-8 mode, open()'s default encoding/error handler is UTF-8/surrogateescape. Containers are really growing. PyCharm supports Docker and many new Python developers use Docker inste

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 15:59, Chris Angelico wrote: > On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: >> Something I've just noticed that needs to be clarified: on Linux, "C" >> locale and "POSIX" locale are aliases, but this isn't true in general >> (e.g. it's not the case on *BSD systems, in

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Chris Angelico
On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: > Something I've just noticed that needs to be clarified: on Linux, "C" > locale and "POSIX" locale are aliases, but this isn't true in general > (e.g. it's not the case on *BSD systems, including Mac OS X). For those of us with little to no BSD

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X). To handle that in PEP 538, I made it clear that everything is keyed specifically off the "C" loc

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread INADA Naoki
Oh, revised version is really short! And I have one worrying point. With UTF-8 mode, open()'s default encoding/error handler is UTF-8/surrogateescape. Containers are really growing. PyCharm supports Docker and many new Python developers use Docker instead of installing Python directly on their s

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread INADA Naoki
I'm sorry about my laziness. I've been very busy these past months, but I'm back to the OSS world from today. While I should review it carefully again, I think I'm close to accepting PEP 540. * PEP 540 really helps containers and old Linux machines where PEP 538 doesn't work. And containers are really important for these

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 11:01, Victor Stinner wrote: >> Annex: Differences between the PEP 538 and the PEP 540 >> >> The PEP 538 uses the "C.UTF-8" locale which is quite new and only >> supported by a few Linux distributions; this locale is n

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Victor Stinner
> Annex: Differences between the PEP 538 and the PEP 540 > > The PEP 538 uses the "C.UTF-8" locale which is quite new and only > supported by a few Linux distributions; this locale is not currently > supported by FreeBSD or macOS for example.

[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Victor Stinner
Hi, I knew that I had to rewrite my PEP 540, but I was too lazy. Since Guido explicitly requested a shorter PEP, here you have! https://www.python.org/dev/peps/pep-0540/ Trust me, it's the same PEP, but focused on the most important information and with a shorter rationale ;-) Full text below.
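For readers following along later: the UTF-8 Mode as it shipped in Python 3.7 can be enabled with the -X utf8 command-line option or PYTHONUTF8=1, and it is reported by sys.flags.utf8_mode. A sketch that asks a child interpreter whether the mode is on and which filesystem encoding it uses; probe_utf8_mode is a hypothetical helper, and the interpreter running it is assumed to be 3.7 or newer.

```python
import subprocess
import sys
from typing import Sequence, Tuple

CHILD_CODE = (
    "import codecs, sys; "
    "print(sys.flags.utf8_mode, codecs.lookup(sys.getfilesystemencoding()).name)"
)


def probe_utf8_mode(extra_args: Sequence[str] = ()) -> Tuple[int, str]:
    # Run a fresh interpreter: the mode is decided once, at startup.
    completed = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    flag, fs_encoding = completed.stdout.split()
    return int(flag), fs_encoding


if __name__ == "__main__":
    print("default:", probe_utf8_mode())
    print("-X utf8:", probe_utf8_mode(["-X", "utf8"]))
```

The default result depends on the environment (locale, PYTHONUTF8), which is why the sketch compares it against an explicit -X utf8 run.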