Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> Perhaps not a full description of the status quo, but the PEP definitely > needs a good summary I completely agree, and believe that the PEP *does* have a good summary - it has both an abstract, and a rationale, and both say exactly what I want them to say. If people want them to say different t

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> But I shouldn't have to guess. The PEP should explain how these things > are useful. The discussion section could be extended with use cases for > both the encode and decode cases. See PEP 293. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Aahz
On Wed, Apr 29, 2009, "Martin v. Löwis" wrote: > > I'm at a loss how to make the text more clear than it already is. I'm > really not good at writing long essays, with a lot of > explanatory-but-non-normative text. I also think that explanations do > not belong in the section titled specification,

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 1:06 PM, came the following characters from the keyboard of Martin v. Löwis: > Thanks, fixed. Thanks for your fixes. They are helpful. I'm at a loss how to make the text more clear than it already is. I'm really not good at writing long essays, with a lot of expl

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> In the first paragraph, you should make it clear that Python 3.0 does > not use the Windows bytes interfaces, if it doesn't. "Python uses > *only* the wide character APIs..." would suffice. That's not quite exact. It uses both ANSI and Wide APIs - depending on whether you pass bytes as input or

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Antoine Pitrou
Thomas Breuel writes: > > The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party libraries to throw a C++ exception, raise a Python exception, or delete the half surrogate. Do you have any concrete examples of this behaviour?

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Glenn Linderman
On approximately 4/29/2009 12:17 AM, came the following characters from the keyboard of Martin v. Löwis: OK, so you are saying that under PEP 383, utf-8b wouldn't be used anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three p

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> OK, so you are saying that under PEP 383, utf-8b wouldn't be used > anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three paragraphs of the "Specification" section make that clear. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Wed, Apr 29, 2009 at 07:45, "Martin v. Löwis" wrote: > Your claim was > that PEP 383 may have unfortunate effects on Windows, No, I simply think that PEP 383 is not sufficiently specified to be able to tell. > and I'm telling > you that it won't, because the behavior of Python on Windows w

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
> The wide APIs use UTF-16. UTF-16 suffers from the same problem as > UTF-8: not all sequences of words are valid UTF-16 sequences. In > particular, sequences containing isolated surrogate pairs are not > well-formed according to the Unicode standard. Therefore, the existence > of a wide charact
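The point about ill-formed UTF-16 can be checked from Python itself. This is only a minimal sketch, assuming CPython 3.4 or later, where the strict UTF-16 codecs refuse lone surrogates and the `surrogatepass` error handler lets them through. The helper name `is_well_formed_utf16` is made up for illustration and is not part of the PEP or of any Windows API.

```python
def is_well_formed_utf16(text: str) -> bool:
    """Return True if the string can be encoded as strict UTF-16."""
    try:
        text.encode("utf-16-le")
    except UnicodeEncodeError:
        # A lone surrogate code point has no valid UTF-16 encoding.
        return False
    return True


lone_surrogate = "\ud800"      # half of a surrogate pair, on its own
astral_char = "\U0001F600"     # a real non-BMP character

print(is_well_formed_utf16(lone_surrogate))
print(is_well_formed_utf16(astral_char))

# With surrogatepass the lone code unit is written out unchanged,
# which is roughly what a wide API sees when handed such a name.
print(lone_surrogate.encode("utf-16-le", "surrogatepass"))
```

The sketch says nothing about how any particular file system treats such names; it only shows that a Python string can hold a code unit sequence that strict UTF-16 rejects.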

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > It cannot crash Python; it can only crash > hypothetical third-party programs or libraries with deficient error > checking and > unreasonable assumptions about input data. The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party librar

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Antoine Pitrou
Thomas Breuel writes: > > And, in fact, Windows Vista happily creates files with malformed UTF-16 encodings, and os.listdir() happily returns them. The PEP won't change that, so what's the problem exactly? > Under your proposal, passing the output from a correctly implemented file s

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > On Windows, the Wide APIs are already used throughout the code base, > e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the > specific API for a specific functionality, please read the source code. > [...] > No, I don't assume that. I assume that all functions are strictly > ava

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
MRAB wrote: > Martin v. Löwis wrote: >>> Furthermore, I don't believe that PEP 383 works consistently on Windows, >> >> What makes you say that? PEP 383 will have no effect on Windows, >> compared to the status quo, whatsoever. >> > You could argue that if Windows is actually returning UTF-16 with

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
> Your proposal says that utf-8b would be used for file systems, but then > you also say that it might be used for command line arguments and > environment variables. So, which specific APIs will it be used with on > Windows and on POSIX systems? On Windows, the Wide APIs are already used through

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Tue, Apr 28, 2009 at 20:45, "Martin v. Löwis" wrote: > > Furthermore, I don't believe that PEP 383 works consistently on Windows, > > What makes you say that? PEP 383 will have no effect on Windows, > compared to the status quo, whatsoever. > That's what you believe, but it's not clear to me

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread MRAB
Martin v. Löwis wrote: Furthermore, I don't believe that PEP 383 works consistently on Windows, What makes you say that? PEP 383 will have no effect on Windows, compared to the status quo, whatsoever. You could argue that if Windows is actually returning UTF-16 with half surrogates that they

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
> Furthermore, I don't believe that PEP 383 works consistently on Windows, What makes you say that? PEP 383 will have no effect on Windows, compared to the status quo, whatsoever. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > However, it is "mission creep": Martin didn't volunteer to > write a PEP for it, he volunteered to write a PEP to solve the > "roundtrip the value of os.listdir()" problem. And he succeeded, up > to some minor details. Yes, it solves that problem. But that doesn't come without cost. Most i

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
> If we follow your approach, that ISO8859-15 string will get turned into > an escaped unicode string inside Python. If I understand your proposal > correctly, if it's a output file name and gets passed to Python's open > function, Python will then decode that string and end up with an > ISO8859-1

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Stephen J. Turnbull
Thomas Breuel writes: > PEP 383 doesn't make it any easier; it just turns one set of > problems into another. That's false. There is an interesting class of problems of the form "get a list of names from the OS and allow the user to select from it, and retrieve corresponding content." People

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Duncan Booth
Hrvoje Niksic wrote: > Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 > sequence, will be converted to the half-surrogate '\udcff'. However, > a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be > converted to '\udcff'. Those are quite different POSIX p
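For anyone who wants to try the two file names from this message: a minimal sketch, assuming a current CPython 3 in which the PEP's handler is available as the `surrogateescape` error handler and the UTF-8 codec rejects encoded surrogates. The helper names `decode_name` and `encode_name` are invented for this example. Under those assumptions the two names do not collide, because each undecodable byte is escaped on its own.

```python
def decode_name(raw: bytes) -> str:
    """Decode a file name the way the PEP describes, assuming UTF-8."""
    return raw.decode("utf-8", "surrogateescape")


def encode_name(name: str) -> bytes:
    """Turn an escaped string back into the original bytes."""
    return name.encode("utf-8", "surrogateescape")


invalid_byte = b"\xff"               # not valid UTF-8 at all
encoded_surrogate = b"\xed\xb3\xbf"  # UTF-8-style encoding of U+DCFF

for raw in (invalid_byte, encoded_surrogate):
    name = decode_name(raw)
    # Show the escaped form and whether the bytes survive the round trip.
    print(ascii(name), encode_name(name) == raw)
```

Whether the draft under discussion in April 2009 behaved the same way is not something this sketch can show; it only reflects the codec behaviour of a present-day interpreter.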

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread R. David Murray
On Tue, 28 Apr 2009 at 09:30, Thomas Breuel wrote: Therefore, when Python encounters path names on a file system that are not consistent with the (assumed) encoding for that file system, Python should raise an error. This is what happens currently, and users are quite unhappy about it. We nee

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Hrvoje Niksic
Lino Mastrodomenico wrote: Let's suppose that I use Python 2.x or something else to create a file with name b'\xff'. My (Linux) system has a sane configuration and the filesystem encoding is UTF-8, so it's an invalid name but the kernel will blindly accept it anyway. With this PEP, Python 3.1 li

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Lino Mastrodomenico
2009/4/28 Thomas Breuel: > If we follow PEP 383, you will get lots of errors anyway because those > strings, when encoded in utf-8b, will result in an error when you try to > write them on a Windows file system or any other system that doesn't allow > the byte sequences that the utf-8b encodes. I

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Oleg Broytmann
On Tue, Apr 28, 2009 at 11:32:26AM +0200, Thomas Breuel wrote: > On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote: > > I have an FTP server to which clients with different local encodings > > are connecting. FTP protocol doesn't have a notion of encoding so filenames > > on the filesystem are

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote: > On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote: > > Returning an error for an incorrect encoding doesn't make > > internationalization harder, it makes it easier because it makes > debugging > > easier. > >What is a "correc

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Oleg Broytmann
On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote: > Returning an error for an incorrect encoding doesn't make > internationalization harder, it makes it easier because it makes debugging > easier. What is a "correct encoding"? I have an FTP server to which clients with differen

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > >Until it's hard there will be no internationalization. A fact of life, > damn it. Programmers are lazy, and have many problems to solve. PEP 383 doesn't make it any easier; it just turns one set of problems into another. Actually, it makes it worse, since any problems that show up now s

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Oleg Broytmann
On Tue, Apr 28, 2009 at 09:30:01AM +0200, Thomas Breuel wrote: > Programmers may find it inconvenient that they have to spend time figuring > out and deal with platform-dependent file system encoding issues and > errors. But internationalization and unicode are hard, that's just a fact > of life.

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Thomas Breuel
> > Therefore, when Python encounters path names on a file system > > that are not consistent with the (assumed) encoding for that file > > system, Python should raise an error. > > This is what happens currently, and users are quite unhappy about it. We need to keep "users" and "programmers" dis

Re: [Python-Dev] PEP 383 (again)

2009-04-27 Thread Martin v. Löwis
> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode > strings in a reversible way. That isn't really true; it is not, inherently, about UTF-8. Instead, it tries to represent non-filesystem-encoding byte sequence in Unicode strings in a reversible way. > Quietly escaping a bad UTF-
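The remark that the scheme is not tied to UTF-8 can be illustrated with a short sketch. It assumes the `surrogateescape` error handler of Python 3.1 and later; the function `roundtrip` and the sample bytes are made up for the example. The same undecodable byte is escaped the same way whichever codec stands in for the file system encoding.

```python
def roundtrip(raw: bytes, encoding: str) -> tuple:
    """Decode with surrogateescape, then encode back with the same codec."""
    text = raw.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


raw_name = b"caf\xe9"  # Latin-1 bytes for "café"; invalid as ASCII and as UTF-8

for encoding in ("ascii", "utf-8"):
    text, back = roundtrip(raw_name, encoding)
    # The byte 0xE9 becomes the lone surrogate U+DCE9 and is restored on encode.
    print(encoding, ascii(text), back == raw_name)
```

The choice of `ascii` and `utf-8` here is arbitrary; on a real system the codec would be whatever `sys.getfilesystemencoding()` reports.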