> Perhaps not a full description of the status quo, but the PEP definitely
> needs a good summary
I completely agree, and believe that the PEP *does* have a good
summary - it has both an abstract, and a rationale, and both say
exactly what I want them to say. If people want them to say different
t
> But I shouldn't have to guess. The PEP should explain how these things
> are useful. The discussion section could be extended with use cases for
> both the encode and decode cases.
See PEP 293.
Regards,
Martin
___
Python-Dev mailing list
Python-Dev
On Wed, Apr 29, 2009, "Martin v. Löwis" wrote:
>
> I'm at a loss how to make the text more clear than it already is. I'm
> really not good at writing long essays, with a lot of
> explanatory-but-non-normative text. I also think that explanations do
> not belong in the section titled specification,
On approximately 4/29/2009 1:06 PM, came the following characters from
the keyboard of Martin v. Löwis:
> Thanks, fixed.
Thanks for your fixes. They are helpful.
I'm at a loss how to make the text more clear than it already is. I'm
really not good at writing long essays, with a lot of
expl
> In the first paragraph, you should make it clear that Python 3.0 does
> not use the Windows bytes interfaces, if it doesn't. "Python uses
> *only* the wide character APIs..." would suffice.
That's not quite exact. It uses both ANSI and Wide APIs - depending
on whether you pass bytes as input or
Thomas Breuel writes:
>
> The error checking isn't necessarily deficient. For example, a safe and
> legitimate thing to do is for third party libraries to throw a C++ exception,
> raise a Python exception, or delete the half surrogate.
Do you have any concrete examples of this behaviour?
On approximately 4/29/2009 12:17 AM, came the following characters from
the keyboard of Martin v. Löwis:
OK, so you are saying that under PEP 383, utf-8b wouldn't be used
anywhere on Windows by default. That's not clear from your proposal.
You didn't read it carefully enough. The first three p
> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
> anywhere on Windows by default. That's not clear from your proposal.
You didn't read it carefully enough. The first three paragraphs of
the "Specification" section make that clear.
Regards,
Martin
On Wed, Apr 29, 2009 at 07:45, "Martin v. Löwis" wrote:
> Your claim was
> that PEP 383 may have unfortunate effects on Windows,
No, I simply think that PEP 383 is not sufficiently specified to be able to
tell.
> and I'm telling
> you that it won't, because the behavior of Python on Windows w
> The wide APIs use UTF-16. UTF-16 suffers from the same problem as
> UTF-8: not all sequences of words are valid UTF-16 sequences. In
> particular, sequences containing isolated surrogate pairs are not
> well-formed according to the Unicode standard. Therefore, the existence
> of a wide charact
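The well-formedness point quoted above can be seen from Python itself. This is a minimal sketch, assuming a released Python 3 where the strict UTF-16 codecs reject unpaired surrogates and the `surrogatepass` error handler lets them through; the variable names are made up for illustration:

```python
# Sketch only: shows that a lone surrogate is not well-formed UTF-16.
lone = '\udc00A'  # an unpaired low surrogate followed by 'A'

# The strict codec refuses to encode the unpaired surrogate.
try:
    lone.encode('utf-16-le')
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# 'surrogatepass' writes the 16-bit code unit as-is.
raw = lone.encode('utf-16-le', 'surrogatepass')

# The strict codec also refuses to decode those code units.
try:
    raw.decode('utf-16-le')
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

restored = raw.decode('utf-16-le', 'surrogatepass')
print(strict_encode_ok, strict_decode_ok, restored == lone)
```

So the same sequence of 16-bit words is either rejected or passed through, depending only on how strictly the codec checks it.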
>
> It cannot crash Python; it can only crash
> hypothetical third-party programs or libraries with deficient error
> checking and
> unreasonable assumptions about input data.
The error checking isn't necessarily deficient. For example, a safe and
legitimate thing to do is for third party librar
Thomas Breuel gmail.com> writes:
>
> And, in fact, Windows Vista happily creates files with malformed UTF-16
> encodings, and os.listdir() happily returns them.
The PEP won't change that, so what's the problem exactly?
> Under your proposal, passing the output from a correctly implemented file
s
>
> On Windows, the Wide APIs are already used throughout the code base,
> e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
> specific API for a specific functionality, please read the source code.
> [...]
>
No, I don't assume that. I assume that all functions are strictly
> ava
MRAB wrote:
> Martin v. Löwis wrote:
>>> Furthermore, I don't believe that PEP 383 works consistently on Windows,
>>
>> What makes you say that? PEP 383 will have no effect on Windows,
>> compared to the status quo, whatsoever.
>>
> You could argue that if Windows is actually returning UTF-16 with
> Your proposal says that utf-8b would be used for file systems, but then
> you also say that it might be used for command line arguments and
> environment variables. So, which specific APIs will it be used with on
> Windows and on POSIX systems?
On Windows, the Wide APIs are already used through
On Tue, Apr 28, 2009 at 20:45, "Martin v. Löwis" wrote:
> > Furthermore, I don't believe that PEP 383 works consistently on Windows,
>
> What makes you say that? PEP 383 will have no effect on Windows,
> compared to the status quo, whatsoever.
>
That's what you believe, but it's not clear to me
Martin v. Löwis wrote:
Furthermore, I don't believe that PEP 383 works consistently on Windows,
What makes you say that? PEP 383 will have no effect on Windows,
compared to the status quo, whatsoever.
You could argue that if Windows is actually returning UTF-16 with half
surrogates that they
> Furthermore, I don't believe that PEP 383 works consistently on Windows,
What makes you say that? PEP 383 will have no effect on Windows,
compared to the status quo, whatsoever.
Regards,
Martin
>
> However, it is "mission creep": Martin didn't volunteer to
> write a PEP for it, he volunteered to write a PEP to solve the
> "roundtrip the value of os.listdir()" problem. And he succeeded, up
> to some minor details.
Yes, it solves that problem. But that doesn't come without cost.
Most i
> If we follow your approach, that ISO8859-15 string will get turned into
> an escaped unicode string inside Python. If I understand your proposal
> correctly, if it's an output file name and gets passed to Python's open
> function, Python will then decode that string and end up with an
> ISO8859-1
Thomas Breuel writes:
> PEP 383 doesn't make it any easier; it just turns one set of
> problems into another.
That's false. There is an interesting class of problems of the form
"get a list of names from the OS and allow the user to select from it,
and retrieve corresponding content." People
Hrvoje Niksic wrote:
> Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8
> sequence, will be converted to the half-surrogate '\udcff'. However,
> a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be
> converted to '\udcff'. Those are quite different POSIX p
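For reference, the collision described here can be checked against the `surrogateescape` error handler that released Python 3 versions provide. This is a minimal sketch, assuming a current CPython 3 and using the UTF-8 codec directly rather than the locale's filesystem encoding; it may not match the draft under discussion, and the variable names are illustrative only:

```python
# Sketch only: uses the 'surrogateescape' handler from released Python 3.
invalid = b'\xff'            # not valid UTF-8 at all
surrogate = b'\xed\xb3\xbf'  # UTF-8-style encoding of the lone surrogate U+DCFF

decoded_invalid = invalid.decode('utf-8', 'surrogateescape')
decoded_surrogate = surrogate.decode('utf-8', 'surrogateescape')

# Each undecodable byte becomes its own code point in U+DC80..U+DCFF.
print(ascii(decoded_invalid))
print(ascii(decoded_surrogate))

# Both names encode back to exactly the bytes they came from.
assert decoded_invalid.encode('utf-8', 'surrogateescape') == invalid
assert decoded_surrogate.encode('utf-8', 'surrogateescape') == surrogate
```

In released Python 3 the strict UTF-8 decoder rejects encoded surrogates, so the second name is escaped byte by byte and the two names stay distinct.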
On Tue, 28 Apr 2009 at 09:30, Thomas Breuel wrote:
Therefore, when Python encounters path names on a file system
that are not consistent with the (assumed) encoding for that file
system, Python should raise an error.
This is what happens currently, and users are quite unhappy about it.
We nee
Lino Mastrodomenico wrote:
Let's suppose that I use Python 2.x or something else to create a file
with name b'\xff'. My (Linux) system has a sane configuration and the
filesystem encoding is UTF-8, so it's an invalid name but the kernel
will blindly accept it anyway.
With this PEP, Python 3.1 li
2009/4/28 Thomas Breuel :
> If we follow PEP 383, you will get lots of errors anyway because those
> strings, when encoded in utf-8b, will result in an error when you try to
> write them on a Windows file system or any other system that doesn't allow
> the byte sequences that the utf-8b encodes.
I
On Tue, Apr 28, 2009 at 11:32:26AM +0200, Thomas Breuel wrote:
> On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote:
> > I have an FTP server to which clients with different local encodings
> > are connecting. FTP protocol doesn't have a notion of encoding so filenames
> > on the filesystem are
On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote:
> On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote:
> > Returning an error for an incorrect encoding doesn't make
> > internationalization harder, it makes it easier because it makes
> > debugging easier.
>
>What is a "correc
On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote:
> Returning an error for an incorrect encoding doesn't make
> internationalization harder, it makes it easier because it makes debugging
> easier.
What is a "correct encoding"?
I have an FTP server to which clients with differen
>
>
> Until it's hard there will be no internationalization. A fact of life,
> damn it. Programmers are lazy, and have many problems to solve.
PEP 383 doesn't make it any easier; it just turns one set of problems into
another. Actually, it makes it worse, since any problems that show up now
s
On Tue, Apr 28, 2009 at 09:30:01AM +0200, Thomas Breuel wrote:
> Programmers may find it inconvenient that they have to spend time figuring
> out and dealing with platform-dependent file system encoding issues and
> errors. But internationalization and unicode are hard, that's just a fact
> of life.
> > Therefore, when Python encounters path names on a file system
> > that are not consistent with the (assumed) encoding for that file
> > system, Python should raise an error.
>
> This is what happens currently, and users are quite unhappy about it.
We need to keep "users" and "programmers" dis
> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode
> strings in a reversible way.
That isn't really true; it is not, inherently, about UTF-8.
Instead, it tries to represent non-filesystem-encoding byte sequences
in Unicode strings in a reversible way.
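
To illustrate that the scheme follows whatever encoding is in use rather than UTF-8 specifically, here is a minimal sketch. It assumes the `surrogateescape` error handler as it exists in released Python 3; the helper name `roundtrip` and the sample bytes are made up for illustration:

```python
def roundtrip(raw: bytes, encoding: str) -> str:
    """Decode raw with surrogateescape and check it encodes back unchanged."""
    text = raw.decode(encoding, 'surrogateescape')
    assert text.encode(encoding, 'surrogateescape') == raw
    return text

# b'\xe9' is 'é' in Latin-1 but cannot be decoded as ASCII or as UTF-8 here.
name = b'caf\xe9'
as_ascii = roundtrip(name, 'ascii')
as_utf8 = roundtrip(name, 'utf-8')
print(ascii(as_ascii), ascii(as_utf8))
```

The undecodable byte is carried through the string as a lone surrogate and restored on encoding, whichever codec is assumed.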
> Quietly escaping a bad UTF-