Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
k you'll have to write an alternative PEP if you want to see something like this implemented throughout Python. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
would work no matter whether that expectation agrees with reality or not. The amount of moji-bake that you get is larger when the disagreement is larger, but it will continue to *work*. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
se, can you please describe a specific scenario? What application, what file names, what encodings, what problems? Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
which I agree with, but which clearly the Unix people > here are not as certain of. Ok, I have added another paragraph. Not sure whether it helps to clarify though. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
e an example for a setup where it is not reversible. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
e to something, and then create files in their home directory, it will work just fine, and nobody else will ever see the files (except for the backup software). When they find that the files they created are inaccessible to others, they will often stop

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
ged the PEP to avoid using PUA characters at all. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
s) the problem is more severe, and people are more likely to create files with Cyrillic, or Japanese, names (say) if the system accepts them at all. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
es > (encoding would produce the original bytes). > > I'd prefer option 2. I hadn't thought of this case, but you are right - they *are* illegal bytes, after all. Raising an exception would be useless since the whole point of this codec is to never raise unicode errors. Regards, M

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-25 Thread Martin v. Löwis
renamed because the target name already exists. In all these cases, the application has to ask the user to reconsider; for at least the last case, it should be prepared to do that, anyway (there is also the case where renaming fails because of lack of permissions; in that case, picking a different

Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Martin v. Löwis
> Does anyone know what the problem is? The hardware running it apparently has serious problems. Upfronthosting, the company providing the hardware, is working on a solution. Unfortunately, it is difficult to get support from the datacenter on weekends. Regards, Mar

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Martin v. Löwis
> How about another str-like type, a sequence of char-or-bytes? That would be a different PEP. I personally like my own proposal more, but feel free to propose something different. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
ter within the private use area, or a particular > range, or what? It's a range. The lower-case 'x' denotes a variable half-byte, ranging from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code points. Regards, Martin
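
As a quick arithmetic check of the range quoted above (illustrative only, not part of the original message), two variable hex digits after U+F01 give one code point per possible byte value:

```python
# U+F01xx with two variable hex digits spans U+F0100..U+F01FF.
first, last = 0xF0100, 0xF01FF
code_points = range(first, last + 1)

print(len(code_points))                 # size of the range
print(f"U+{first:05X}..U+{last:05X}")   # its two endpoints
```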

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
u pass? > And for a Unix filesystem mounted on a Windows host? Or accessed via > some network connection? Same issue really: what specific mounting software did you use? Windows cannot mount Unix file systems on its own, or through some network connection. Regards, Martin

Re: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1

2009-04-27 Thread Martin v. Löwis
therwise, if you can contribute a useful bug report (or even a patch), please go ahead. I would try to turn logging on through the registry and see whether that gives any insight. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
filename is not valid in my locale's encoding). The > Nedit editor also worked. So far I haven't found anything that failed. So what SMB server did you mount here, using what software, and what mount options? I think you might be referring to an entirely different use case. Regard

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
Glenn Linderman wrote: > On approximately 4/27/2009 12:42 PM, came the following characters from > the keyboard of Martin v. Löwis: >>>> It's a private use area. It will never carry an official character >>>> assignment. >>> >>> I know that U+F

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
other solution. It was in > that sense, of thinking about possibly existing practice, and leveraging > an existing solution, that caused me to bring up the topic. I think you make quite a lot of assumptions here. It would be better to research the state of the art first, and onl

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
hon automatically encodes strings with the file system encoding before passing them to the POSIX API. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
/. I'm happy to exclude that range from the mapping if POSIX really requires an encoding not to be overlapping with ASCII. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
n, at the moment. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-27 Thread Martin v. Löwis
they don't support command line arguments, or environment variables. If you want to complete them, you should write a PEP. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
he same thing will happen: the name of the file will be the very same byte sequence as the one passed on the command line. Most Unix users here agree that this is the right thing to happen. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
;. > > Since this byte sequence doesn't represent a valid character when > decoded with UTF-8, it should simply be considered an invalid UTF-8 > sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* > '\udcff'). > > Mart
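
The decoding described in the quoted text can be checked on a current Python 3. This is a minimal sketch that assumes the handler under discussion (called python-escape or utf-8b in this thread) corresponds to the one that eventually shipped as `surrogateescape`:

```python
# Assumption: 'surrogateescape' is the released form of the handler
# discussed in this thread.
raw = b"\xed\xb3\xbf"  # would be U+DCFF under a lax reading of UTF-8

# A strict UTF-8 decoder rejects the sequence outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the escaping handler, each offending byte becomes its own lone
# low surrogate (0xDC00 + byte value).
escaped = raw.decode("utf-8", "surrogateescape")
print(strict_ok, [hex(ord(ch)) for ch in escaped])
```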

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
upporting a LRU list, I would remove/hide all entries that don't correlate to existing files - after all, the user may have as well deleted the file in the LRU list. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
> If the PEP depends on this being changed, it should be mentioned in the > PEP. The PEP says that the utf-8b codec decodes invalid bytes into low surrogates. I have now clarified that a strict definition of UTF-8 is assumed for utf-8b. Regards,

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
> Furthermore, I don't believe that PEP 383 works consistently on Windows, What makes you say that? PEP 383 will have no effect on Windows, compared to the status quo, whatsoever. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
most libraries is > undefined for the kinds of unicode strings you construct, and it may be > undefined in a bad way (crash, buffer overflow, whatever). Indeed so. This is intentional. If you can crash Python that way, nothing gets worse by this PEP - you can then *already* crash Python i

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
MRAB wrote: > Martin v. Löwis wrote: >>> Furthermore, I don't believe that PEP 383 works consistently on Windows, >> >> What makes you say that? PEP 383 will have no effect on Windows, >> compared to the status quo, whatsoever. >> > You could argue that

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
sk are also different in memory. No ambiguity. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
, it is. Rest assured that the utf-8b codec will work the way it is specified. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
Glenn Linderman wrote: > On approximately 4/28/2009 1:25 PM, came the following characters from > the keyboard of Martin v. Löwis: >>> The UTF-8b representation suffers from the same potential ambiguities as >>> the PUA characters... >> >> Not at all the sa

Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Martin v. Löwis
I personally don't see a problem here - *of course* os.listdir will report invalid utf-16 encodings, if that's what is stored on disk. It doesn't matter whether the file names are valid wrt. some specification. What matters is that you can access all the files. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
and the string interface to access the same file: this would be a ridiculous interpretation. *Of course* you can access /etc/passwd both as "/etc/passwd" and b"/etc/passwd", there is nothing ambiguous about that. Regards, Martin
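
The point about the two spellings of one path can be illustrated with the standard-library helpers `os.fsencode` and `os.fsdecode`, which postdate this thread. A small sketch, using the `/etc/passwd` name from the message purely as a string (the file is never opened):

```python
import os

text_path = "/etc/passwd"
bytes_path = b"/etc/passwd"

# os.fsencode/os.fsdecode convert between the two spellings using the
# file system encoding; for a pure-ASCII name they agree exactly.
print(os.fsencode(text_path) == bytes_path)
print(os.fsdecode(bytes_path) == text_path)
```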

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
ith completely non-sensical bytes. In practice, it probably won't be that bad - python-escape has likely escaped all non-ASCII bytes, so that on re-encoding with a different encoding, only the ASCII characters get encoded, which likely will work fine. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
T] a b c > > which would produce "ABC" in unicode, which is ambiguous with: > > A B C > > which would also produce "ABC"? No: the "shift" in "shift-jis" is not really about the shift key. See http://en.wikipedia.org/wiki/Shift-JIS Regar

Re: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Martin v. Löwis
> I would like utility functions to perform: > os-bytes->funny-encoded > funny-encoded->os-bytes > or explicit example code snippets for same in the PEP text. Done! Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> OK, so you are saying that under PEP 383, utf-8b wouldn't be used > anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three paragraphs of the "Specification" section ma

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
both get used. >> >> Your formulation is a bit too stenographic to me, but please trust me >> that there is *no* ambiguity in the case you construct. > > > No Martin, the point of reviewing the PEP is to _not_ trust you, even > though you are generally very knowledgeab

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
ready produces errors, that > UTF-8 is a special case, not the only case. I have fixed that by extending the third paragraph. > The code added to the discussion has mismatched (), making me wonder if > it is complete. There is a reasonable possibility that only the final ) > is missing

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
es on being able to code half surrogates as UTF-8? Can you please elaborate? What code specifically are you talking about? Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
are a little clearer in the documentation for > sys.setfilesystemencoding, which does say the encoding isn't used by > Windows -- so why is it permitted to change it, if it has no effect?). As in many cases: because nobody contributed code to make it behave otherwise. It's not that

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
n the LRU list, and lookup the file by inode number - or object UUID on NTFS, possibly using distributed link tracking). Regards, Martin

Re: [Python-Dev] string to float containing whitespace

2009-04-29 Thread Martin v. Löwis
) and string.atoi() allow whitespace. Maybe I'm > thinking of trailing non-numeric, non-whitespace characters. Maybe you remember truly *embedded* whitespace: py> float("1. 3") Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: in
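
The distinction drawn in this reply, surrounding whitespace versus whitespace embedded in the number, can be reproduced with a small hypothetical helper (`parses_as_float` is not from the thread):

```python
def parses_as_float(text):
    """Return True if float() accepts the string."""
    try:
        float(text)
    except ValueError:
        return False
    return True

print(parses_as_float("  1.3\n"))  # whitespace around the number
print(parses_as_float("1. 3"))     # whitespace inside the number
```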

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
ames to bytes. I have clarified the PEP to make that explicit. IOW, it replaces the current "strict" setting in these cases. Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> But I shouldn't have to guess. The PEP should explain how these things > are useful. The discussion section could be extended with use cases for > both the encode and decode cases. See PEP 293. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
d that filenames come in two forms > unicode and bytes if it's not utf-8 data. Why not simply return string if > it's valid utf-8 otherwise return bytes? That would have been an alternative solution, and the one that 2.x uses for listdir. People di

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
is what happens when you try to work > with them. Please, let's see some code we can run, not more words. Just try my example above, on a Linux system, in a UTF-8 locale. Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
t on the other VMs, those would have to either implement it natively, or provide byte-oriented APIs to allow Jython/IronPython to implement it on top of it (the latter being not realistic or useful). Regards, Martin

Re: [Python-Dev] PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
ferent things, they have to tell me what specifically they want it to say (perhaps even with specific formulations). If they can't communicate their requests to me, I can't comply. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Martin v. Löwis
discussions could be reduced if readers would try to constructively comment on the PEP, rather than making counter-proposals, or making statements about the PEP without making their implied assumptions explicit. Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
Jeroen Ruigrok van der Werven wrote: > -On [20090430 07:18], "Martin v. Löwis" (mar...@v.loewis.de) wrote: >> Suppose I create a new directory, and run the following script >> in 3.x: >> >> py> open("x","w").close() >> py> o

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-29 Thread Martin v. Löwis
> incompatible with CPython running on UNIX, and there's no way to fix that. *Not* adapting the PEP will also make CPython and IronPython incompatible, and there's no way to fix that. Regards, Martin

Re: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
code support (via UTF-8) before Windows. If so, PEP 383 won't hurt. If you never get decode errors for file names, you can just ignore PEP 383. It's only for those of us who do get decode errors. Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
l files on disk. Neither Java nor Mono are capable of doing that. Regards, Martin

[Python-Dev] PEP 383 and GUI libraries

2009-04-30 Thread Martin v. Löwis
) behaves the same way. PyQt displays a single square box. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 Thread Martin v. Löwis
> Assuming people agree that this is an accurate summary, it should be > incorporated into the PEP. Done! Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 Thread Martin v. Löwis
> I think it has to be excluded from mapping in order to not introduce > security issues. I think you are right. I have now excluded ASCII bytes from being mapped, effectively not supporting any encodings that are not ASCII compatible. Does that sound ok? Regards,
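
The exclusion of ASCII bytes described here can be observed on a current Python 3. This is a sketch, assuming the `surrogateescape` handler is the released form of the mechanism under discussion:

```python
# Assumption: 'surrogateescape' is the released form of the handler.
# U+DC80..U+DCFF stand for bytes 0x80..0xFF and encode back to them.
high = "\udcff".encode("utf-8", "surrogateescape")

# U+DC2F would correspond to the ASCII byte 0x2F ('/'); it lies outside
# the mapped range, so encoding is refused rather than producing '/'.
try:
    "\udc2f".encode("utf-8", "surrogateescape")
    ascii_smuggled = True
except UnicodeEncodeError:
    ascii_smuggled = False

print(high, ascii_smuggled)
```

Refusing the lower half of the range is what keeps a crafted string from turning into a path separator or other ASCII metacharacter on its way to the operating system.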

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
> OK, so what's wrong with os.listdir() and similar functions returning a > unicode string for strings that correctly encode/decode, and with byte > strings for strings that are not valid unicode? See http://bugs.python.org/issue3187 in particular msg71655 R

Re: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
't be rushed through, and > one should look more carefully first at what the Windows kernel does in > these situations, and what Mono and Java do. These questions really have been studied on this list for the last eight years, over and over again. It's not being rushed. Regards, Marti

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
with the string-oriented ones. In the rationale, the PEP explains why I consider this the worse choice. Regards, Martin

Re: [Python-Dev] PEP 383 and GUI libraries

2009-04-30 Thread Martin v. Löwis
UTF-8. Regards, Martin

[Python-Dev] PEP 382 update

2009-04-30 Thread Martin v. Löwis
PEP 382. I have now changed the PEP to call the files .pth, more in line with how top-level .pth files work, and added a statement that the import feature of .pth files is not provided for package .pth files (use __init__.py instead). Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
30553): WARNING **: FindNextFile: Bad encoding for '/home/martin/work/3k/t/\xff' Consider using MONO_EXTERNAL_ENCODINGS when running the program using System.IO; class X{ public static void Main(string[] args){ DirectoryInfo di = new DirectoryInfo("."); foreach(

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
it to be. Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
re analogous to the one I got when using System.IO.DirectoryInfo ever exist in Python? Regards, Martin

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
might have to bind other arguments with utf-8 conversion. I couldn't find a Python wrapper for libtiff. If a wrapper was written, it would indeed have to use the file system encoding for the file name parameters. However, it would have to do that even without PEP 383, since the file name should

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
> Martin, if you're going to stick with the half-surrogate trick, would > you mind adding a section to the PEP on "alternate encoding strategies", > explaining why the NULL method was not selected? In the PEP process, it isn't my job to criticize competing prop

Re: [Python-Dev] a suggestion ... Re: PEP 383 (again)

2009-04-30 Thread Martin v. Löwis
), listdir() et al. will continue to accept bytes > for filenames? In Python 3, the file chooser should definitely return strings, and it would be good if they were PEP 383 compliant. >> So I prefer the half surrogate because its failure mode is better th > > Heh heh heh. And i

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 Thread Martin v. Löwis
that strings decoded > from the filesystem are reversible, but not check what might be de novo > strings? Exactly so. Regards, Martin

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 Thread Martin v. Löwis
> myself? No, you should encode using the "strict" error handler, with the locale encoding. If the file name encodes successfully, it's correct, otherwise, it's broken. Regards, Martin
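
The check suggested in this reply might look like the following sketch. The helper name `is_cleanly_encoded` is hypothetical, and `sys.getfilesystemencoding()` is used as a stand-in for the locale encoding mentioned in the message:

```python
import sys

def is_cleanly_encoded(name):
    """True if a str file name encodes under the strict handler.

    Hypothetical helper; the file system encoding stands in for the
    locale encoding named in the message.
    """
    try:
        name.encode(sys.getfilesystemencoding(), "strict")
    except UnicodeEncodeError:
        return False
    return True

print(is_cleanly_encoded("report.txt"))
print(is_cleanly_encoded("caf\udce9.txt"))  # carries an escaped byte
```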

Re: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-04-30 Thread Martin v. Löwis
> I've taken the liberty of explicitly CCing Martin just in case he missed > the thread with all the noise regarding PEP383. > > If there are no objections from Martin It's fine with me - I just won't have time to look into the details of th

[Python-Dev] Deferring PEP 382

2009-05-01 Thread Martin v. Löwis
. Regards, Martin

Re: [Python-Dev] PEP 383 and GUI libraries

2009-05-01 Thread Martin v. Löwis
ng, 'python-escape').decode('utf-8', > 'python-escape') will always produce srcbytes ? I think you mixed up bytes and unicode here: if srcbytes is indeed a bytes object, then you can't apply .encode to it. Regards, Martin
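
With the operations in the order that works for a bytes object (decode first, then encode), the round trip asked about can be sketched as follows. It assumes `surrogateescape`, the name the handler was released under, in place of the `python-escape` spelling used in the question:

```python
# Assumption: 'surrogateescape' stands in for the handler called
# 'python-escape' in the quoted question.
srcbytes = b"valid \xc3\xa9, invalid \xff\xfe, ascii ok"

as_text = srcbytes.decode("utf-8", "surrogateescape")        # bytes -> str
round_tripped = as_text.encode("utf-8", "surrogateescape")   # str -> bytes

print(round_tripped == srcbytes)
```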

Re: [Python-Dev] PEP 382: little help for stupid people?

2009-05-01 Thread Martin v. Löwis
declarative - no need to write code. Regards, Martin

Re: [Python-Dev] PEP 382: little help for stupid people?

2009-05-01 Thread Martin v. Löwis
nit__.py 2. addon1.tar, containing simplistix/addon1.pth (containing a single "*") simplistix/feature1.py 3. addon2.tar, containing simplistix/addon2.pth simplistix/feature2.py Unpack each of them anywhere on sys.path, in any order. Regards, Martin

Re: [Python-Dev] PEP 382: Namespace Packages

2009-05-01 Thread Martin v. Löwis
one know how to do the latter, not because there is > no desire to do so! > > I, for one, have been trying to figure out how to do "base namespace" > packages for years... You mean, without PEP 382? That won't be possible, unless you can coordinate all addon packages. Ba

Re: [Python-Dev] PEP 382: Namespace Packages

2009-05-01 Thread Martin v. Löwis
install into the same directory, you can have base packages already. Regards, Martin

[Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
the name "python-escape" is not very descriptive, so I've changed the name to "utf8b". I've updated the PEP accordingly. Regards, Martin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
> That's even nicer. One minor detail though, in the sentence: > > "non-decodable bytes >128 will be represented as lone half surrogate" > > ">" should be ">=". Thanks, fixed. Martin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
t code points. > Also, if utf8-b is not provided as a codec, will there be an easy way for user > code to use the same encoding as the IO layer does? s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in fact, that's ex
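
The expression in this reply can be tried on a current Python 3 by substituting the name the handler was released under. A minimal sketch; the sample file name is made up, and the `surrogateescape` correspondence is an assumption about this thread's "utf8b":

```python
import sys

# A str carrying one escaped byte (0xFF), as listdir() might return it.
name = "data\udcff.bin"

# Same shape as the expression in the message, with the released
# handler name in place of "utf8b".
encoded = name.encode(sys.getfilesystemencoding(), "surrogateescape")
print(encoded)
```

On POSIX systems `os.fsencode(name)` is the standard-library shortcut for this call.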

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
> it's an algorithm based on 16-bit or 32-bit code points. > > > To me that lack of relationship with utf8 suggests that it should not be > called utf8b Perhaps. However, giving it that name was Markus Kuhn's choice - and while it may be confusing, it's

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
code points, while the > output is 16- or 32-bit code points. Right - the algorithm maps between bytes and 16/32-bit code units. It works, in particular, for UTF-8, and was originally proposed to apply to UTF-8 - but it can work in any other place that c

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
In > some error handlers, such as the 'utf8b' proposed here, it is also > simpler and more efficient for the error handler to provide a > pre-encoded replacement byte string, rather than forcing it to > calculating Unicode from which the encoder would cr

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-05 Thread Martin v. Löwis
the vacuum". In any case, the discussion says # Encodings that are not compatible with ASCII are not supported by # this specification; bytes in the ASCII range that fail to decode # will cause an exception. It is widely agreed that such encodings # should not be used as locale charsets.

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
learly separate. They *are* separate namespaces; that's guaranteed by the implementation. Regards, Martin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Stephen J. Turnbull wrote: > "Martin v. Löwis" writes: > > > It occurs to me that the PEP maybe should say that it is an error > > > to have your POSIX locale set to UTF-16 or something like that. > > > > No. It is *impossible* to have UTF-

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
s allow them to do what they want). Libraries can never enforce that applications conform to some standard. > Sorry! I suggest substituting the paragraph above for the paragraph > which begins "The encode error handler interface presently requires..." > at

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
hink of ways to enhance the experience. In any case, Python 3.1b1 may get released today, so it's way too late for new features in the PEP. They can wait for Python 3.2. Regards, Martin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
>>> The name "utf8b" suggested in the PEP is not in line with the codec >>> design >> Where is that design documented, and how exactly violates the name >> the design (chapter and verse, please). > > Martin, I designed the whole Python codec machine

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
't handle this Not true. PEP 383 handles this very example just fine, with no problems that I can see. Can you propose a specific example that you think might cause problems? By "specific", I mean: what file names (exact bytes, please), what locale charset, what API calls. Regards,

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
> Judging by the existing names, I think that 'surrogate' would be > reasonable MAL's list of existing names is incomplete. "surrogates" is already an existing name, also, and it means something different (similar, b

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Terry Reedy wrote: > Glenn Linderman wrote: >> On approximately 5/6/2009 3:08 AM, came the following characters from >> the keyboard of MRAB: >>> M.-A. Lemburg wrote: >>>> Martin v. Löwis wrote: >> >>> Judging by the existing names, I think t

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
> Is it only usable with utf8 as an encoding? No, it applies to any codec which potentially cannot decode all bytes >127. Regards, Martin
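
That the handler is independent of the codec can be seen by applying it with two different codecs to the same bytes. A short sketch, again assuming `surrogateescape` as the released name of the handler:

```python
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 is valid in neither ASCII nor UTF-8

for codec in ("ascii", "utf-8"):
    text = raw.decode(codec, "surrogateescape")
    # The undecodable byte is escaped the same way under either codec,
    # and encoding with the same codec and handler restores it.
    print(codec, ascii(text), text.encode(codec, "surrogateescape") == raw)
```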

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
but under the PEP as it is > a large fraction of Shift JIS and Big5 filenames cannot be read under > ASCII-compatible file system encodings using 'utf8b'. Yet it is those > users who are placed at risk by PEP 383. I think this statement is i

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Antoine Pitrou wrote: > Martin v. Löwis v.loewis.de> writes: >> Despite there being also an error handler called "surrogates". > > People, perhaps we could end all the bikeshedding and call one of those > handlers > "surrogates-pass" and the o

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
e: I call it utf8b because that's the established name for the algorithm it implements. That algorithm was originally designed with UTF-8 in mind (and only meant to be applied for UTF-8), however, it remains the same algorithm even though PEP 383 widens its applic

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
st recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed py> "\ud800".encode("utf-8","surrogates") b'\xed\xa0\x80' py> b
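
The session above can be replayed on a current Python 3, on the assumption that the handler spelled "surrogates" here is the one released as `surrogatepass`:

```python
# Assumption: the handler named "surrogates" in the session above is
# the one released as "surrogatepass".
lone = "\ud800"

# The strict UTF-8 encoder refuses a lone surrogate.
try:
    lone.encode("utf-8")
    strict_allowed = True
except UnicodeEncodeError:
    strict_allowed = False

# The pass-through handler emits the three-byte form instead, and the
# same handler accepts it again when decoding.
passed = lone.encode("utf-8", "surrogatepass")
print(strict_allowed, passed, passed.decode("utf-8", "surrogatepass") == lone)
```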

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Michael Urman wrote: > On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" wrote: >> Despite there being also an error handler called "surrogates". > > Not that I have to be, but I'm not sold on the previous UTF-8 codec > behavior becoming an error handler

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
e("shift-jis","utf8b") '\udc81/' so the utf8b error handler will escape the first of the two bytes, and then pass the second byte to the codec again, which then decodes as ASCII. Regards, Martin
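
The behaviour described here, escaping only the first byte and handing the second back to the codec, can be sketched on a current Python 3. The input bytes are an assumption chosen to match the output shown in the message, and `surrogateescape` is assumed to be the released name of the "utf8b" handler:

```python
# 0x81 opens a two-byte Shift JIS sequence, but '/' (0x2F) cannot close
# one, so only the first byte is undecodable.
raw = b"\x81/"

text = raw.decode("shift_jis", "surrogateescape")
print(ascii(text))
```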
