Re: [Python-Dev] yield from?
Benjamin Peterson wrote:
> What's the status of yield from? There's still a small window open for a patch to be checked into 3.1's branch. I haven't been following the python-ideas threads, so I'm not sure if it's ready yet.

The PEP itself seems to have settled down, and is awaiting a verdict from Guido.

The prototype implementation doesn't quite match the PEP in some of the fine details yet. Also, it's for 2.6 rather than 3.x; someone with more knowledge of 3.x internals would be better placed than me to convert it.

-- Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 383 update: utf8b is now the error handler
With issue 3672 resolved, it is now unnecessary to introduce a utf-8b codec, since the utf-8 codec will properly report errors for all byte sequences invalid in UTF-8, including lone surrogates. Therefore, utf-8b can be implemented solely through the error handler.

Glenn Linderman suggested that the name "python-escape" is not very descriptive, so I've changed the name to "utf8b".

I've updated the PEP accordingly.

Regards, Martin
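For readers trying this on a current interpreter: the handler discussed here was released in Python 3.1 under the name "surrogateescape" rather than "utf8b". The following is a minimal sketch of the behaviour described above, using the released name; the sample file name is made up for illustration.

```python
# Sketch only: "surrogateescape" is the name the handler has in released
# Python 3; the thread above still calls it "utf8b".
raw = b"caf\xe9.txt"  # hypothetical file name: Latin-1 bytes, invalid as UTF-8

# Decoding never fails: the undecodable byte 0xE9 becomes the lone
# surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", errors="surrogateescape")

# The plain (strict) utf-8 codec refuses the lone surrogate, which is why
# the escaping can live in the error handler instead of a separate codec.
try:
    name.encode("utf-8")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True
```

The handler only does work where the strict codec would have raised, so names that are already valid UTF-8 pass through unchanged.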
[Python-Dev] PEP 383 and Tahoe [was: GUI libraries]
Zooko O'Whielacronx writes:

> However, it is moot because Tahoe is not a new system. It is currently
> at v1.4.1, has a strong policy of backwards-compatibility, and already
> has lots of data, lots of users, and programmers building on top of
> it.

Cool! Question: is there a way to negotiate versions, or better yet, features?

> I see I'm not explaining the Tahoe requirements clearly. It's probably
> that I'm not understanding them clearly myself.

Well, it's a high-dimensional problem. Keeping track of all the variables is hard. That's why something like PEP 383 can be important to you even though it's only a partial solution; it eliminates one variable.

> Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system
> and then you inspect the files in the Tahoe filesystem, such as by
> examining the web interface [1] or by running "tahoe ls", either of
> which you could do either from the same machine where you ran "tahoe
> cp" or from a different machine (which could be using any operating
> system). We have the following requirements about what ends up in your
> Tahoe directory after that cp -r.

Whoa! Slow down! Where's "my" "Tahoe directory"? Do you mean the directory listing? A copy to whatever system I'm on? The bytes that the Tahoe host has just loaded into a network card buffer to tell me about it? The bytes on disk at the Tahoe host? You'll find it a lot easier to explain things if you adopt a precise, consistent terminology.

> Requirement 1 (unicode): Each filename that you see needs to be valid
> unicode

What does "see" mean? In directory listings? Under what circumstances, if any, can what I see be different from what I get?

> Requirement 2 (faithful if unicode): For each filename (byte string)
> in your myfiles directory,

My local myfiles directory, or my Tahoe myfiles directory?

> if that bytestring is the valid encoding of some string in your
> stated locale,

Who stated the locale? How? Are you referring to what getfilesystemencoding returns?
This is a "(unicode) string", right?

> then the resulting filename in Tahoe is that (unicode)
> string. Nobody ever doesn't want this, right? Well, maybe some
> people don't want this sometimes, [...]. However, what's the
> alternative? Guessing that their locale shouldn't be set to
> latin-1 and instead decoding their bytes some other way?

Sure. Emacsen do that, you know. Of course it's hard to guess something else if ISO-8859/1 is the preferred encoding, but it does happen. This probably cannot be done accurately enough for Tahoe, though.

> It seems like we're not going to do better than
> requirement 2 (faithful if unicode).
>
> Requirement 3 (no file left behind): For each filename (byte string)
> in your myfiles directory, whether or not that byte string is the
> valid encoding of anything in your stated locale, then that file will
> be added into the Tahoe filesystem under *some* name (a good candidate
> would be mojibake, e.g. decode the bytes with latin-1, but that is not
> the only possibility).

That's not even a possibility, actually. Technically, Latin-1 has a "hole" from U+0080 to U+009F. You need to add the C1 controls to fill in that gap. (I don't think it actually matters in practice, everybody seems to implement ISO-8859/1 as though it contained the control characters ... except when detecting encodings ... but it pays to be precise in these things.)

> Now already we can say that these three requirements mean that there
> can be collisions -- for example a directory could have two entries,
> one of which is not a valid encoding in the locale, and whatever
> unicode string we invent to name it with in order to satisfy
> requirements 3 (no file left behind) and 1 (unicode) might happen to
> be the same as the (correctly-encoded) name of the other file.

This is false with rather high probability, but you need some extra structure to deal with it. First, claim the Unicode private planes for Tahoe.
Then allocate characters from the private planes on demand as encountered, *including* such characters encountered in external file names to be stored in Tahoe *and* the surrogates used by PEP 383. "Display names" using these private characters would be valid Unicode, but not very useful. However, an algorithmically generated font (like the 4-hex-digit-square used to give a glyph to unknown code points in the BMP) could be used by those who care.

Also store mappings from (system encoding, UTF-8b representation) to private char and back. For simplicity, that could be global on your server (IIRC, there are at least two private planes up there, so you'd need to run into almost 128Ki *unique* such characters to run out). I guess you'd be subject to a DoS attack where somebody decided to map all of 8-odd CNS characters into private space, and then write 8 files, each with a different 1-character name.

Note that Martin does *not* do this in P
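The private-plane bookkeeping suggested above can be sketched roughly as follows. This is only an illustration of the idea, not anything Tahoe or PEP 383 implements: the class name `PrivateCharTable`, its methods, and the sample keys are all hypothetical. It assumes the two supplementary private use planes (planes 15 and 16) and skips the two noncharacters at the end of each plane.

```python
from itertools import chain

# Usable code points in planes 15 and 16 (the last two code points of each
# plane are noncharacters, so the ranges stop at xFFFD).
PRIVATE_RANGES = (range(0xF0000, 0xFFFFE), range(0x100000, 0x10FFFE))
CAPACITY = sum(len(r) for r in PRIVATE_RANGES)


class PrivateCharTable:
    """Hypothetical registry: (encoding, escaped text) <-> private-use char."""

    def __init__(self):
        self._free = chain(*PRIVATE_RANGES)   # code points not yet handed out
        self._to_private = {}                 # (encoding, escaped) -> char
        self._from_private = {}               # char -> (encoding, escaped)

    def allocate(self, encoding, escaped):
        """Return the private char for this key, allocating one on demand."""
        key = (encoding, escaped)
        if key not in self._to_private:
            try:
                char = chr(next(self._free))
            except StopIteration:
                raise RuntimeError("private planes exhausted") from None
            self._to_private[key] = char
            self._from_private[char] = key
        return self._to_private[key]

    def lookup(self, char):
        """Map a private char back to its (encoding, escaped text) key."""
        return self._from_private[char]


table = PrivateCharTable()
first = table.allocate("utf-8", "\udce9")    # new key: gets U+F0000
again = table.allocate("utf-8", "\udce9")    # same key: same character
other = table.allocate("euc-jp", "\udce9")   # different encoding: new character
```

The capacity works out to 2 × 65,534 code points, just short of 128Ki, and the exhaustion branch is where the denial-of-service concern raised in the message would surface.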
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
2009/5/3 "Martin v. Löwis":
> With issue 3672 resolved, it is now unnecessary to introduce
> an utf-8b codec, since the utf-8 codec will properly report errors
> for all byte sequences invalid in UTF-8, including lone surrogates.
> Therefore, utf-8b can be implemented solely through the error handler.

That's even nicer. One minor detail though, in the sentence:

"non-decodable bytes >128 will be represented as lone half surrogate"

">" should be ">=".

-- Lino Mastrodomenico
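The boundary case can be checked directly. The sketch below assumes the released handler name "surrogateescape" (called "utf8b" in this thread) and uses the ASCII codec so that every byte from 128 upward is undecodable; `escape_code_point` is a hypothetical helper written only for this illustration.

```python
def escape_code_point(byte_value: int) -> int:
    """Hypothetical helper: code point that stands in for an undecodable byte."""
    if not 0x80 <= byte_value <= 0xFF:
        raise ValueError("only bytes >= 128 are ever escaped")
    return 0xDC00 + byte_value


# Compare the arithmetic with the real handler for every byte 128..255.
mismatches = [
    b
    for b in range(0x80, 0x100)
    if bytes([b]).decode("ascii", errors="surrogateescape")
    != chr(escape_code_point(b))
]

# Byte 128 itself is escaped, which is why the sentence needs ">=".
boundary = bytes([0x80]).decode("ascii", errors="surrogateescape")
```

Each escaped byte lands in the range U+DC80 to U+DCFF, so the original byte value can be recovered by subtracting 0xDC00.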
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
Martin v. Löwis writes:
>
> Glenn Linderman suggested that the name "python-escape" is not very
> descriptive, so I've changed the name to "utf8b".

If the error handler is supposed to be used for codecs other than utf-8, perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Also, if utf8-b is not provided as a codec, will there be an easy way for user code to use the same encoding as the IO layer does (e.g. os.fsdecode/os.fsencode)?
Re: [Python-Dev] multi-with statement
(I still don't really have net access back after moving house - just chiming in briefly via my mobile)

Anyway, I think there is one very good reason for NOT defining a multi-with statement in terms of an existing tuple: it gains us nothing except speed over contextlib.nested.

The whole point of the new syntactic support is to execute each expression inside the context of the preceding managers. That requirement precludes the idea of using an intermediate tuple, since every expression would have to be evaluated before the tuple could be created.

I'm still not 100% convinced the saving in indentation levels due to this change would be worth the increase in complexity and ambiguity though.

-- Nick Coghlan, Brisbane, Australia

On 03/05/2009, at 6:12 AM, Georg Brandl wrote:
> Fredrik Johansson wrote:
> > On Sat, May 2, 2009 at 9:01 PM, Georg Brandl wrote:
> > > Hi,
> > >
> > > this is just a short notice that Mattias Brändström and I have
> > > finished a patch to implement the previously discussed and mostly
> > > warmly welcomed extension to with's syntax, allowing
> > >
> > >     with A() as a, B() as b:
> > >
> > > to be written instead of
> > >
> > >     with A() as a:
> > >         with B() as b:
> >
> > I was hoping for the other syntax in order to be able to create a
> > nested context in advance as a simple tuple:
> >
> >     with A, B:
> >         pass
> >
> >     context = A, B
> >     with context:
> >         pass
> >
> > (I.e. a tuple, or perhaps any iterable, would be a valid context manager.)
>
> I see; you want to construct your context manager programmatically and
> pass it to "with" without knowing what is in there. While this would be
> possible, we have to be aware that with this we would effectively change
> the context manager protocol, rather like the iterator protocol's
> __getitem__ alternate realization. This muddies the definition of a
> context manager.
>
> (The interesting thing is that you could already implement *that* version
> without any new syntactic support, by giving tuples an __enter__/__exit__
> method pair.)
>
> > With the syntax in the patch, I will still have to implement a custom
> > nesting context manager to do this, which sort of defeats the purpose.
>
> Not really. Having an unknown number of stacked context managers is not
> the purpose -- for that, I'd still say a custom nesting context manager is
> better, because it is also more explicit when created not at the "with"
> site. (You could even write it as a tuple subclass, if you like the tuple
> interface.)
>
> Georg
>
> --
> Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
> Four shall be the number of spaces thou shalt indent, and the number of thy
> indenting shall be four. Eight shalt thou not indent, nor either indent thou
> two, excepting that thou then proceed to four. Tabs are right out.
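The evaluation-order point in this exchange can be made concrete with a small sketch. It assumes a Python where the multi-manager `with` syntax from the patch is available (3.1 and later) and, for the second half, `contextlib.ExitStack` (added later, in Python 3.3) as one possible stand-in for the "custom nesting context manager" mentioned above. The names `managed`, `make` and `events` are hypothetical and exist only to record what happens when.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def managed(name):
    """Context manager that logs when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")


def make(name):
    """Log the moment the manager *expression* is evaluated."""
    events.append(f"create {name}")
    return managed(name)


# Multi-with: B's expression is evaluated only after A has been entered.
with make("A") as a, make("B") as b:
    events.append("body")
nested_events = list(events)

# Tuple built in advance: both expressions run before anything is entered.
events.clear()
managers = (make("C"), make("D"))
with ExitStack() as stack:
    for manager in managers:
        stack.enter_context(manager)
    events.append("body")
tuple_events = list(events)
```

In the first case "create B" is logged after "enter A"; in the second, both "create" entries come before any "enter", which is the difference the intermediate-tuple objection is about.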
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On Sun, May 3, 2009 at 08:43, Antoine Pitrou wrote:
> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does? (e.g.
> os.fsdecode/os.fsencode)?

I like the idea of fsencode/fsdecode functions, but we need to be careful deciding what they accept and produce on Windows. I'd expect them to be identity functions, but then the difference in platform behavior suggests perhaps they should be in os.path. Unicode to Unicode on Windows would further mean fsencode wouldn't be useful for sending filenames over sockets, and "utf8" will be prone to exceptions on the very names we're trying to support right now.

Is there an advantage to not providing the "utf8b" behavior as a registered codec?

-- Michael Urman
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
> That's even nicer. One minor detail though, in the sentence:
>
> "non-decodable bytes >128 will be represented as lone half surrogate"
>
> ">" should be ">=".

Thanks, fixed.

Martin
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
> If the error handler is supposed to be used for codecs other than utf-8,
> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - it's an algorithm based on 16-bit or 32-bit code points.

> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does?

s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in fact, that's exactly what the IO layer does).

Regards, Martin
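As a rough check of this answer against released Python, the sketch below compares the explicit spelling with the `os.fsencode`/`os.fsdecode` helpers that were later added (Python 3.2). It assumes Python 3.6 or newer for `sys.getfilesystemencodeerrors()`, which returns "surrogateescape" on POSIX systems; the file name is hypothetical.

```python
import os
import sys

name = "caf\udce9.txt"  # hypothetical name carrying one escaped byte (0xE9)

# The explicit spelling, with the encoding and the handler looked up at
# run time (getfilesystemencoding lives in the sys module).
manual = name.encode(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())

# The helper that later landed in the os module.
raw = os.fsencode(name)

# Bytes -> str -> bytes round-trips through the same pair of functions.
bytes_round_trip = os.fsencode(os.fsdecode(raw))
```

Looking the handler up instead of hard-coding it keeps the sketch honest on platforms where the file system handler is not "surrogateescape".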
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On Sun, May 3, 2009 at 10:39 AM, "Martin v. Löwis" wrote:
> > If the error handler is supposed to be used for codecs other than utf-8,
> > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
>
> Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> it's an algorithm based on 16-bit or 32-bit code points.

To me that lack of relationship with utf8 suggests that it should not be called utf8b... But I don't have any good suggestions.

> > Also, if utf8-b is not provided as a codec, will there be an easy way for user
> > code to use the same encoding as the IO layer does?
>
> s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in
> fact, that's exactly what the IO layer does).
>
> Regards,
> Martin
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
> > > If the error handler is supposed to be used for codecs other than utf-8,
> > > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
> >
> > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> > it's an algorithm based on 16-bit or 32-bit code points.
>
> To me that lack of relationship with utf8 suggests that it should not be
> called utf8b

Perhaps. However, giving it that name was Markus Kuhn's choice - and while it may be confusing, it's (IMO) useful to be consistent with this background.

Regards, Martin
Re: [Python-Dev] PEP 383 update: utf8b is now the error handler
On Sun, May 3, 2009 at 1:27 PM, "Martin v. Löwis" wrote:
> > > > If the error handler is supposed to be used for codecs other than utf-8,
> > > > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
> > >
> > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> > > it's an algorithm based on 16-bit or 32-bit code points.
> >
> > To me that lack of relationship with utf8 suggests that it should not be
> > called utf8b
>
> Perhaps. However, giving it that name was Markus Kuhn's choice - and
> while it may be confusing, it's (IMO) useful to be consistent with this
> background.
>
> Regards,
> Martin

Ah, right. My original searches for utf8b didn't turn up much, but searching on his name turns some up. Good choice of name then.

http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
http://bsittler.livejournal.com/10381.html
http://hyperreal.org/~est/utf-8b/

-gps
Re: [Python-Dev] yield from?
2009/5/3 Greg Ewing:
> Benjamin Peterson wrote:
>>
>> What's the status of yield from? There's still a small window open for
>> a patch to be checked into 3.1's branch. I haven't been following the
>> python-ideas threads, so I'm not sure if it's ready yet.
>
> The PEP itself seems to have settled down, and is
> awaiting a verdict from Guido.

Guido is now on vacation until the 18th, so I think this will have to be deferred until 2.7/3.2.

>
> The prototype implementation doesn't quite match
> the PEP in some of the fine details yet. Also
> it's for 2.6 rather than 3.x; someone with more
> knowledge of 3.x internals would be better placed
> than me to convert it.

-- Regards, Benjamin
[Python-Dev] PEP 383 and GUI libraries
(sent only to python-dev, as I am not a subscriber of tahoe-dev)

Zooko wrote:

> [Tahoe] currently uses utf-8 for its internal storage (note: nothing to
> do with reading or writing files from external sources -- only for
> storing filenames in the decentralized storage system which is
> accessed by Tahoe clients), and we can't start putting non-utf-8-valid
> sequences in the "filename" slot because other Tahoe clients would
> then get a UnicodeDecodeError exception when trying to read those
> directories.

So what do you do when someone has an existing file whose name is supposed to be in utf-8, but whose actual bytes are not valid utf-8?

If you have somehow solved that problem, then you're already done -- the PEP's encoding is a no-op on anything that isn't already invalid unicode.

If you have not solved that problem, then those clients will already be getting a UnicodeDecodeError; all the PEP does is make it at least possible for them to recover.

...

> Requirement 1 (unicode): Each filename that you see needs to be valid
> unicode (it is stored internally in utf-8).

(repeating) What does Tahoe do if this is violated? Do you throw an exception right there and not let them copy the file to tahoe? If so, then that same error correction means that utf8b will never differ from utf-8, and you have nothing to worry about.

> Requirement 2 (faithful if unicode):

Doesn't the PEP meet this?

> Requirement 3 (no file left behind):

Doesn't the PEP also meet this? I thought the concern was just that the name used would not be valid unicode, unless the original name was itself valid unicode.

> Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
> "round-tripping"):

Doesn't the PEP also support this? (Only) the invalid bytes get escaped and therefore must be unescaped, but the escapement is reversible.

> 3. (handling collisions) In either case 2.a or 2.b the resulting
> unicode string may already be present in the directory.
This collision is what the use of half-surrogates (as the escape characters) avoids. Such collisions can't be present unless the data was invalid unicode, in which case it was the result of an escapement (unless something other than python is creating new invalid filenames).

-jJ
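The three claims in this message (no-op on valid names, reversibility, no collisions) can be checked with a short sketch. It assumes the released handler name "surrogateescape" (called "utf8b" in this thread) and a made-up directory listing; `has_escape` is a hypothetical helper.

```python
# Hypothetical directory listing, as raw byte names.
names_on_disk = [
    b"caf\xc3\xa9.txt",  # valid UTF-8 for "café.txt"
    b"caf\xe9.txt",      # Latin-1 bytes, invalid as UTF-8
    b"plain.txt",
]

decoded = [n.decode("utf-8", errors="surrogateescape") for n in names_on_disk]

# No-op on valid input: same result as a strict decode.
unchanged = decoded[0] == names_on_disk[0].decode("utf-8")

# Reversible: encoding with the same handler gives back the original bytes.
restored = [d.encode("utf-8", errors="surrogateescape") for d in decoded]

# No collisions: distinct byte names decode to distinct strings.
no_collisions = len(set(decoded)) == len(names_on_disk)


def has_escape(text):
    """Hypothetical helper: does the string contain an escaped byte?"""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


escaped_flags = [has_escape(d) for d in decoded]
```

Only the invalid name carries a lone surrogate, and a lone surrogate can never come out of decoding valid UTF-8, which is the reason the escaped name cannot clash with a correctly encoded one.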