Re: [Python-Dev] yield from?

2009-05-03 Thread Greg Ewing

Benjamin Peterson wrote:

What's the status of yield from? There's still a small window open for
a patch to be checked into 3.1's branch. I haven't been following the
python-ideas threads, so I'm not sure if it's ready yet.


The PEP itself seems to have settled down, and is
awaiting a verdict from Guido.

The prototype implementation doesn't quite match
the PEP in some of the fine details yet. Also
it's for 2.6 rather than 3.x; someone with more
knowledge of 3.x internals would be better placed
than me to convert it.

--
Greg


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
With issue 3672 resolved, it is now unnecessary to introduce
a utf-8b codec, since the utf-8 codec will properly report errors
for all byte sequences invalid in UTF-8, including lone surrogates.
Therefore, utf-8b can be implemented solely through the error handler.

Glenn Linderman suggested that the name "python-escape" is not very
descriptive, so I've changed the name to "utf8b".

I've updated the PEP accordingly.

Regards,
Martin
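
A minimal sketch of the behaviour described above, not taken from the PEP text. It assumes a released Python 3 (3.1 or later), where the handler ended up registered under the name "surrogateescape" rather than the "utf8b" name used in this thread. The sample byte strings are made up for illustration.

```python
# Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# The strict utf-8 codec reports the invalid byte as an error.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the error handler, the bad byte becomes a lone surrogate
# and the original bytes can be recovered on encoding.
name = raw.decode("utf-8", "surrogateescape")
restored = name.encode("utf-8", "surrogateescape")

# The UTF-8-style byte sequence for a lone surrogate (U+DC80) is
# also rejected by the strict codec, which is what makes a separate
# utf-8b codec unnecessary.
lone = b"\xed\xb2\x80"
try:
    lone.decode("utf-8")
    lone_ok = True
except UnicodeDecodeError:
    lone_ok = False
lone_restored = lone.decode("utf-8", "surrogateescape").encode(
    "utf-8", "surrogateescape"
)
```

Because lone surrogates never come out of the strict codec, every escaped byte stays distinguishable from legitimately decoded text.
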


[Python-Dev] PEP 383 and Tahoe [was: GUI libraries]

2009-05-03 Thread Stephen J. Turnbull
Zooko O'Whielacronx writes:

 > However, it is moot because Tahoe is not a new system. It is currently
 > at v1.4.1, has a strong policy of backwards-compatibility, and already
 > has lots of data, lots of users, and programmers building on top of
 > it.

Cool!

Question: is there a way to negotiate versions, or better yet, features?

 > I see I'm not explaining the Tahoe requirements clearly. It's probably
 > that I'm not understanding them clearly myself.

Well, it's a high-dimensional problem.  Keeping track of all the
variables is hard.  That's why something like PEP 383 can be important
to you even though it's only a partial solution; it eliminates one
variable.

 > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system
 > and then you inspect the files in the Tahoe filesystem, such as by
 > examining the web interface [1] or by running "tahoe ls", either of
 > which you could do either from the same machine where you ran "tahoe
 > cp" or from a different machine (which could be using any operating
 > system). We have the following requirements about what ends up in your
 > Tahoe directory after that cp -r.

Whoa! Slow down!  Where's "my" "Tahoe directory"?  Do you mean the
directory listing?  A copy to whatever system I'm on?  The bytes that
the Tahoe host has just loaded into a network card buffer to tell me
about it?  The bytes on disk at the Tahoe host?  You'll find it a lot
easier to explain things if you adopt a precise, consistent terminology.

 > Requirement 1 (unicode):  Each filename that you see needs to be valid
 > unicode

What does "see" mean?  In directory listings?  Under what
circumstances, if any, can what I see be different from what I get?

 > Requirement 2 (faithful if unicode):  For each filename (byte string)
 > in your myfiles directory,

My local myfiles directory, or my Tahoe myfiles directory?

 > if that bytestring is the valid encoding of some string in your
 > stated locale,

Who stated the locale?  How?  Are you referring to what
getfilesystemencoding returns?  This is a "(unicode) string", right?

 > then the resulting filename in Tahoe is that (unicode)
 > string. Nobody ever doesn't want this, right?  Well, maybe some
 > people don't want this sometimes, [...]. However, what's the
 > alternative?  Guessing that their locale shouldn't be set to
 > latin-1 and instead decoding their bytes some other way?

Sure.  Emacsen do that, you know.  Of course it's hard to guess
something else if ISO-8859/1 is the preferred encoding, but it does
happen.  This probably cannot be done accurately enough for Tahoe,
though.

 > It seems like we're not going to do better than
 > requirement 2 (faithful if unicode).
 > 
 > Requirement 3 (no file left behind):  For each filename (byte string)
 > in your myfiles directory, whether or not that byte string is the
 > valid encoding of anything in your stated locale, then that file will
 > be added into the Tahoe filesystem under *some* name (a good candidate
 > would be mojibake, e.g. decode the bytes with latin-1, but that is not
 > the only possibility).

That's not even a possibility, actually.  Technically, Latin-1 has a
"hole" from U+0080 to U+009F.  You need to add the C1 controls to fill
in that gap.  (I don't think it actually matters in practice,
everybody seems to implement ISO-8859/1 as though it contained the
control characters ... except when detecting encodings ... but it pays
to be precise in these things.)
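
As an illustration of the "everybody implements it with the control characters" remark, here is a small sketch using Python 3's "latin-1" codec, which maps each of the 256 byte values to the code point with the same number, C1 controls included. The variable names are only for this example.

```python
import unicodedata

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Python's latin-1 codec decodes all of them, with no hole at 0x80-0x9F.
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]

# The bytes in the 0x80-0x9F range come out as the C1 control characters.
c1_block = text[0x80:0xA0]
c1_are_controls = all(unicodedata.category(ch) == "Cc" for ch in c1_block)

# Encoding again gives back the original bytes.
round_trip = text.encode("latin-1")
```

So decoding with this codec never fails, which is why it is a candidate for the "some name" of requirement 3, at the cost of producing mojibake.
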

 > Now already we can say that these three requirements mean that there
 > can be collisions -- for example a directory could have two entries,
 > one of which is not a valid encoding in the locale, and whatever
 > unicode string we invent to name it with in order to satisfy
 > requirements 3 (no file left behind) and 1 (unicode) might happen to
 > be the same as the (correctly-encoded) name of the other file.

This is false with rather high probability, but you need some extra
structure to deal with it.  First, claim the Unicode private planes
for Tahoe.  Then allocate characters from the private planes on demand
as encountered, *including* such characters encountered in external
file names to be stored in Tahoe *and* the surrogates used by PEP
383.  "Display names" using these private characters would be valid
Unicode, but not very useful.  However, an algorithmically generated
font (like the 4-hex-digit-square used to give a glyph to unknown code
points in the BMP) could be used by those who care.

Also store mappings from (system encoding, UTF-8b representation) to
private char and back.  For simplicity, that could be global on your
server (IIRC, there are at least two private planes up there, so you'd
need to run into almost 128Ki *unique* such characters to run out).
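
The allocation scheme above could be sketched roughly as follows. This is only an illustration under assumptions: the class name PrivateCharAllocator, its methods, and the choice of (encoding, escaped string) as the key are hypothetical and are not part of Tahoe or of PEP 383. It uses planes 15 and 16 (the supplementary private use areas), whose last two code points each are noncharacters.

```python
from typing import Dict, Tuple

# Supplementary Private Use Area-A and -B (planes 15 and 16).
PRIVATE_RANGES = [(0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

# Number of characters that can be handed out before running dry.
CAPACITY = sum(hi - lo + 1 for lo, hi in PRIVATE_RANGES)


class PrivateCharAllocator:
    """Hypothetical on-demand mapping to private-plane characters."""

    def __init__(self) -> None:
        self._forward: Dict[Tuple[str, str], str] = {}
        self._backward: Dict[str, Tuple[str, str]] = {}
        self._code_points = (
            cp for lo, hi in PRIVATE_RANGES for cp in range(lo, hi + 1)
        )

    def allocate(self, encoding: str, escaped: str) -> str:
        """Return the private character for this key, allocating if new."""
        key = (encoding, escaped)
        if key not in self._forward:
            try:
                ch = chr(next(self._code_points))
            except StopIteration:
                raise RuntimeError("private planes exhausted") from None
            self._forward[key] = ch
            self._backward[ch] = key
        return self._forward[key]

    def lookup(self, ch: str) -> Tuple[str, str]:
        """Map a private character back to its (encoding, escaped) key."""
        return self._backward[ch]


alloc = PrivateCharAllocator()
first = alloc.allocate("euc-jp", "\udcff")
again = alloc.allocate("euc-jp", "\udcff")
other = alloc.allocate("big5", "\udcff")
```

With these two ranges CAPACITY works out to 131068, which matches the "almost 128Ki" estimate.
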

I guess you'd be subject to a DOS attack where somebody decided to map
all of 8-odd CNS characters into private space, and then write
8 files, each with a different 1-character name 

Note that Martin does *not* do this in P

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Lino Mastrodomenico
2009/5/3 "Martin v. Löwis":
> With issue 3672 resolved, it is now unnecessary to introduce
> a utf-8b codec, since the utf-8 codec will properly report errors
> for all byte sequences invalid in UTF-8, including lone surrogates.
> Therefore, utf-8b can be implemented solely through the error handler.

That's even nicer. One minor detail though, in the sentence:

"non-decodable bytes >128 will be represented as lone half surrogate"

">" should be ">=".

-- 
Lino Mastrodomenico
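
To make the boundary concrete, here is a short sketch, assuming the handler as released in Python 3.1+ under the name "surrogateescape". The helper escape_byte is a hypothetical name that just restates the mapping: byte value b, for 128 <= b <= 255, becomes the lone surrogate U+DC00 + b.

```python
def escape_byte(value: int) -> str:
    """Lone surrogate standing in for an undecodable byte (illustrative)."""
    if not 0x80 <= value <= 0xFF:
        raise ValueError("only bytes >= 128 are escaped")
    return chr(0xDC00 + value)


# With the ascii codec every byte >= 128 is undecodable, so each one
# goes through the error handler.
escaped = {
    value: bytes([value]).decode("ascii", "surrogateescape")
    for value in range(0x80, 0x100)
}
```

Byte 128 itself (0x80) is included and maps to U+DC80, which is why ">=" is the right comparison.
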


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Antoine Pitrou
Martin v. Löwis writes:
> 
> Glenn Linderman suggested that the name "python-escape" is not very
> descriptive, so I've changed the name to "utf8b".

If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Also, if utf-8b is not provided as a codec, will there be an easy way for user
code to use the same encoding as the IO layer does (e.g.
os.fsdecode/os.fsencode)?




Re: [Python-Dev] multi-with statement

2009-05-03 Thread Nick Coghlan
(I still don't really have net access back after moving house - just  
chiming in briefly via my mobile)


Anyway, I think there is one very good reason for NOT defining a
multi-with statement in terms of an existing tuple: it gains us nothing
except speed over contextlib.nested. The whole point of the new
syntactic support is to execute each expression inside the context of
the preceding managers. That requirement precludes the idea of using
an intermediate tuple, since every expression would have to be
evaluated before the tuple could be created.


I'm still not 100% convinced the saving in indentation levels due to  
this change would be worth the increase in complexity and ambiguity  
though.


--
Nick Coghlan, Brisbane, Australia
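
The ordering argument can be seen in a small sketch, assuming the multi-with syntax as it later shipped (Python 2.7 / 3.1). The Resource class and the events list are hypothetical helpers for this illustration only.

```python
events = []


class Resource:
    """Toy context manager recording creation, entry and exit."""

    def __init__(self, name, parent=None):
        self.name = name
        self.entered = False
        # Note whether the parent had already been entered at creation time.
        state = "" if parent is None else f" (parent entered: {parent.entered})"
        events.append(f"create {name}{state}")

    def __enter__(self):
        self.entered = True
        events.append(f"enter {self.name}")
        return self

    def __exit__(self, *exc_info):
        events.append(f"exit {self.name}")
        return False


# Multi-with: the second expression runs inside the first manager.
with Resource("a") as a, Resource("b", parent=a) as b:
    events.append("body")

multi_with_events = list(events)

# Tuple first: both expressions are evaluated before anything is entered.
events.clear()
pair = (Resource("c"), Resource("d"))
tuple_events = list(events)
```

In the multi-with case "b" is created only after "a" has been entered; building a tuple first forces both constructors to run up front.
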

On 03/05/2009, at 6:12 AM, Georg Brandl wrote:

> Fredrik Johansson wrote:
>> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl wrote:
>>> Hi,
>>>
>>> this is just a short notice that Mattias Brändström and I have
>>> finished a patch to implement the previously discussed and mostly
>>> warmly welcomed extension to with's syntax, allowing
>>>
>>>  with A() as a, B() as b:
>>>
>>> to be written instead of
>>>
>>>  with A() as a:
>>>      with B() as b:
>>
>> I was hoping for the other syntax in order to be able to create a
>> nested context in advance as a simple tuple:
>>
>> with A, B:
>>    pass
>>
>> context = A, B
>> with context:
>>    pass
>>
>> (I.e. a tuple, or perhaps any iterable, would be a valid context
>> manager.)
>
> I see; you want to construct your context manager programmatically
> and pass it to "with" without knowing what is in there.
>
> While this would be possible, we have to be aware that with this we
> would effectively change the context manager protocol, rather like
> the iterator protocol's __getitem__ alternate realization.  This
> muddies the definition of a context manager.
>
> (The interesting thing is that you could already implement *that*
> version without any new syntactic support, by giving tuples an
> __enter__/__exit__ method pair.)
>
>> With the syntax in the patch, I will still have to implement a custom
>> nesting context manager to do this, which sort of defeats the
>> purpose.
>
> Not really.  Having an unknown number of stacked context managers is
> not the purpose -- for that, I'd still say a custom nesting context
> manager is better, because it is also more explicit when created not
> at the "with" site.  (You could even write it as a tuple subclass, if
> you like the tuple interface.)
>
> Georg
>
> --
> Thus spake the Lord: Thou shalt indent with four spaces. No more, no
> less. Four shall be the number of spaces thou shalt indent, and the
> number of thy indenting shall be four. Eight shalt thou not indent,
> nor either indent thou two, excepting that thou then proceed to four.
> Tabs are right out.
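
One way the "tuple subclass" idea from the quoted message might look is sketched here. It is an assumption-laden illustration, not Georg's code: the names NestedContexts and Tracked are hypothetical, and it leans on contextlib.ExitStack, which only appeared in Python 3.3, well after this thread.

```python
from contextlib import ExitStack


class NestedContexts(tuple):
    """Tuple of context managers that enters them in order (illustrative)."""

    def __enter__(self):
        with ExitStack() as stack:
            values = tuple(stack.enter_context(cm) for cm in self)
            # All managers entered: keep them open past this block.
            self._stack = stack.pop_all()
        return values

    def __exit__(self, *exc_info):
        # Exits the managers in reverse order of entry.
        return self._stack.__exit__(*exc_info)


log = []


class Tracked:
    """Toy context manager that logs entry and exit."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        log.append(("enter", self.name))
        return self.name

    def __exit__(self, *exc_info):
        log.append(("exit", self.name))
        return False


context = NestedContexts((Tracked("A"), Tracked("B")))
with context as (a, b):
    log.append(("body", a + b))
```

Note that both Tracked objects are constructed before either is entered, which is exactly the limitation Nick points out at the top of this message.
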



Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Michael Urman
On Sun, May 3, 2009 at 08:43, Antoine Pitrou wrote:
> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does? (e.g.
> os.fsdecode/os.fsencode)?

I like the idea of fsencode/fsdecode functions, but we need to be
careful deciding what they accept and produce on Windows. I'd expect
them to be identity functions, but then the difference in platform
behavior suggests perhaps they should be in os.path.

Unicode to Unicode on Windows would further mean fsencode wouldn't be
useful for sending filenames over sockets, and "utf8" will be prone to
exceptions on the very names we're trying to support right now. Is
there an advantage to not providing the "utf8b" behavior as a
registered codec?

-- 
Michael Urman
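
For reference, a rough sketch of how such functions behave where they exist: os.fsencode and os.fsdecode were added in Python 3.2. The results below assume a POSIX system whose filesystem encoding is UTF-8 or ASCII; on Windows the behaviour differs, as discussed above. The sample filename is made up.

```python
import os

# A filename as raw bytes that is not valid UTF-8 (Latin-1 e-acute).
raw = b"caf\xe9.txt"

# Decode the way the IO layer does; the bad byte is escaped.
name = os.fsdecode(raw)

# Encoding again yields the original bytes, suitable for sending
# over a socket or handing back to the OS.
wire = os.fsencode(name)

# Plain strict "utf-8" encoding fails on exactly these names.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

This shows the exception risk mentioned above: strict utf-8 rejects the lone surrogate that stands in for the undecodable byte.
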


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
> That's even nicer. One minor detail though, in the sentence:
> 
> "non-decodable bytes >128 will be represented as lone half surrogate"
> 
> ">" should be ">=".

Thanks, fixed.

Martin



Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
> If the error handler is supposed to be used for codecs other than utf-8,
> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Perhaps. However, utf-8b doesn't really have anything to do with utf-8 -
it's an algorithm based on 16-bit or 32-bit code points.

> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does? 

s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in
fact, that's exactly what the IO layer does).

Regards,
Martin
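
Spelled out as a sketch, with two caveats: getfilesystemencoding lives in the sys module, and in released Python 3 the handler is named "surrogateescape", so "utf8b" is replaced by that name here. The helper names encode_filename and decode_filename are hypothetical.

```python
import sys


def encode_filename(name: str) -> bytes:
    """Encode a str filename to bytes with the filesystem encoding."""
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")


def decode_filename(raw: bytes) -> str:
    """Decode filename bytes, escaping anything undecodable."""
    return raw.decode(sys.getfilesystemencoding(), "surrogateescape")


raw = b"report-\xff.txt"
round_tripped = encode_filename(decode_filename(raw))

# The handler is not tied to utf-8; it works with other codecs too.
ascii_name = b"report-\xff.txt".decode("ascii", "surrogateescape")
```

The ascii line illustrates the point that the algorithm operates on code points and is independent of the codec it is attached to.
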


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Gregory P. Smith
On Sun, May 3, 2009 at 10:39 AM, "Martin v. Löwis" wrote:

> > If the error handler is supposed to be used for codecs other than utf-8,
> > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
>
> Perhaps. However, utf-8b doesn't really have anything to do with utf-8 -
> it's an algorithm based on 16-bit or 32-bit code points.


To me that lack of relationship with utf8 suggests that it should not be
called utf8b...  But I don't have any good suggestions.


>
> > Also, if utf8-b is not provided as a codec, will there be an easy way for
> > user code to use the same encoding as the IO layer does?
>
> s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in
> fact, that's exactly what the IO layer does).
>
> Regards,
> Martin


Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Martin v. Löwis
> > > If the error handler is supposed to be used for codecs other than utf-8,
> > > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
> >
> > Perhaps. However, utf-8b doesn't really have anything to do with utf-8 -
> > it's an algorithm based on 16-bit or 32-bit code points.
>
> To me that lack of relationship with utf8 suggests that it should not be
> called utf8b

Perhaps. However, giving it that name was Markus Kuhn's choice - and
while it may be confusing, it's (IMO) useful to be consistent with this
background.

Regards,
Martin



Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-03 Thread Gregory P. Smith
On Sun, May 3, 2009 at 1:27 PM, "Martin v. Löwis" wrote:

> > > > If the error handler is supposed to be used for codecs other than utf-8,
> > > > perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
> > >
> > > Perhaps. However, utf-8b doesn't really have anything to do with utf-8 -
> > > it's an algorithm based on 16-bit or 32-bit code points.
> >
> > To me that lack of relationship with utf8 suggests that it should not be
> > called utf8b
>
> Perhaps. However, giving it that name was Markus Kuhn's choice - and
> while it may be confusing, it's (IMO) useful to be consistent with this
> background.
>
> Regards,
> Martin

Ah, right.  My original searches for utf8b didn't turn up much but searching
on his name turns some up.  Good choice of name then.

 http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
 http://bsittler.livejournal.com/10381.html
 http://hyperreal.org/~est/utf-8b/

-gps


Re: [Python-Dev] yield from?

2009-05-03 Thread Benjamin Peterson
2009/5/3 Greg Ewing:
> Benjamin Peterson wrote:
>>
>> What's the status of yield from? There's still a small window open for
>> a patch to be checked into 3.1's branch. I haven't been following the
>> python-ideas threads, so I'm not sure if it's ready yet.
>
> The PEP itself seems to have settled down, and is
> awaiting a verdict from Guido.

Guido is now on vacation until the 18th, so I think this will have to
be deferred until 2.7/3.2.

>
> The prototype implementation doesn't quite match
> the PEP in some of the fine details yet. Also
> it's for 2.6 rather than 3.x; someone with more
> knowledge of 3.x internals would be better placed
> than me to convert it.




-- 
Regards,
Benjamin


[Python-Dev] PEP 383 and GUI libraries

2009-05-03 Thread Jim Jewett
(sent only to python-dev, as I am not a subscriber of tahoe-dev)

Zooko wrote:

> [Tahoe] currently uses utf-8 for its internal storage (note: nothing to
> do with reading or writing files from external sources -- only for
> storing filenames in the decentralized storage system which is
> accessed by Tahoe clients), and we can't start putting non-utf-8-valid
> sequences in the "filename" slot because other Tahoe clients would
> then get a UnicodeDecodeError exception when trying to read those
> directories.

So what do you do when someone has an existing file whose name is
supposed to be in utf-8, but whose actual bytes are not valid utf-8?

If you have somehow solved that problem, then you're already done --
the PEP's encoding is a no-op on anything that isn't already invalid
unicode.

If you have not solved that problem, then those clients will already
be getting a UnicodeDecodeError; all the PEP does is make it at least
possible for them to recover.

...

> Requirement 1 (unicode):  Each filename that you see needs to be valid
> unicode (it is stored internally in utf-8).

(repeating) What does Tahoe do if this is violated?  Do you throw an
exception right there and not let them copy the file to tahoe?  If so,
then that same error correction means that utf8b will never differ
from utf-8, and you have nothing to worry about.

> Requirement 2 (faithful if unicode):

Doesn't the PEP meet this?

> Requirement 3 (no file left behind):

Doesn't the PEP also meet this?  I thought the concern was just that
the name used would not be valid unicode, unless the original name was
itself valid unicode.

> Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
> "round-tripping"):

Doesn't the PEP also support this?  (Only) the invalid bytes get
escaped and therefore must be unescaped, but the escapement is
reversible.

> 3. (handling collisions)  In either case 2.a or 2.b the resulting
> unicode string may already be present in the directory.

This collision is what the use of half-surrogates (as the escape
characters) avoids.  Such collisions can't be present unless the data
was invalid unicode, in which case it was the result of an escapement
(unless something other than python is creating new invalid
filenames).

-jJ
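
The no-op and no-collision claims can be checked with a short sketch, assuming the handler as released in Python 3.1+ ("surrogateescape") and UTF-8 as the encoding. The helper names decode_name and has_escapes are hypothetical.

```python
def decode_name(raw: bytes) -> str:
    """Decode filename bytes as UTF-8, escaping undecodable bytes."""
    return raw.decode("utf-8", "surrogateescape")


def has_escapes(name: str) -> bool:
    """True if the name contains a lone surrogate in U+DC80..U+DCFF."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


valid = "caf\u00e9".encode("utf-8")   # correctly encoded name
invalid = b"caf\xe9"                  # Latin-1 bytes, invalid as UTF-8

# On valid input the handler is never invoked, so the result is
# identical to strict decoding.
same_as_strict = decode_name(valid) == valid.decode("utf-8")

# The invalid name decodes to a string containing a lone surrogate,
# which strict decoding can never produce, so the two cannot collide.
valid_name = decode_name(valid)
invalid_name = decode_name(invalid)
```

Only names that were invalid to begin with carry escape characters, and the escaping is reversible byte for byte.
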