Re: [Python-Dev] BLOBs in Pg

2009-04-10 Thread Sylvain Thénault
On 09 April 14:05, Steve Holden wrote:
> Oleg Broytmann wrote:
> > On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
> >> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
> >> PostgreSQL has no real support for BLOBs.
> > 
> >I think it has - BYTEA data type.
> > 
> But the Python DB adapters appear to require some fairly hairy escaping
> of the data to make it usable with the cursor execute() method. IMHO you
> shouldn't have to escape data that is passed for insertion via a
> parameterized query.

can't you simply use dbmodule.Binary to do the job?
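
A minimal sketch of that suggestion, using the standard-library sqlite3 module as a stand-in for a PostgreSQL adapter (an assumption made so the example runs anywhere; the table and column names are hypothetical). DB-API 2.0 modules expose a Binary constructor, and the wrapped value travels as an ordinary query parameter with no manual escaping:

```python
import sqlite3

# Arbitrary binary data, including NUL and quote bytes.
payload = bytes(range(256))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")

# Binary() marks the value as binary; the driver takes care of quoting.
conn.execute("INSERT INTO blobs (data) VALUES (?)", (sqlite3.Binary(payload),))

stored = conn.execute("SELECT data FROM blobs").fetchone()[0]
round_trip_ok = bytes(stored) == payload
conn.close()
```

The placeholder style (`?` here) varies between adapters, so that detail should be checked against the adapter actually in use.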

-- 
Sylvain Thénault   LOGILAB, Paris (France)
Formations Python, Debian, Méth. Agiles: http://www.logilab.fr/formations
Développement logiciel sur mesure:   http://www.logilab.fr/services
CubicWeb, the semantic web framework:http://www.cubicweb.org

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Nick Coghlan
gl...@divmod.com wrote:
> On 03:21 am, ncogh...@gmail.com wrote:
>> Given that json is a wire protocol, that sounds like the right approach
>> for json as well. Once bytes-everywhere works, then a text API can be
>> built on top of it, but it is difficult to build a bytes API on top of a
>> text one.
> 
> I wish I could agree, but JSON isn't really a wire protocol.  According
> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
> serialization of structured data".  There are some notes about encoding,
> but it is very clearly described in terms of unicode code points.

Ah, my apologies - if the RFC defines things such that the native format
is Unicode, then yes, the appropriate Python 3.x data type for the base
implementation would indeed be strings.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Nick Coghlan
Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?  IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

This problem (slow application startup times due to too many imports at
startup, which can in turn be due to top level imports for library
or framework functionality that a given application doesn't actually
use) is actually the main reason I sometimes wish for a nice, solid lazy
module import mechanism that manages to avoid the potential deadlock
problems created by using import statements inside functions.

Providing a clean API and implementation for that functionality is a
pretty tough nut to crack though, so I'm not holding my breath...
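
For comparison, later Python 3 releases grew importlib.util.LazyLoader, which postdates this thread. The following is a hedged sketch along the lines of the recipe in the importlib documentation; lazy_import is a helper name chosen here, and this is not the mechanism bzr uses:

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the module; execution is deferred.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")       # cheap: nothing executed yet
black = colorsys.rgb_to_hsv(0.0, 0.0, 0.0)  # first access triggers the real import
```

Whether this avoids the deadlock concerns mentioned above depends on the interpreter version and how threads are used, so it should be treated as an illustration rather than a drop-in answer.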

Cheers,
Nick.

P.S. It's only an occasional fairly idle wish for me though, or I'd have
at least tried to come up with something myself by now.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:

> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import
different modules (both cumulative and individually).

We have a lazy module loader that allows us to defer loading modules we
might not use (though if they are needed we are in fact going to pay for
loading them eventually).

We monkeypatch the standard library where modules we want are
unreasonably expensive to import (for instance by making a regex we
wouldn't use be lazy compiled rather than compiled at import time).
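
As an illustration of the lazy-compilation idea only (LazyRegex is a hypothetical name, and this is not claimed to be bzrlib's implementation), a small proxy can postpone re.compile until the pattern is first used:

```python
import re


class LazyRegex:
    """Proxy that compiles its pattern on first use instead of at import time."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _compile(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself,
        # e.g. search/match/sub, which are forwarded to the real pattern.
        return getattr(self._compile(), name)


digits = LazyRegex(r"\d+")            # no compilation cost yet
found = digits.search("abc123").group()  # compiled here, on first use
```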

>   IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:
robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes).

We've also looked lower - python does a lot of stat operations searching
for imports and determining if the pyc is up to date; these appear to
only really matter on cold-cache imports (but they matter a lot then);
in hot-cache situations they are insignificant.

Uhm, there's probably more - but I just wanted to note that we have done
quite a bit of analysis. I think a large chunk of our problem is having
too much code loaded when only a small fraction will be used in any one
operation. Consider importing bzrlib errors - 10% of the startup time
for 'bzr help'. In any operation only a few of those exceptions will be
used - and typically 0.

-Rob




Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Antoine Pitrou
gl...@divmod.com writes:
> 
> In email's case this is true, but in JSON's case it's not.  JSON is a 
> format defined as a sequence of code points; MIME is defined as a 
> sequence of octets.

Another way to look at it is that JSON is a subset of Javascript, and as such is
text rather than bytes.

Regards

Antoine.




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Antoine Pitrou
Robert Collins writes:
> 
> (errors.py is 3027 lines long with 347 exception classes).

347 exception classes? Perhaps your framework is over-engineered.

Similarly, when using a heavy Web framework, reloading a Web app can take
several seconds... but I won't blame Python for that.

Regards

Antoine.




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Fri, 2009-04-10 at 11:52 +, Antoine Pitrou wrote:
> Robert Collins writes:
> > 
> > (errors.py is 3027 lines long with 347 exception classes).
> 
> 347 exception classes? Perhaps your framework is over-engineered.
> 
> Similarly, when using a heavy Web framework, reloading a Web app can take
> several seconds... but I won't blame Python for that.

Well, we've added exceptions as we needed them. This isn't much
different to errno in C programs; the errno range has expanded as people
have wanted to signal that specific situations have arisen. The key
thing for us is to have both something that can be caught (for library
users of bzrlib) and something that can be formatted with variable
substitution (for displaying to users). If there are better ways to
approach this in python than what we've done, that would be great.
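
The two requirements above (catchable by library users, formattable with variable substitution for display) can be sketched as a base class with a format template. The class and attribute names here are hypothetical, chosen for illustration:

```python
class FormattedError(Exception):
    """Base class: subclasses supply a _fmt template filled from keyword arguments."""

    _fmt = "Unspecified error"

    def __init__(self, **params):
        super().__init__()
        self.params = params
        # Expose parameters as attributes so library callers can inspect them.
        for name, value in params.items():
            setattr(self, name, value)

    def __str__(self):
        return self._fmt % self.params


class NoSuchFile(FormattedError):
    _fmt = "No such file: %(path)s"


try:
    raise NoSuchFile(path="foo.txt")
except FormattedError as exc:      # library users catch by class
    message = str(exc)             # front ends format for display
    missing_path = exc.path
```

Each new error then costs only a two-line subclass, which is one way a module could end up with several hundred of them.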

-Rob




Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Paul Moore
2009/4/10 Nick Coghlan:
> gl...@divmod.com wrote:
>> On 03:21 am, ncogh...@gmail.com wrote:
>>> Given that json is a wire protocol, that sounds like the right approach
>>> for json as well. Once bytes-everywhere works, then a text API can be
>>> built on top of it, but it is difficult to build a bytes API on top of a
>>> text one.
>>
>> I wish I could agree, but JSON isn't really a wire protocol.  According
>> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
>> serialization of structured data".  There are some notes about encoding,
>> but it is very clearly described in terms of unicode code points.
>
> Ah, my apologies - if the RFC defines things such that the native format
> is Unicode, then yes, the appropriate Python 3.x data type for the base
> implementation would indeed be strings.

Indeed, the RFC seems to clearly imply that loads should take a
Unicode string, dumps should produce one, and load/dump should work in
terms of text files (not byte files).

On the other hand, further down in the document:

"""
3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.
"""

This is at best confused (in my utterly non-expert opinion :-)) as
Unicode isn't an encoding...

I would guess that what the RFC is trying to say is that JSON is text
(Unicode) and where a byte stream purporting to be JSON is encountered
without a defined encoding, this is how to guess one.

That implies that loads can/should also allow bytes as input, applying
the given algorithm to guess an encoding. And similarly load
can/should accept a byte stream, on the same basis. (There's no need
to allow the possibility of accepting bytes plus an encoding - in that
case the user should decode the bytes before passing Unicode to the
JSON module).

An alternative might be for the JSON module to register a special
encoding ('JSON-guess'?) which captures the rules here. Then there's
no need for special bytes parameter handling.
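
The null-pattern rule quoted from the RFC can be sketched as follows. The table in RFC 4627 section 3 maps the first four octets to an encoding; detect_json_encoding and loads_bytes are hypothetical helper names, and the sketch assumes the input really is conforming JSON text with no byte order mark:

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Pick a UTF by the pattern of NUL octets in the first four bytes."""
    head = data[:4]
    if len(head) == 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"   # 00 00 00 xx
        if head[0] == 0 and head[2] == 0:
            return "utf-16-be"   # 00 xx 00 xx
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"   # xx 00 00 00
        if head[1] == 0 and head[3] == 0:
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"               # xx xx xx xx, or too short to be UTF-16/32


def loads_bytes(data: bytes):
    """Decode according to the detected encoding, then parse as text."""
    return json.loads(data.decode(detect_json_encoding(data)))


sample = '{"a": 1}'.encode("utf-16-le")
parsed = loads_bytes(sample)
```

Keeping detection in a separate function leaves the text-based loads untouched, which matches the layering suggested above: bytes handling sits in front of a Unicode core.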

Of course, this is all from a native English speaker, who therefore
has no idea of the real life issues involved in Unicode :-)

Paul.


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Martin v. Löwis
>> In email's case this is true, but in JSON's case it's not.  JSON is a 
>> format defined as a sequence of code points; MIME is defined as a 
>> sequence of octets.
> 
> Another way to look at it is that JSON is a subset of Javascript, and as such is
> text rather than bytes.

I don't think this can be approached from a theoretical point of view.
Instead, what matters is how users want to use it.

Regards,
Martin



Re: [Python-Dev] decorator module in stdlib?

2009-04-10 Thread Michael Foord

Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato wrote:
>> Then perhaps you misunderstand the goal of the decorator module.
>> The raison d'etre of the module is to PRESERVE the signature:
>> update_wrapper unfortunately *changes* it.
>>
>> When confronted with a library which I do not know, I often run
>> pydoc, or sphinx, or a custom made documentation tool over it, to
>> extract the signature of functions.
>
> Ah, I see. Personally I rarely trust automatically extracted
> documentation -- too often in my experience it is out of date or
> simply absent. Extracting the signatures in theory wouldn't lie, but
> in practice I still wouldn't trust it -- not only because of what
> decorators might or might not do, but because it might still be
> misleading. Call me old-fashioned, but I prefer to read the source
> code.

If you auto-generate API documentation by introspection (which we do at
Resolver Systems) then preserving signatures can also be important.
Interactive use (support for help), and more straightforward tracebacks
in the event of usage errors are other reasons to want to preserve
signatures and function name.

>> For instance, if I see a method
>> get_user(self, username) I have a good hint about what it is supposed
>> to do. But if the library (say a web framework) uses non signature-preserving
>> decorators, my documentation tool says to me that there is function
>> get_user(*args, **kwargs) which frankly is not enough [this is the
>> optimistic case, when the author of the decorator has taken care
>> to preserve the name of the original function].
>
> But seeing the decorator is often essential for understanding what
> goes on! Even if the decorator preserves the signature (in truth or
> according to inspect), many decorators *do* something, and it's important
> to know how a function is decorated. For example, I work a lot with a
> small internal framework at Google whose decorators can raise
> exceptions and set instance variables; they also help me understand
> under which conditions a method can be called.

Having methods renamed to 'wrapped' and their signature changed to
*args, **kwargs may tell you there *is* a decorator but doesn't give you
any useful information about what it does. If you look at the code then
the decorator is obvious (whether or not it mangles the method)...

[+1]

>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
>
> That could be added to functools if enough people want it.

+1

Michael


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




[Python-Dev] Python 2.6.2 final

2009-04-10 Thread Barry Warsaw
I wanted to cut Python 2.6.2 final tonight, but for family reasons I  
won't be able to do so until Monday.  Please be conservative in any  
commits to the 2.6 branch between now and then.


bugs.python.org is apparently down right now, but I set issue 5724 to  
release blocker for 2.6.2.  This is waiting for input from Mark  
Dickinson, and it relates to test_cmath failing on Solaris 10.  If  
Mark fixes that, he's welcome to commit it, otherwise I will remove  
the release blocker tag on the issue and release 2.6.2 anyway.


Plan on me tagging 2.6.2 final Sunday evening.

Cheers,
-Barry





Re: [Python-Dev] Evaluated cmake as an autoconf replacement

2009-04-10 Thread Bill Hoffman

Neil Hodgson wrote:
>    cmake does not produce relative paths in its generated make and
> project files. There is an option CMAKE_USE_RELATIVE_PATHS which
> appears to do this but the documentation says:
>
> """This option does not work for more complicated projects, and
> relative paths are used when possible. In general, it is not possible
> to move CMake generated makefiles to a different location regardless
> of the value of this variable."""
>
>    This means that generated Visual Studio project files will not work
> for other people unless a particular absolute build location is
> specified for everyone which will not suit most. Each person that
> wants to build Python will have to run cmake before starting Visual
> Studio thus increasing the prerequisites.

This is true.  CMake does not generate stand alone transferable
projects. CMake must be installed on the machine where the compilation
is done.  CMake will automatically re-run if any of the inputs are
changed, and have visual studio re-load the project, and CMake can be
used for simple cross platform commands like file copy and other
operations so that the build files do not depend on shell commands or
anything system specific.


-Bill


Re: [Python-Dev] decorator module in stdlib?

2009-04-10 Thread Nick Coghlan
Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato
>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
> 
> That could be added to functools if enough people want it.

No objection in principle here - it's just hard to do cleanly without
PEP 362's __signature__ attribute to underpin it. Without that as a
basis, I expect you'd end up being forced to do something similar to
what Michele does in the decorator module - inspect the function being
wrapped and then use exec to generate a wrapper with a matching signature.
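
For readers on later Python 3 releases, where PEP 362 did land as inspect.signature, a hedged sketch of what introspection looks like once functools.wraps records the wrapped function. The decorator name logged and the get_user stub are hypothetical:

```python
import functools
import inspect


def logged(func):
    """A do-nothing decorator whose wrapper takes generic arguments."""

    @functools.wraps(func)  # copies __name__, __doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


@logged
def get_user(self, username):
    """Look up a user by name."""
    return username


# inspect.signature follows __wrapped__, so tools see the original signature.
reported = str(inspect.signature(get_user))
# The wrapper's own signature is still available when explicitly requested.
raw = str(inspect.signature(get_user, follow_wrapped=False))
```

The wrapper still accepts any arguments at call time; only the reported signature is preserved, which is a weaker guarantee than the exec-generated wrappers described above.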

Another nice introspection enhancement might be to give class and
function objects writable __file__ and __line__ attributes (initially
set appropriately by the compiler) and have the inspect modules use
those when they're available. Then functools.update_wrapper() could be
adjusted to copy those attributes, meaning that the wrapper function
would point back to the original (decorated) function for the source
code, rather than to the definition of the wrapper (note that the actual
wrapper code could still be found by looking at the metadata on the
function's __code__ attribute).

Unfortunately-ideas-aren't-working-code'ly,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Peter Otten
John Arbash Meinel wrote:

> Not as big of a difference as I thought it would be... But I bet if
> there was a way to put the random shuffle in the inner loop, so you
> weren't accessing the same identical 25k keys internally, you might get
> more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s"from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7),25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective:
 
$ python -m timeit -s"d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop
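
The commands above rely on Python 2 names (xrange, the .next method). A rough Python 3 sketch of the same comparison follows; time_lookups and its parameters are names chosen here, and the sizes in the usage lines are deliberately much smaller than in the original measurement, so the numbers are not comparable:

```python
import random
import time


def time_lookups(dict_size, sample_size, randomized, rounds=5):
    """Best-of-rounds time to look up sample_size keys in a dict of dict_size keys."""
    d = dict.fromkeys(range(dict_size))
    if randomized:
        # A fresh random sample per round, prepared before timing starts.
        samples = [random.sample(range(dict_size), sample_size) for _ in range(rounds)]
    else:
        # The same sequential keys every round.
        samples = [list(range(sample_size))] * rounds
    timings = []
    for keys in samples:
        start = time.perf_counter()
        for key in keys:
            d.get(key)
        timings.append(time.perf_counter() - start)
    return min(timings)


random_time = time_lookups(100_000, 2_500, randomized=True)
sequential_time = time_lookups(100_000, 2_500, randomized=False)
```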

Peter



Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Toshio Kuratomi
Robert Collins wrote:

> Certainly, import time is part of it:
> robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
> 10 loops, best of 3: 18.7 msec per loop
> 
> (errors.py is 3027 lines long with 347 exception classes).
> 
> We've also looked lower - python does a lot of stat operations searching
> for imports and determining if the pyc is up to date; these appear to
> only really matter on cold-cache imports (but they matter a lot then);
> in hot-cache situations they are insignificant.
> 
Tarek, Georg, and I talked about a way to do both multi-version and
speedup of this exact problem with import in the future at pycon.  I had
to leave before the hackfest got started, though, so I don't know where
the idea went from there.  Tarek, did this idea progress any?

-Toshio





Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread James Y Knight

On Apr 9, 2009, at 10:38 PM, Barry Warsaw wrote:
> So, what I'm really asking is this.  Let's say you agree that there
> are use cases for accessing a header value as either the raw encoded
> bytes or the decoded unicode.

As I said in the thread having nearly the same exact discussion on
web-sig, except about WSGI headers...

> What should this return:
>
>  >>> message['Subject']
>
> The raw bytes or the decoded unicode?


Until you write a parser for every header, you simply cannot decode to  
unicode. The only sane choices are:

1) raw bytes
2) parsed structured data

There's no "decoded to unicode but not parsed" option: that's doing  
things in the wrong order. If you RFC2047-decode the header before  
doing tokenization and parsing, you will just have a *broken*  
implementation.


Here's an example where it matters. If you decode the RFC2047 part
before parsing, you'd decide that there are two recipients to the
message. There aren't. "<broken@example.com>, " is the display-name of
"act...@example.com", not a second recipient.


  To: =?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <act...@example.com>

Here's a quote from RFC2047:

    NOTE: Decoding and display of encoded-words occurs *after* a
    structured field body is parsed into tokens. It is therefore
    possible to hide 'special' characters in encoded-words which, when
    displayed, will be indistinguishable from 'special' characters in
    the surrounding text. For this and other reasons, it is NOT
    generally possible to translate a message header containing
    'encoded-word's to an unencoded form which can be parsed by an
    RFC 822 mail reader.

And another quote for good measure:

    (2) Any header field not defined as '*text' should be parsed
    according to the syntax rules for that header field. However, any
    'word' that appears within a 'phrase' should be treated as an
    'encoded-word' if it meets the syntax rules in section 2. Otherwise
    it should be treated as an ordinary 'word'.



Now, I suppose there's also a third possibility:
3) US-ASCII-only strings, unmolested except for doing  
a .decode('ascii'). That'll give you a string all right, but it's  
really just cheating. It's not actually a text string in any  
meaningful sense.


(in all this I'm assuming your question is not about the "Subject"  
header in particular; that is of course just unstructured text so the  
parse step doesn't actually do anything...).
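
The ordering problem can be reproduced with the standard email package in Python 3. This is a sketch under stated assumptions: real@example.com is a hypothetical address standing in for the actual recipient, and decode_words is a helper name chosen here:

```python
from email.header import decode_header, make_header
from email.utils import getaddresses

# Encoded display name followed by a hypothetical real address.
raw = "=?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <real@example.com>"


def decode_words(value):
    """RFC 2047-decode a header fragment into a str."""
    return str(make_header(decode_header(value)))


# Wrong order: decode first, then parse. The hidden "<...>, " becomes syntax,
# so the parser reports two recipients.
wrong = getaddresses([decode_words(raw)])

# Right order: parse the raw header into (display-name, address) pairs,
# then decode each display name for presentation.
right = [(decode_words(name), addr) for name, addr in getaddresses([raw])]
```

The second form is option 2 from the list above: the caller gets parsed structured data, and decoding happens per token.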


James



Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Stephen J. Turnbull
Paul Moore writes:

 > On the other hand, further down in the document:
 > 
 > """
 > 3.  Encoding
 > 
 >    JSON text SHALL be encoded in Unicode.  The default encoding is
 >    UTF-8.
 > 
 >    Since the first two characters of a JSON text will always be ASCII
 >    characters [RFC0020], it is possible to determine whether an octet
 >    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
 >    at the pattern of nulls in the first four octets.
 > """
 > 
 > This is at best confused (in my utterly non-expert opinion :-)) as
 > Unicode isn't an encoding...

The word "encoding" (by itself) does not have a standard definition
AFAIK.  However, since Unicode *is* a "coded character set" (plus a
bunch of hairy usage rules), there's nothing wrong with saying "text
is encoded in Unicode".  The RFC 2130 and Unicode TR#17 taxonomies are
annoyingly verbose and pedantic, to say the least.

So what is being said there (in UTR#17 terminology) is

(1) JSON is *text*, that is, a sequence of characters.
(2) The abstract repertoire and coded character set are defined by the
Unicode standard.
(3) The default transfer encoding syntax is UTF-8.

 > That implies that loads can/should also allow bytes as input, applying
 > the given algorithm to guess an encoding.

It's not a guess, unless the data stream is corrupt---or nonconforming.

But it should not be the JSON package's responsibility to deal with
corruption or non-conformance (eg, ISO-8859-15-encoded programs).
That's the whole point of specifying the coded character set in the
standard in the first place.  I think it's a bad idea for any of the core
JSON API to accept or produce bytes in any language that provides a
Unicode string type.

That doesn't mean Python's module shouldn't provide convenience
functions to read and write JSON serialized as UTF-8 (in fact, that
*should* be done, IMO) and/or other UTFs (I'm not so happy about
that).  But those who write programs using them should not report bugs
until they've checked out and eliminated the possibility of an
encoding screwup!
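
A minimal sketch of such convenience functions, layered on a text-only core. The names dumps_utf8 and loads_utf8 are hypothetical and are not part of the json module:

```python
import json


def dumps_utf8(obj, **kwargs):
    """Serialize to JSON text, then encode that text as UTF-8 bytes."""
    return json.dumps(obj, ensure_ascii=False, **kwargs).encode("utf-8")


def loads_utf8(data, **kwargs):
    """Decode UTF-8 bytes to text, then parse the text as JSON."""
    return json.loads(data.decode("utf-8"), **kwargs)


wire = dumps_utf8({"name": "Löwis"})
value = loads_utf8(wire)
```

A decode error raised by loads_utf8 then points clearly at the encoding layer rather than at the JSON parser, which is the separation argued for above.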



Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Bob Ippolito
On Fri, Apr 10, 2009 at 8:38 AM, Stephen J. Turnbull wrote:
> Paul Moore writes:
>
>  > On the other hand, further down in the document:
>  >
>  > """
>  > 3.  Encoding
>  >
>  >    JSON text SHALL be encoded in Unicode.  The default encoding is
>  >    UTF-8.
>  >
>  >    Since the first two characters of a JSON text will always be ASCII
>  >    characters [RFC0020], it is possible to determine whether an octet
>  >    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>  >    at the pattern of nulls in the first four octets.
>  > """
>  >
>  > This is at best confused (in my utterly non-expert opinion :-)) as
>  > Unicode isn't an encoding...
>
> The word "encoding" (by itself) does not have a standard definition
> AFAIK.  However, since Unicode *is* a "coded character set" (plus a
> bunch of hairy usage rules), there's nothing wrong with saying "text
> is encoded in Unicode".  The RFC 2130 and Unicode TR#17 taxonomies are
> annoyingly verbose and pedantic, to say the least.
>
> So what is being said there (in UTR#17 terminology) is
>
> (1) JSON is *text*, that is, a sequence of characters.
> (2) The abstract repertoire and coded character set are defined by the
>    Unicode standard.
> (3) The default transfer encoding syntax is UTF-8.
>
>  > That implies that loads can/should also allow bytes as input, applying
>  > the given algorithm to guess an encoding.
>
> It's not a guess, unless the data stream is corrupt---or nonconforming.
>
> But it should not be the JSON package's responsibility to deal with
> corruption or non-conformance (eg, ISO-8859-15-encoded programs).
> That's the whole point of specifying the coded character set in the
> standard in the first place.  I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.
>
> That doesn't mean Python's module shouldn't provide convenience
> functions to read and write JSON serialized as UTF-8 (in fact, that
> *should* be done, IMO) and/or other UTFs (I'm not so happy about
> that).  But those who write programs using them should not report bugs
> until they've checked out and eliminated the possibility of an
> encoding screwup!

The current implementation doesn't do any encoding guesswork and I
have no intention to allow that as a feature. The input must be
unicode, UTF-8 bytes, or an encoding must be specified.

Personally, most of my experience with JSON is as a wire protocol and thus
bytes, so the obvious function to encode json should do that. There
probably should be another function to get unicode output, but nobody
has ever asked for that in the Python 2.x version. They either want
the default behavior (encoding as ASCII str which can be used as
unicode due to implementation details of Python 2.x) or encoding as a
more compact UTF-8 str (without escaping non-ASCII code points).
Perhaps Python 3 users would ask for a unicode output when decoding
though.
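
The two output styles described above can be compared with the Python 3 json module, where dumps returns text and the caller encodes it. This is a small illustration, not a statement about the Python 2.x API under discussion:

```python
import json

text = "caf\u00e9"

# Default: non-ASCII code points are escaped, so the result is pure ASCII.
ascii_form = json.dumps(text)

# More compact: keep the code points and encode the text as UTF-8 bytes.
utf8_form = json.dumps(text, ensure_ascii=False).encode("utf-8")

ascii_size = len(ascii_form.encode("ascii"))
utf8_size = len(utf8_form)
```

Both forms parse back to the same string; the difference is only in how many bytes go over the wire.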

-bob


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Martin v. Löwis
> (3) The default transfer encoding syntax is UTF-8.

Notice that the RFC is partially irrelevant. It only applies
to the application/json mime type, and JSON is used in various
other protocols, using various other encodings.

> I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.

So how do you integrate the encoding detection that the RFC suggests
to be done?

Regards,
Martin


Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

2009-04-10 Thread Bill Janssen
Barry Warsaw wrote:

> In that case, we really need the
> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
> things on top of that.

Yep.

Bill


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote:


On 02:38 am, ba...@python.org wrote:
So, what I'm really asking is this.  Let's say you agree that there  
are use cases for accessing a header value as either the raw  
encoded bytes or the decoded unicode.  What should this return:


>>> message['Subject']

The raw bytes or the decoded unicode?


My personal preference would be to just get deprecate this API, and  
get rid of it, replacing it with a slightly more explicit one.


  message.headers['Subject']
  message.bytes_headers['Subject']


This is pretty darn clever Glyph.  Stop that! :)

I'm not 100% sure I like the name .bytes_headers or that .headers  
should be the decoded header (rather than have .headers return the  
bytes thingie and say .decoded_headers return the decoded thingies),  
but I do like the general approach.


Now, setting headers.  Sometimes you have some unicode thing and  
sometimes you have some bytes.  You need to end up with bytes in  
the ASCII range and you'd like to leave the header value unencoded  
if so. But in both cases, you might have bytes or characters  
outside that range, so you need an explicit encoding, defaulting to  
utf-8 probably.


  message.headers['Subject'] = 'Some text'

should be equivalent to

  message.headers['Subject'] = Header('Some text')


Yes, absolutely.  I think we're all in general agreement that header  
values should be instances of Header, or subclasses thereof.



My preference would be that

  message.headers['Subject'] = b'Some Bytes'

would simply raise an exception.  If you've got some bytes, you  
should instead do


  message.bytes_headers['Subject'] = b'Some Bytes'

or

  message.headers['Subject'] = Header(bytes=b'Some Bytes',  
encoding='utf-8')


Explicit is better than implicit, right?


Yes.

Again, I really like the general idea, if I might quibble about some  
of the details.  Thanks for a great suggestion.
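
A minimal sketch of the split discussed above, for illustration only. The names SketchMessage, _TextHeaders and _BytesHeaders are hypothetical and not part of the email package. It assumes plain UTF-8 in place of real RFC 2047 encoding, and it stores raw bytes rather than Header instances.

```python
class _BytesHeaders:
    """Raw view: values are returned as bytes, exactly as stored."""

    def __init__(self, store):
        self._store = store

    def __getitem__(self, name):
        return self._store[name.lower()]

    def __setitem__(self, name, value):
        if not isinstance(value, bytes):
            raise TypeError("bytes_headers only accepts bytes")
        self._store[name.lower()] = value


class _TextHeaders:
    """Decoded view: values go in and come out as str."""

    def __init__(self, store, encoding):
        self._store = store
        self._encoding = encoding

    def __getitem__(self, name):
        return self._store[name.lower()].decode(self._encoding)

    def __setitem__(self, name, value):
        if isinstance(value, bytes):
            # Explicit is better than implicit: refuse to guess an encoding.
            raise TypeError("use bytes_headers for raw bytes")
        self._store[name.lower()] = value.encode(self._encoding)


class SketchMessage:
    def __init__(self, encoding="utf-8"):
        store = {}  # one shared store, two views onto it
        self.headers = _TextHeaders(store, encoding)
        self.bytes_headers = _BytesHeaders(store)


msg = SketchMessage()
msg.headers["Subject"] = "Grüße"
print(msg.bytes_headers["Subject"])   # the UTF-8 encoded bytes
msg.bytes_headers["Subject"] = b"Some Bytes"
print(msg.headers["Subject"])         # Some Bytes
```

Both views share one store, so whichever attribute ends up with the shorter name is only a spelling decision.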


-Barry



PGP.sig
Description: This is a digitally signed message part


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Robert Brewer
On Thu, 2009-04-09 at 22:38 -0400, Barry Warsaw wrote:
> On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote:
> 
> > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw  wrote:
> > Anyway, aside from that decision, I haven't come up with an elegant  
> > way to allow /output/ in both bytes and strings (input is I think  
> > theoretically easier by sniffing the arguments).
> >
> > Won't this work? (assuming dumps() always returns a string)
> >
> > def dumpb(obj, encoding='utf-8', *args, **kw):
> >     s = dumps(obj, *args, **kw)
> >     return s.encode(encoding)
> 
> So, what I'm really asking is this.  Let's say you agree that there  
> are use cases for accessing a header value as either the raw encoded  
> bytes or the decoded unicode.  What should this return:
> 
>  >>> message['Subject']
> 
> The raw bytes or the decoded unicode?
> 
> Okay, so you've picked one.  Now how do you spell the other way?
> 
> The Message class probably has these explicit methods:
> 
>  >>> Message.get_header_bytes('Subject')
>  >>> Message.get_header_string('Subject')
> 
> (or better names... it's late and I'm tired ;).  One of those maps to  
> message['Subject'] but which is the more obvious choice?
> 
> Now, setting headers.  Sometimes you have some unicode thing and  
> sometimes you have some bytes.  You need to end up with bytes in the  
> ASCII range and you'd like to leave the header value unencoded if so.   
> But in both cases, you might have bytes or characters outside that  
> range, so you need an explicit encoding, defaulting to utf-8 probably.
> 
>  >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>  >>> Message.set_header('Subject', b'Some bytes')
> 
> One of those maps to
> 
>  >>> message['Subject'] = ???
> 
> I'm open to any suggestions here!

Syntactically, there's no sense in providing:

Message.set_header('Subject', 'Some text', encoding='utf-16')

...since you could more clearly write the same as:

Message.set_header('Subject', 'Some text'.encode('utf-16'))

The only interesting case is if you provided a *default* encoding, so that:

Message.default_header_encoding = 'utf-16'
Message.set_header('Subject', 'Some text')

...has the same effect.

But it would be far easier to do all the encoding at once in an output()
or serialize() method. Do different headers need different encodings? If
so, make message['Subject'] a subclass of str and give it an .encoding
attribute (with a default). If not, Message.header_encoding should be
sufficient.
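
A rough sketch of the "encode everything at once" variant, under stated assumptions: DeferredEncodingMessage and its header_encoding attribute are hypothetical names, and the output is a plain codec encoding of the header block, not RFC 2047 encoded words, so it is not valid wire format for non-ASCII values.

```python
class DeferredEncodingMessage:
    # Hypothetical class-level default encoding for all headers.
    header_encoding = "utf-8"

    def __init__(self):
        self._headers = []  # (name, text) pairs, kept as str until output

    def set_header(self, name, value):
        self._headers.append((name, value))

    def serialize(self):
        """Encode every header in one pass, at output time."""
        text = "".join(f"{name}: {value}\r\n" for name, value in self._headers)
        return text.encode(self.header_encoding)


msg = DeferredEncodingMessage()
msg.set_header("Subject", "Some text")
print(msg.serialize())  # b'Subject: Some text\r\n'
```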


Robert Brewer
fuman...@aminus.org



Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:


At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
...

So, what I'm really asking is this.  Let's say you agree that there
are use cases for accessing a header value as either the raw encoded
bytes or the decoded unicode.  What should this return:


message['Subject']


The raw bytes or the decoded unicode?


That's an easy one:  Subject: is an unstructured header, so it must be
text, thus Unicode.  We're looking at a high-level representation of an
email message, with parsed header fields and a MIME message tree.


I'm liking Glyph's suggestion here.  We'll probably have to support  
the message['Subject'] API for backward compatibility, but in that  
case it really should be a bytes API.



(or better names... it's late and I'm tired ;).  One of those maps to
message['Subject'] but which is the more obvious choice?


Structured header fields are more of a problem.  Any header with
addresses should return a list of addresses.  I think the default return
type should depend on the data type.  To get an explicit bytes or string
or list of addresses, be explicit; otherwise, for convenience, return the
appropriate type for the particular header field name.


Yes, structured headers are trickier.  In a separate message, James
Knight makes some excellent points, which I agree with.  However the
email package obviously cannot support every type of structured header
possible.  It must support this through extensibility.


The obvious way is through inheritance (i.e. subclasses of Header),  
but in my experience, using inheritance of the Message class really  
doesn't work very well.  You need to pass around factories to parsing  
functions and your application tends to have its own hierarchy of  
subclasses for whatever extra things it needs.  ISTM that subclassing  
is simply not the right pattern to support extensibility in the  
Message objects or Header objects.  Yes, this leads me to think that  
all the MIME* subclasses are essentially /wrong/.


Having said all that, the email package must support structured  
headers.  Look at the insanity which is the current folding whitespace  
splitting and the impossibility of the current code to do the right  
thing for say Subject headers and Received headers, and you begin to  
see why it must be possible to extend this stuff.
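
One possible shape for extensibility without subclassing, sketched under assumptions: a registry maps header names to parser callables, so the return type depends on the header field. The names register_header_parser and parse_header_value are hypothetical; email.utils.getaddresses is the existing standard library function.

```python
from email.utils import getaddresses

# Hypothetical registry: lower-cased header name -> function(str) -> value.
_HEADER_PARSERS = {}


def register_header_parser(name, parser):
    _HEADER_PARSERS[name.lower()] = parser


def parse_header_value(name, raw_text):
    """Return a typed value; unregistered headers fall back to plain text."""
    parser = _HEADER_PARSERS.get(name.lower(), str)
    return parser(raw_text)


def _parse_addresses(text):
    # Address headers yield a list of (display name, address) pairs.
    return getaddresses([text])


for _name in ("To", "Cc", "From"):
    register_header_parser(_name, _parse_addresses)

print(parse_header_value("Subject", "Hello"))
print(parse_header_value("To", "A Person <a@example.com>, b@example.com"))
```

An application can register a parser for its own structured header without touching any class hierarchy.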



Now, setting headers.  Sometimes you have some unicode thing and
sometimes you have some bytes.  You need to end up with bytes in the
ASCII range and you'd like to leave the header value unencoded if so.
But in both cases, you might have bytes or characters outside that
range, so you need an explicit encoding, defaulting to utf-8 probably.


Never for header fields.  The default is always RFC 2047, unless it
isn't, say for params.

The Message class should create an object of the appropriate subclass of
Header based on the name (or use the existing object, see other
discussion), and that should inspect its argument and DTRT or complain.



Message.set_header('Subject', 'Some text', encoding='utf-8')
Message.set_header('Subject', b'Some bytes')


One of those maps to


message['Subject'] = ???


The expected data type should depend on the header field.  For
Subject:, it should be bytes to be parsed or verbatim text.  For To:, it
should be a list of addresses or bytes or text to be parsed.


At a higher level, yes.  At the low level, it has to be bytes.

The email package should be pythonic, and not require deep understanding
of dozens of RFCs to use properly.  Users don't need to know about the
raw bytes; that's the whole point of MIME and any email package.  It
should be easy to set header fields with their natural data types, and
doing it with bad data should produce an error.  This may require a bit
more care in the message parser, to always produce a parsed message with
defects.


I agree that we should have some higher level APIs that make it easy  
to compose email messages, and probably easy-ish to parse a byte  
stream into an email message tree.  But we can't build those without  
the lower level raw support.  I'm also convinced that this lower level  
will be the domain of those crazy enough to have the RFCs tattooed to  
the back of their eyelids.


-Barry





Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote:

Thinking about this stuff makes me nostalgic for the sloppy happy days
of Python 2.x


You now have the opportunity to finally unsnarl that mess.  It is not an
insurmountable opportunity.


No, it's just a full time job.  Now where did I put that
hack-drink-coffee-twitter clone?


-Barry





Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 10, 2009, at 1:22 AM, Stephen J. Turnbull wrote:


Those objects have headers and payload.  The payload can be of any
type, though I think it generally breaks down into "strings" for text/*
types and bytes for anything else (not counting multiparts).


*sigh*  Why are you back-tracking?


I'm not.  Sleep deprivation only makes it seem like that.


The payload should be of an appropriate *object* type.  Atomic object
types will have their content stored as string or bytes [nb I use
Python 3 terminology throughout].  Composite types (multipart/*) won't
need string or bytes attributes AFAICS.


Yes, agreed.


Start by implementing the application/octet-stream and
text/plain;charset=utf-8 object types, of course.


Yes.  See my lament about using inheritance for this.

It does seem to make sense to think about headers as text header names
and text header values.


I disagree.  IMHO, structured header types should have object values,
and something like


While I agree, there's still a need for a higher level API that makes
it easy to do the simple things.



message['to'] = "Barry 'da FLUFL' Warsaw "

should be smart enough to detect that it's a string and attempt to
(flexibly) parse it into a fullname and a mailbox adding escapes, etc.
Whether these should be structured objects or they can be strings or
bytes, I'm not sure (probably bytes, not strings, though -- see next
example).  OTOH

message['to'] = b'''"Barry 'da.FLUFL' Warsaw" '''

should assume that the client knows what they are doing, and should
parse it strictly (and I mean "be a real bastard", eg, raise an
exception on any non-ASCII octet), merely dividing it into fullname
and mailbox, and caching the bytes for later insertion in a
wire-format message.


I agree that the Message class needs to be strict.  A parser needs to  
be lenient; see the .defects attribute introduced in the current email  
package.  Oh, and this reminds me that we still haven't talked about  
idempotency.  That's an important principle in the current email  
package, but do we need to give up on that?



In that case, I think you want the values as unicodes, and probably
the headers as unicodes containing only ASCII.  So your table would be
strings in both cases.  OTOH, maybe your application cares about the
raw underlying encoded data, in which case the header names are
probably still strings of ASCII-ish unicodes and the values are
bytes.  It's this distinction (and I think the competing use cases)
that make a true Python 3.x API for email more complicated.


I don't see why you can't have the email API be specific, with
message['to'] always returning a structured_header object (or maybe
even more specifically an address_header object), and methods like

message['to'].build_header_as_text()

which returns

"""To: "Barry 'da.FLUFL' Warsaw" """

and

message['to'].build_header_in_wire_format()

which returns

b"""To: "Barry 'da.FLUFL' Warsaw" """

Then have email.textview.Message and email.wireview.Message which
provide a simple interface where message['to'] would invoke
.build_header_as_text() and .build_header_in_wire_format()
respectively.


This seems similar to Glyph's basic idea, but with a different spelling.
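
A small sketch of the structured-header spelling quoted above, under assumptions: AddressHeader is a hypothetical class, the address is a placeholder, and only email.utils.parseaddr and email.utils.formataddr are real library calls.

```python
from email.utils import formataddr, parseaddr


class AddressHeader:
    """Hypothetical structured header holding one parsed address."""

    def __init__(self, name, value):
        self.name = name
        self.fullname, self.mailbox = parseaddr(value)

    def build_header_as_text(self):
        return f"{self.name}: {formataddr((self.fullname, self.mailbox))}"

    def build_header_in_wire_format(self):
        # encode('ascii') raises UnicodeEncodeError if any non-ASCII
        # character is left in the text form.
        return self.build_header_as_text().encode("ascii")


to = AddressHeader("To", '"Barry Warsaw" <barry@example.invalid>')
print(to.build_header_as_text())
print(to.build_header_in_wire_format())
```

A text-view or wire-view Message class would then only decide which of the two methods message['to'] calls.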

Thinking about this stuff makes me nostalgic for the sloppy happy days
of Python 2.x


Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,


Can I have my uucp address back now?
-Barry





Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Glenn Linderman
On approximately 4/10/2009 9:56 AM, came the following characters from 
the keyboard of Barry Warsaw:

On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote:

On 02:38 am, ba...@python.org wrote:
So, what I'm really asking is this.  Let's say you agree that there 
are use cases for accessing a header value as either the raw encoded 
bytes or the decoded unicode.  What should this return:


>>> message['Subject']

The raw bytes or the decoded unicode?


My personal preference would be to just deprecate this API, and
get rid of it, replacing it with a slightly more explicit one.


  message.headers['Subject']
  message.bytes_headers['Subject']


This is pretty darn clever Glyph.  Stop that! :)

I'm not 100% sure I like the name .bytes_headers or that .headers 
should be the decoded header (rather than have .headers return the 
bytes thingie and say .decoded_headers return the decoded thingies), 
but I do like the general approach.


If one name has to be longer than the other, it should be the bytes 
version.  Real user code is more likely to want to use the text version, 
and hopefully there will be more of that type of code than 
implementations using bytes.


Of course, one could use message.header and message.bythdr and they'd be 
the same length.



--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking



Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Michael Foord

Glenn Linderman wrote:
On approximately 4/10/2009 9:56 AM, came the following characters from 
the keyboard of Barry Warsaw:

On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote:

On 02:38 am, ba...@python.org wrote:
So, what I'm really asking is this.  Let's say you agree that there 
are use cases for accessing a header value as either the raw 
encoded bytes or the decoded unicode.  What should this return:


>>> message['Subject']

The raw bytes or the decoded unicode?


My personal preference would be to just deprecate this API, and
get rid of it, replacing it with a slightly more explicit one.


  message.headers['Subject']
  message.bytes_headers['Subject']


This is pretty darn clever Glyph.  Stop that! :)

I'm not 100% sure I like the name .bytes_headers or that .headers 
should be the decoded header (rather than have .headers return the 
bytes thingie and say .decoded_headers return the decoded thingies), 
but I do like the general approach.


If one name has to be longer than the other, it should be the bytes 
version.  Real user code is more likely to want to use the text 
version, and hopefully there will be more of that type of code than 
implementations using bytes.


Of course, one could use message.header and message.bythdr and they'd 
be the same length.




Shouldn't headers always be text?

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:

 > > (3) The default transfer encoding syntax is UTF-8.
 > 
 > Notice that the RFC is partially irrelevant. It only applies
 > to the application/json mime type, and JSON is used in various
 > other protocols, using various other encodings.

Sure.  That's their problem.  In Python, Unicode is the native
encoding, and we have codecs to deal with the outside world, no?  That
happens to match very well not only with RFC 4627, but the sidebar on
json.org that defines JSON.

 > > I think it's a bad idea for any of the core JSON API to accept or
 > > produce bytes in any language that provides a Unicode string type.
 > 
 > So how do you integrate the encoding detection that the RFC suggests
 > to be done?

I suggest you don't.  That's mission creep.  Think about writing tests
for it, and remember that out in the wild those "various other
encodings" almost certainly include Shift JIS, Big5, and KOI8-R.  Both
those considerations point to "er, let's delegate detection and
en/decoding to the nice folks who maintain the codec suite."  Where
it's embedded in some other protocol which specifies a TES, the TES
can be implemented there, too.

As I wrote earlier, I don't see anything wrong with providing a
wrapper module that deals with some default/common/easy cases.  But
I'd stick it in the contrib directory.
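
A minimal sketch of such a wrapper, assuming the core API stays text-only and the caller names the codec explicitly. dumpb and loadb are hypothetical names; no encoding detection is attempted.

```python
import json


def dumpb(obj, encoding="utf-8", **kw):
    """Serialize to text with json.dumps(), then encode explicitly."""
    return json.dumps(obj, **kw).encode(encoding)


def loadb(data, encoding="utf-8", **kw):
    """Decode with a caller-chosen codec, then parse the resulting text."""
    return json.loads(data.decode(encoding), **kw)


payload = dumpb({"title": "Gödel"}, encoding="utf-16", ensure_ascii=False)
print(loadb(payload, encoding="utf-16"))  # {'title': 'Gödel'}
```

Everything codec-related is delegated to str.encode() and bytes.decode(), so Shift JIS, Big5 or KOI8-R need no special handling in the wrapper itself.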


Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

If one name has to be longer than the other, it should be the bytes  
version.  Real user code is more likely to want to use the text  
version, and hopefully there will be more of that type of code than  
implementations using bytes.


I'm not sure we know that yet, actually.  Nothing written for Python 2  
counts, and email is too broken in 3 for any sane person to be writing  
such code for Python 3.


Of course, one could use message.header and message.bythdr and  
they'd be the same length.


I was trying to figure out what  a 'thdr' was that we'd want to index  
'by' it. :)


-Barry





Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Barry Warsaw

On Apr 10, 2009, at 2:06 PM, Michael Foord wrote:


Shouldn't headers always be text?


/me weeps





Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-10 Thread Stephen J. Turnbull
Shouldn't this thread move lock stock and .signature to email-sig?

Barry Warsaw writes:

 > >> It does seem to make sense to think about headers as text header
 > >> names and text header values.
 > >
 > > I disagree.  IMHO, structured header types should have object values,
 > > and something like
 > 
 > While I agree, there's still a need for a higher level API that makes
 > it easy to do the simple things.

Sure.  I'm suggesting that the way to determine whether something is
simple or not is by whether it falls out naturally from correct
structure.  Ie, no operations that only a Cirque du Soleil juggler can
perform are allowed.

 > I agree that the Message class needs to be strict.  A parser needs to  
 > be lenient;

Not always.  The Postel Principle only applies to stuph coming in off
the wire.  But we're *also* going to be parsing pseudo-email
components that are being handed to us by applications (eg, the
perennial control-character-in-the-unremovable-address Mailman bug).
Our parser should Just Say No to that crap.
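
To make the strict/lenient split concrete, here is a hedged sketch. The function names are hypothetical, and the exact policy (allow TAB, reject other control characters, DEL and any non-ASCII octet) is an assumption chosen for illustration, not something the thread specifies.

```python
def _is_illegal(octet):
    # Assumed policy: printable ASCII and TAB are fine, everything else is not.
    return octet > 0x7E or (octet < 0x20 and octet != 0x09)


def check_strict(value):
    """For application-supplied bytes: refuse bad input outright."""
    for offset, octet in enumerate(value):
        if _is_illegal(octet):
            raise ValueError(f"illegal octet 0x{octet:02x} at offset {offset}")
    return value


def check_lenient(value):
    """For bytes off the wire: keep the data, but record each defect."""
    defects = [
        f"illegal octet 0x{octet:02x} at offset {offset}"
        for offset, octet in enumerate(value)
        if _is_illegal(octet)
    ]
    return value, defects


print(check_lenient(b"li\x07st@example.com")[1])
```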

 > see the .defects attribute introduced in the current email  
 > package.  Oh, and this reminds me that we still haven't talked about  
 > idempotency.  That's an important principle in the current email  
 > package, but do we need to give up on that?

"Idempotency"?  I'm not sure what that means in the context of the
email package ... multiplication by zero?  Do you mean that
.parse().to_wire() should be idempotent?  Yes, I think that's a good
idea, and it shouldn't be too hard to implement by (optionally?)
caching the whole original message or individual components (headers
with all whitespace including folding cached verbatim, etc).  I think
caching has to be done, since stuff like "did the original fold with a
leading tab or a leading space, and at what column" and so on seems
kind of pointless to encode as attributes on Header objects.
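
The caching idea can be sketched as follows, assuming a hypothetical CachedHeader class and a deliberately simplistic parse_header function that handles one ASCII header at a time.

```python
class CachedHeader:
    """Hypothetical header that remembers the bytes it was parsed from."""

    def __init__(self, name, value, raw=None):
        self.name = name
        self._value = value
        self._raw = raw  # verbatim wire bytes, folding whitespace included

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        self._raw = None  # the cache no longer matches, so drop it

    def to_wire(self):
        if self._raw is not None:
            return self._raw  # untouched header: reproduce the input exactly
        return f"{self.name}: {self._value}\r\n".encode("ascii")


def parse_header(raw):
    """Parse one (possibly folded) header, keeping the original bytes."""
    name, _, rest = raw.decode("ascii").partition(":")
    unfolded = " ".join(rest.split())  # collapse folding whitespace
    return CachedHeader(name, unfolded, raw=raw)


raw = b"Subject: a folded\r\n\tsubject line\r\n"
header = parse_header(raw)
print(header.to_wire() == raw)  # True
header.value = "changed"
print(header.to_wire())         # b'Subject: changed\r\n'
```

The tab-versus-space folding detail never becomes an attribute; it survives only because the original bytes are kept until the header is modified.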

[Description of MessageTextView and MessageWireView elided.]

 > This seems similar to Glyph's basic idea, but with a different spelling.

Yes.  I don't much care which way it's done, and Glyph's style of
spelling is more explicit.  But I was thinking in terms of the number
of people who are surely going to sing "Mama don' 'low no Unicodes
roun' here" and squeal "codec WTF?! outta mah face, man!"


Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

2009-04-10 Thread Stephen J. Turnbull
Bill Janssen writes:
 > Barry Warsaw  wrote:
 > 
 > > In that case, we really need the
 > > bytes-in-bytes-out-bytes-in-the-chewy-
 > > center API first, and build things on top of that.
 > 
 > Yep.

Uh, I hate to rain on a parade, but isn't that how we arrived at the
*current* email package?


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread P.J. Eby

At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:

This problem (slow application startup times due to too many imports at
startup, which in turn can be due to top level imports for library
or framework functionality that a given application doesn't actually
use) is actually the main reason I sometimes wish for a nice, solid lazy
module import mechanism that manages to avoid the potential deadlock
problems created by using import statements inside functions.


Have you tried http://pypi.python.org/pypi/Importing ? Or more 
specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?


It does of course use the import lock, but as long as your top-level 
module code doesn't acquire locks (directly or indirectly), it 
shouldn't be possible to deadlock.  (Or more precisely, to add any 
*new* deadlocks that you didn't already have.)
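
For readers who want to see the general technique, here is a sketch of a lazy import. It does not use the Importing package's API; it assumes a Python version that ships importlib.util.LazyLoader in the standard library, and lazy_import is a hypothetical helper name.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module, runs nothing yet
    return module


colorsys = lazy_import("colorsys")           # cheap: body not executed here
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))    # first access triggers the load
```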




Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

2009-04-10 Thread Barry Warsaw

On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:


Bill Janssen writes:

Barry Warsaw  wrote:


In that case, we really need the
bytes-in-bytes-out-bytes-in-the-chewy-
center API first, and build things on top of that.


Yep.


Uh, I hate to rain on a parade, but isn't that how we arrived at the
*current* email package?


Not really.  We got here because we were too damn sloppy  
about the distinction.


I'm going to remove python-dev from subsequent follow ups.  Please  
join us at email-sig for further discussion.


Barry





Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Aahz
On Fri, Apr 10, 2009, Barry Warsaw wrote:
> On Apr 10, 2009, at 2:06 PM, Michael Foord wrote:
>>
>> Shouldn't headers always be text?
>
> /me weeps

/me hands Barry a hankie
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Stephen J. Turnbull
Robert Brewer writes:

 > Syntactically, there's no sense in providing:
 > 
 > Message.set_header('Subject', 'Some text', encoding='utf-16')
 > 
 > ...since you could more clearly write the same as:
 > 
 > Message.set_header('Subject', 'Some text'.encode('utf-16'))

Which you now must *parse* and guess the encoding to determine how to
RFC-2047-encode the binary mush.  I think the encoding parameter is
necessary here.

 > But it would be far easier to do all the encoding at once in an
 > output() or serialize() method. Do different headers need different
 > encodings?

You can have multiple encodings within a single header (and a naïve
algorithm might very well encode "The price of Gödel-Escher-Bach is
€25" as "The price of =?ISO-8859-1?Q?G=F6del-Escher-Bach?= is
=?ISO-8859-15?Q?=A425?=").
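
The mixed-charset case can be reproduced with the existing email.header module. This is only a sketch: the chunk boundaries are chosen by hand rather than by any naïve algorithm, and maxlinelen is raised so the result is not folded.

```python
from email.header import Header, decode_header

# maxlinelen is raised so the example is not folded across lines.
header = Header(maxlinelen=200)
header.append("The price of ", charset="us-ascii")
header.append("Gödel-Escher-Bach", charset="iso-8859-1")
header.append(" is ", charset="us-ascii")
header.append("€25", charset="iso-8859-15")

encoded = header.encode()
print(encoded)

# Each pair is (chunk, charset); charset is None for unencoded chunks.
for chunk, charset in decode_header(encoded):
    print(charset, chunk)
```

Each chunk carries its own charset, which is why a single per-header .encoding attribute cannot describe the result.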

 > If so, make message['Subject'] a subclass of str and give it an
 > .encoding attribute (with a default).

But if you've set the .encoding attribute, you don't need to encode
'Some text'; .set_header() can take care of it for you.  And what
about the possibility that the encoding attributes disagree with the
argument you passed to the codec?



[Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread C. Titus Brown
Hi all,

this year we have 10-12 GSoC applications that I've put in the "relevant
to core Python development" category.  These projects, if mentors etc
are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
backup GSoC admin and general busybody, I've taken on the work of
coordinating these as a special subgroup within the PSF GSoC, and I
thought it would be good to mention them to python-dev.

Note that all of them have been run by a few different committers,
including Martin, Tarek, Benjamin, and Brett, and they've been obliging
enough to triage a few of them.  Thanks, guys!

Here's what's left after that triage.  Note that except for the four at
the top, these have all received positive support from *someone* who is
a committer and I don't think we need to discuss them here -- patches
etc. can go through normal "python-dev" channels during the course of the
summer.

I am looking for feedback on the first four, though.  Can these
reasonably be considered "core" priorities for Python?  Remember, this
"costs" us something in the sense of preferring these over Python
subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim,
etc.

---

Questionable "core":

2x "port NumPy to py3k" -- NumPy is a major Python module and porting it
to py3k fits with Guido's request that "more stuff get ported".
To be clear, I don't think anyone expects all of NumPy to get
ported this summer, but these students will work through issues
associated with porting big chunks o' code to py3k.

One medium/strong proposal, one medium/weak proposal.

Comments/thoughts?

2x "improve testing tools for py3k" -- variously focus on improving test
coverage and testing wrappers.

One proposes to provide a nice wrapper to make nose and py.test
capable of running the regrtests, which (with no change to
regrtest) would let people run tests in parallel, distribute or
run tests across multiple machines (including Snakebite), tag
and run subsets of tests with personal and/or public tags, and
otherwise take advantage of many of the nice features of nose
and py.test.

The other proposes to measure & increase the code coverage of
the py3k tests in both Python and C, integrate across multiple
machines, and otherwise provide a nice set of integrated reports
that anyone can generate on their own machines.  This proposal,
in particular, could move smoothly towards the effort to produce
a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
(This wasn't integrated into the proposal because I only found
out about it after the proposals were due.)

I personally think that both testing proposals are good, and
they grew out of conversations I had with Brett, who thinks that
the general ideas are good.  So, err, I'm looking for pushback,
I guess ;).  I can expand on these ideas a bit if people are
interested.

Both proposals are medium at least, and I've personally been
positively impressed with the student interaction.

Comments/thoughts?

---

Unquestionably "core" by my criteria above:

3to2 tool -- 'nuff said.

subprocess improvement -- integrating, testing, and proposing some of
the various subprocess improvements that have passed across this
list & the bug tracker

IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
issues relating to IDLE and Tkinter.

roundup VCS integration / build tools to support core development --
a single student proposed both of these and has received some
support.  See http://slexy.org/view/s2pFgWxufI for details.

sphinx framework improvement -- support for per-paragraph comments and
user/developer interface for submitting/committing fixes 

2x "keyring package" -- see
http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/.
The poorer one of these will probably be axed unless Tarek gives it
strong support.

--

--titus
-- 
C. Titus Brown, c...@msu.edu


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Guilherme Polo
On Fri, Apr 10, 2009 at 5:38 PM, C. Titus Brown  wrote:
> Hi all,
>
> this year we have 10-12 GSoC applications that I've put in the "relevant
> to core Python development" category.  These projects, if mentors etc
> are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
> backup GSoC admin and general busybody, I've taken on the work of
> coordinating these as a special subgroup within the PSF GSoC, and I
> thought it would be good to mention them to python-dev.
>
> Note that all of them have been run by a few different committers,
> including Martin, Tarek, Benjamin, and Brett, and they've been obliging
> enough to triage a few of them.  Thanks, guys!
>
> Here's what's left after that triage.
> .
> .
>
> IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
>        issues relating to IDLE and Tkinter.
>

Is it important, for the discussion, to mention that it also involves
testing this area (idle and tkinter), Titus ? I'm considering this
more important than "just" dealing with the tracker issues.

> --titus
> --
> C. Titus Brown, c...@msu.edu

Regards,

-- 
-- Guilherme H. Polo Goncalves


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread C. Titus Brown
On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote:
-> >
-> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
-> >        issues relating to IDLE and Tkinter.
-> >
-> 
-> Is it important, for the discussion, to mention that it also involves
-> testing this area (idle and tkinter), Titus ? I'm considering this
-> more important than "just" dealing with the tracker issues.

What, I tell you that your app is going to be accepted and we shouldn't
argue about it, and you want to argue about it? ;)

--titus
-- 
C. Titus Brown, c...@msu.edu


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Terry Reedy

gl...@divmod.com wrote:
> On 03:21 am, ncogh...@gmail.com wrote:
>> Barry Warsaw wrote:
>>> I don't know whether the parameter thing will work or not, but you're
>>> probably right that we need to get the bytes-everywhere API first.
>>
>> Given that json is a wire protocol, that sounds like the right approach
>> for json as well. Once bytes-everywhere works, then a text API can be
>> built on top of it, but it is difficult to build a bytes API on top of a
>> text one.
>
> I wish I could agree, but JSON isn't really a wire protocol.  According
> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
> serialization of structured data".  There are some notes about encoding,
> but it is very clearly described in terms of unicode code points.
>
>> So I guess the IO library *is* the right model: bytes at the bottom of
>> the stack, with text as a wrapper around it (mediated by codecs).
>
> In email's case this is true, but in JSON's case it's not.  JSON is a
> format defined as a sequence of code points; MIME is defined as a
> sequence of octets.


What is the 'bytes support' issue for json?  Is it about content within 
a json text? Or about the transport format of a json text?


Reading rfc4627, a json text is a unicode string representation of an 
instance of one of 6 classes.  In Python terms, they are NoneType, bool, 
numbers (int, float, decimal?), (unicode) str, list, and [string-keyed] 
dict.  The representation is nearly identical to Python's literals and 
displays.


For transport, the encoding SHALL be one of UTF-8, -16LE/BE, -32LE/BE, 
with UTF-8 the 'default'.


So a json parser (a restricted eval()) tokenizes and parses a stream of 
unicode chars which in Python could come from either a unicode string or 
decoded bytes object.  The bytes decoding could be either bulk or 
incremental.


Similarly, a json generator (a repr()-like function) produces a stream 
of unicode chars which again could be optionally encoded to bytes, 
either incrementally or in bulk.
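
A minimal sketch of the layering described above, assuming the present-day
json and codecs modules from the standard library (the payload and the
chunk boundary are made up for illustration): the generator and parser
work on text, and bytes appear only at the edges, decoded either in bulk
or incrementally.

```python
import codecs
import json

# Illustrative payload; the names here are not from the original message.
payload = {"name": "café", "ok": True, "n": None}

# Generator side: produce text first, then optionally encode in bulk.
text = json.dumps(payload, ensure_ascii=False)
wire = text.encode("utf-8")

# Parser side, bulk: decode the whole bytes object, then parse the text.
decoded = json.loads(wire.decode("utf-8"))

# Parser side, incremental: feed chunks to an incremental decoder.
# The split at byte 14 falls in the middle of the two-byte "é".
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(chunk) for chunk in (wire[:14], wire[14:])]
pieces.append(decoder.decode(b"", final=True))
reassembled = json.loads("".join(pieces))
```

Both paths hand the parser the same stream of code points; only the point
at which decoding happens differs.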


The standard does not specify any correspondence between representations 
and domain objects.  For Python making 'null', 'true', and 'false' 
inter-convert with None, True, False is obvious.  Numbers are slightly 
more problematic.  A generator could produce decimal literals from 
both floats and decimals but without a non-json extension, a parser 
could only convert back to one, so the other would not round-trip. (Int 
could be handled by the presence or absence of '.0'.)  Similarly, tuples 
could be represented, like lists, as json square-bracketed arrays, but 
they would be converted back to lists, not tuples, unless a non-json 
extension were used.
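
The round-trip points above can be checked with a short sketch, assuming
the standard library json module as it exists today (parse_float is a
documented json.loads parameter; the sample values are illustrative):

```python
import json
from decimal import Decimal

# A tuple is written as a JSON array and comes back as a list.
tuple_text = json.dumps((1, 2, 3))
round_tripped = json.loads(tuple_text)

# int and float are told apart by the presence or absence of ".0".
int_text = json.dumps(1)
float_text = json.dumps(1.0)

# A decimal literal can be read back as float or Decimal, but the text
# itself does not record which one it started out as.
as_float = json.loads("1.5")
as_decimal = json.loads("1.5", parse_float=Decimal)
```

The parser has to pick one target type per JSON construct, so whichever
Python type is not chosen cannot round-trip without an extension.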


So the two possible bytes-support content issues I see are how to 
represent them as legal json strings and/or whether some device should 
be added to make them round-trip.  But as indicated above, these two 
issues are not unique to bytes.


Terry Jan Reedy



Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Tennessee Leeuwenburg
Well, I think Numpy is of huge importance to a major Python user segment,
the scientific community. I don't know if that makes it 'core', but I
strongly agree that it's important.
Better testing is always useful, and more "core", but IMO less important.

-T

On Sat, Apr 11, 2009 at 6:38 AM, C. Titus Brown  wrote:

> Hi all,
>
> this year we have 10-12 GSoC applications that I've put in the "relevant
> to core Python development" category.  These projects, if mentors etc
> are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
> backup GSoC admin and general busybody, I've taken on the work of
> coordinating these as a special subgroup within the PSF GSoC, and I
> thought it would be good to mention them to python-dev.
>
> Note that all of them have been run by a few different committers,
> including Martin, Tarek, Benjamin, and Brett, and they've been obliging
> enough to triage a few of them.  Thanks, guys!
>
> Here's what's left after that triage.  Note that except for the four at
> the top, these have all received positive support from *someone* who is
> a committer and I don't think we need to discuss them here -- patches
> etc. can go through normal "python-dev" channels during the course of the
> summer.
>
> I am looking for feedback on the first four, though.  Can these
> reasonably be considered "core" priorities for Python?  Remember, this
> "costs" us something in the sense of preferring these over Python
> subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim,
> etc.
>
> ---
>
> Questionable "core":
>
> 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it
>to py3k fits with Guido's request that "more stuff get ported".
>To be clear, I don't think anyone expects all of NumPy to get
>ported this summer, but these students will work through issues
>associated with porting big chunks o' code to py3k.
>
>One medium/strong proposal, one medium/weak proposal.
>
> Comments/thoughts?
>
> 2x "improve testing tools for py3k" -- variously focus on improving test
>coverage and testing wrappers.
>
>One proposes to provide a nice wrapper to make nose and py.test
>capable of running the regrtests, which (with no change to
>regrtest) would let people run tests in parallel, distribute or
>run tests across multiple machines (including Snakebite), tag
>and run subsets of tests with personal and/or public tags, and
>otherwise take advantage of many of the nice features of nose
>and py.test.
>
>The other proposes to measure & increase the code coverage of
>the py3k tests in both Python and C, integrate across multiple
>machines, and otherwise provide a nice set of integrated reports
>that anyone can generate on their own machines.  This proposal,
>in particular, could move smoothly towards the effort to produce
>a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
>(This wasn't integrated into the proposal because I only found
>out about it after the proposals were due.)
>
>I personally think that both testing proposals are good, and
>they grew out of conversations I had with Brett, who thinks that
>the general ideas are good.  So, err, I'm looking for pushback,
>I guess ;).  I can expand on these ideas a bit if people are
>interested.
>
>Both proposals are medium at least, and I've personally been
>positively impressed with the student interaction.
>
> Comments/thoughts?
>
> ---
>
> Unquestionably "core" by my criteria above:
>
> 3to2 tool -- 'nuff said.
>
> subprocess improvement -- integrating, testing, and proposing some of
>the various subprocess improvements that have passed across this
>list & the bug tracker
>
> IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
>issues relating to IDLE and Tkinter.
>
> roundup VCS integration / build tools to support core development --
>a single student proposed both of these and has received some
>support.  See http://slexy.org/view/s2pFgWxufI for details.
>
> sphinx framework improvement -- support for per-paragraph comments and
>user/developer interface for submitting/committing fixes
>
> 2x "keyring package" -- see
>
> http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/
> .
> The poorer one of these will probably be axed unless Tarek gives it
> strong support.
>
> --
>
> --titus
> --
> C. Titus Brown, c...@msu.edu
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com
>



-- 
--
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe ever

Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Guilherme Polo
On Fri, Apr 10, 2009 at 6:02 PM, C. Titus Brown  wrote:
> On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote:
> -> >
> -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
> -> >        issues relating to IDLE and Tkinter.
> -> >
> ->
> -> Is it important, for the discussion, to mention that it also involves
> -> testing this area (idle and tkinter), Titus ? I'm considering this
> -> more important than "just" dealing with the tracker issues.
>
> What, I tell you that your app is going to be accepted and we shouldn't
> argue about it, and you want to argue about it? ;)
>

Oh awesome then :) I think I misread part of your original email.

> --titus
> --
> C. Titus Brown, c...@msu.edu
>



-- 
-- Guilherme H. Polo Goncalves


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Benjamin Peterson
2009/4/10 C. Titus Brown :
> 2x "improve testing tools for py3k" -- variously focus on improving test
>        coverage and testing wrappers.
>
>        One proposes to provide a nice wrapper to make nose and py.test
>        capable of running the regrtests, which (with no change to
>        regrtest) would let people run tests in parallel, distribute or
>        run tests across multiple machines (including Snakebite), tag
>        and run subsets of tests with personal and/or public tags, and
>        otherwise take advantage of many of the nice features of nose
>        and py.test.
>
>        The other proposes to measure & increase the code coverage of
>        the py3k tests in both Python and C, integrate across multiple
>        machines, and otherwise provide a nice set of integrated reports
>        that anyone can generate on their own machines.  This proposal,
>        in particular, could move smoothly towards the effort to produce
>        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
>        (This wasn't integrated into the proposal because I only found
>        out about it after the proposals were due.)
>
>        I personally think that both testing proposals are good, and
>        they grew out of conversations I had with Brett, who thinks that
>        the general ideas are good.  So, err, I'm looking for pushback,
>        I guess ;).  I can expand on these ideas a bit if people are
>        interested.
>
>        Both proposals are medium at least, and I've personally been
>        positively impressed with the student interaction.


To me, both of those proposals seem to say "measure and improve test
coverage" or "nose integration" with a severe lack of specific details.
Especially the nose plugin one seems like very little work. (Running
default nose in the test directory in fact works fairly well.)

Another small nit is that they should address Python 2.x, too.


-- 
Regards,
Benjamin


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread C. Titus Brown
On Fri, Apr 10, 2009 at 06:05:02PM -0500, Benjamin Peterson wrote:
-> 2009/4/10 C. Titus Brown :
-> > 2x "improve testing tools for py3k" -- variously focus on improving test
-> >        coverage and testing wrappers.
-> >
-> >        One proposes to provide a nice wrapper to make nose and py.test
-> >        capable of running the regrtests, which (with no change to
-> >        regrtest) would let people run tests in parallel, distribute or
-> >        run tests across multiple machines (including Snakebite), tag
-> >        and run subsets of tests with personal and/or public tags, and
-> >        otherwise take advantage of many of the nice features of nose
-> >        and py.test.
-> >
-> >        The other proposes to measure & increase the code coverage of
-> >        the py3k tests in both Python and C, integrate across multiple
-> >        machines, and otherwise provide a nice set of integrated reports
-> >        that anyone can generate on their own machines.  This proposal,
-> >        in particular, could move smoothly towards the effort to produce
-> >        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
-> >        (This wasn't integrated into the proposal because I only found
-> >        out about it after the proposals were due.)
-> >
-> >        I personally think that both testing proposals are good, and
-> >        they grew out of conversations I had with Brett, who thinks that
-> >        the general ideas are good.  So, err, I'm looking for pushback,
-> >        I guess ;).  I can expand on these ideas a bit if people are
-> >        interested.
-> >
-> >        Both proposals are medium at least, and I've personally been
-> >        positively impressed with the student interaction.
-> 
-> To me, both of those proposals seem to say "measure and improve test
-> coverage" or "nose integration" with a severe lack of specific details.
-> Especially the nose plugin one seems like very little work. (Running
-> default nose in the test directory in fact works fairly well.)

...fairly, yes ;).  But not perfectly.  And certainly not with
equivalent guarantees to regrtest, which is really what Python
developers need.  Tracking down the corner cases, writing up examples,
setting up tags, getting multiprocess to work properly, and making sure
that coverage recording works properly, and then getting people to try
it out on THEIR machines, is likely to be a lot of work.

The plugin ecosystem for nose is growing daily and supporting that for
core would be fantastic; extending it to py.test (whose plugin interface
is now mostly compatible with nose) would be even better.

The lack of detail on the code coverage is intentional, IMO.  It's
non-trivial to get a full handle on C code coverage integrated with
Python code coverage -- or at least it has been for me -- so I supported
the student focusing on first writing robust coverage analysis tools,
and only then deciding what to "hit" with more tests.  I will encourage
the student to talk to this list (or the "tests" list in the stdlib sig)
in order to target areas that are more relevant to people.

I have had a hard time getting a good sense of what core code is well
tested and what is not well tested, across various platforms.  While
Walter's C/Python integrated code coverage site is nice, it would be
even nicer to have a way to generate all that information within any
particular checkout on a real-time basis.  Doing so in the context of
Snakebite would be icing... and I think it's worth supporting in core,
especially if it can be done without any changes *to* core.

-> Another small nit is that they should address Python 2.x, too.

I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an
equally valid criticism.  Certainly 3.x is the future so I thought
focusing on increasing code coverage, and especially C code coverage,
could best be applied to 3.x.

cheers,
--titus
--
C. Titus Brown, c...@msu.edu


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Jack diederich
On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown  wrote:
[megasnip]
> roundup VCS integration / build tools to support core development --
>        a single student proposed both of these and has received some
>        support.  See http://slexy.org/view/s2pFgWxufI for details.

From the listed webpage I have no idea what he is promising (a
combination of very high level and very low level tasks).  If he is
offering all the same magic for Hg that Trac does for SVN (autolinking
"r2001" text to patches, for example) then I'm +1.  That should be
cake even for a student project.

He says vague things about patches too, but I'm not sure what.  If he
wanted to make that into a 'patchbot' that just applied every patch in
isolation and ran 'make && make test' and posted results in the
tracker I'd be a happy camper.

But maybe those are goals for next year, because I'm not quite sure
what the proposal is.

-Jack


[Python-Dev] Lazy importing (was Rethinking intern() and its data structure)

2009-04-10 Thread Greg Ewing

Nick Coghlan wrote:


> I sometimes wish for a nice, solid lazy
> module import mechanism that manages to avoid the potential deadlock
> problems created by using import statements inside functions.


I created an ad-hoc one of these for PyGUI recently.
I can send you the code if you're interested.

I didn't have any problems with deadlocks, but I
did find one rather annoying problem. It seems that
an exception occurring at certain times during the
import process gets swallowed and turned into a
generic ImportError. I had to resort to catching
exceptions and printing my own traceback in order
to diagnose missing auto-imported names.

--
Greg


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Greg Ewing

Paul Moore wrote:


> 3.  Encoding
>
>    JSON text SHALL be encoded in Unicode.  The default encoding is
>    UTF-8.
>
> This is at best confused (in my utterly non-expert opinion :-)) as
> Unicode isn't an encoding...


I'm inclined to agree. I'd go further and say that if JSON
is really meant to be a text format, the standard has no
business mentioning encodings at all.

The reason you use a text format in the first place is that
you have some way of transmitting text, and you want to
send something that isn't text. In that situation, the
encoding is already determined by whatever means you're
using to send the text.

--
Greg


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Brendan Cully
On Friday, 10 April 2009 at 15:05, P.J. Eby wrote:
> At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:
>> This problem (slow application startup times due to too many imports at
>> startup, which in turn can be due to top level imports for library
>> or framework functionality that a given application doesn't actually
>> use) is actually the main reason I sometimes wish for a nice, solid lazy
>> module import mechanism that manages to avoid the potential deadlock
>> problems created by using import statements inside functions.

I'd love to see that too. I imagine it would be beneficial for many
python applications.

> Have you tried http://pypi.python.org/pypi/Importing ? Or more  
> specifically, 
> http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?

Here's what we do in Mercurial, which is a little more user-friendly,
but possibly too magical for general use (but provides us a very nice
speedup):

http://www.selenic.com/repo/index.cgi/hg/file/tip/mercurial/demandimport.py#l1

It's nice and small, and it is invisible to the rest of the code, but
it's probably too aggressive for all users. The biggest problem is
probably that ImportErrors are deferred until first access, which
trips up modules that do things like

try:
  import foo
except ImportError:
  import fallback as foo

of which there are a few. The mercurial module maintains a blacklist
as a bandaid, but it'd be great to have a real fix.
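
To make the deferred-ImportError problem concrete, here is a minimal
sketch. It is not Mercurial's demandimport; LazyModule and lazy_import
are hypothetical names invented for this illustration, and the missing
module name is assumed not to exist on the reader's system.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports the real module on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Reached only for attributes the proxy itself does not have,
        # i.e. anything the caller expects from the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def lazy_import(name):
    """Return a proxy immediately; nothing is imported yet."""
    return LazyModule(name)


# The fallback idiom stops working: creating the proxy never fails,
# so the except branch is not taken even though the module is missing.
try:
    foo = lazy_import("no_such_module_hopefully")
    fell_back = False
except ImportError:
    import json as foo
    fell_back = True

# The ImportError only surfaces later, on first attribute access.
try:
    foo.dumps
    deferred_error = False
except ImportError:
    deferred_error = True
```

With an eager import the except branch would have run and the fallback
would be in place; with the lazy proxy the failure moves to wherever
the module is first touched.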


Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

2009-04-10 Thread Guido van Rossum
On Fri, Apr 10, 2009 at 12:04 PM, Barry Warsaw  wrote:
> On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:
>
>> Bill Janssen writes:
>>>
>>> Barry Warsaw  wrote:
>>>
>>>> In that case, we really need the
>>>> bytes-in-bytes-out-bytes-in-the-chewy-
>>>> center API first, and build things on top of that.
>>>
>>> Yep.
>>
>> Uh, I hate to rain on a parade, but isn't that how we arrived at the
>> *current* email package?
>
> Not really.  We got here because we were too damn sloppy about
> the distinction.

Agreed. I take full responsibility -- the str/unicode approach we
introduced in 2.0 seemed like the best thing we could do at the time,
but in retrospect it would've been better if we'd left str alone and
introduced a unicode type that was truly distinct -- like str in 3.0.
The email package is not the only system that ended up with a muddled
distinction between the two as a result.

> I'm going to remove python-dev from subsequent follow ups.  Please join us
> at email-sig for further discussion.
>
>Barry

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-Dev] Going off-line for a week

2009-04-10 Thread Guido van Rossum
Folks, I'm going off-line for a week to enjoy a family vacation. When
I come back I'll probably just archive most email unread, so now's
your chance to add braces to the language. :-)

Not-yet-retiring-ly y'rs,

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Martin v. Löwis
>> In email's case this is true, but in JSON's case it's not.  JSON is a
>> format defined as a sequence of code points; MIME is defined as a
>> sequence of octets.
> 
> What is the 'bytes support' issue for json?  Is it about content within
> a json text? Or about the transport format of a json text?

The question is whether the json parsing should take bytes or str as
input, and whether the json marshalling should produce bytes or str.
More specifically, the question is whether it is ok to drop bytes.

I personally think that it needs to support bytes, and that perhaps
str support is optional (as you could always explicitly encode the
str as UTF-8 before passing it to the JSON parser, if you somehow
managed to get a str of JSON to parse).

However, I really think that this question cannot be answered by
reading the RFC. It should be answered by verifying how people use
the json library in 2.x.

> The standard does not specify any correspondence between representations
> and domain objects

And that is not the issue at all; nobody is debating what output the
parsing should produce.

Regards,
Martin


[Python-Dev] Needing help to change the grammar

2009-04-10 Thread Harry (Thiago Leucz Astrizi)

Hello everybody. My name is Thiago and currently I'm working as a
teacher in a high school in Brazil. I have plans to offer in the
school a programming course to the students, but I had some problems
finding a good language. As a Python programmer, I really like the
language's syntax and I think that Python is very good to teach
programming. But there's a little problem: the commands and keywords
are in english and this can be an obstacle to the teenagers that could
enter in the course.

Because of this, I decided to create a Python version with keywords in
Portuguese and with some modifications in the grammar to be more
Portuguese-like. For this, I'm using the Python 3.0.1 source code.

I already read PEP 306 (How to Change Python's Grammar) and changed
the suggested files. My changes currently are working properly except
for one thing: the "comp_op". The code that in English Python is
written as "is not", in Portuguese Python shall be "não é". Besides
translating the words "is" and "not", I'm also changing the
order in which they appear, putting "not" before "is".

It appears to be a simple change, but strangely, I haven't been able to
make it work. I already made the correct modifications in the
Grammar/Grammar file, the new keywords already appear in Lib/keyword.py
and I also changed the function validate_comp_op in Modules/parsermodule.c:

static int
validate_comp_op(node *tree)
{
    (...)
    /* Two-token operators: "não é" (is not) and "não em" (not in). */
    else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
        res = (validate_ntype(CHILD(tree, 0), NAME)
               && validate_ntype(CHILD(tree, 1), NAME)
               && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                    && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
                   || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                       && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
        if (!res && !PyErr_Occurred())
            err_string("operador de comparação desconhecido");
    }
    return (res);
}

I also looked in the other files proposed in the PEP, but I didn't find
anything in them that I recognized as needing changes.

But when I type "make" to compile the new language, the following
error appears in Lib/encodings/__init__.py (which I already
translated to the portuguese Python):


ha...@skynet:~/Python-3.0.1$ make
Fatal Python error: Py_Initialize: can't initialize sys standard streams
  File "/home/harry/Python-3.0.1/Lib/encodings/__init__.py", line 73
    se entry não é _unknown:
    ^
SyntaxError: invalid syntax

The comp_op doesn't work! I don't know what else to change. Perhaps
there's some file that I should modify but didn't pay enough attention
to... Does anybody have an idea of what I should do?
Thanks a lot.




Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Mark Hammond

[Dropping email sig]

On 11/04/2009 1:06 PM, "Martin v. Löwis" wrote:


> However, I really think that this question cannot be answered by
> reading the RFC. It should be answered by verifying how people use
> the json library in 2.x.


In the absence of anything more formal, here are 2 anecdotes:

* The python-twitter package seems to:
  - Use dumps() mainly to get string objects.  It uses it both for 
__str__, and for an API called 'AsJsonString' - the intent of this seems 
to be to provide strings for the consumer of the twitter API - it's not 
clear how such consumers would use them.  Note that this API doesn't 
seem to need to 'write' json objects, else I suspect they would then be 
expecting dumps to return bytes to put on the wire.  They expect loads 
to accept the bytes they are reading directly off the wire.


* couchdb's wrappers use these functions purely as bytes - they are 
either decoding an application/json object from the bits they read, or 
they are encoding it to use directly in the body of a request (or even 
directly in the URL of the request!)


I find myself conflicted.  On one hand I believe the most common use of 
json will be to exchange data with something inherently byte-based.  On 
the other hand though, json itself seems to be naturally "stringy" and 
the most natural interface for a casual user would be strings.


I'm personally leaning slightly towards strings, putting the burden on 
bytes-users of json to explicitly use the appropriate encoding, even in 
cases where it *must* be utf8.  On the other hand, I'm too lazy to dig 
back through this large thread, but I seem to recall a suggestion that 
using bytes would be significantly faster.  If that is true, I'd be 
happy to settle for bytes as I believe the most common *actual* use of 
json will be via things like the twitter and couch libraries - and may 
even be a key bottleneck for such libraries - so people will not be 
directly exposed to its interface...


Cheers,

Mark


Re: [Python-Dev] Needing help to change the grammar

2009-04-10 Thread Martin v. Löwis
> It appears to be a simple change, but strangely, I'm not being able to
> perform it. I already made correct modifications in Grammar/Grammar
> file, the new keywords already appear in Lib/keyword.py and I also
> changed the function validate_comp_op in Modules/parsermodule.c:
> 
> static int
> validate_comp_op(node *tree)
> {
>     (...)
>     else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
>         res = (validate_ntype(CHILD(tree, 0), NAME)
>                && validate_ntype(CHILD(tree, 1), NAME)
>                && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                     && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
>                    || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                        && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
>         if (!res && !PyErr_Occurred())
>             err_string("operador de comparação desconhecido");
>     }
>     return (res);
> }
> 

Notice that Python source is represented in UTF-8 in the parser.
It might be that the C source code has a different encoding, which
would cause the strcmp to fail.
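
A quick way to see the mismatch, as a sketch in Python rather than C.
Latin-1 is only an assumed example of "a different encoding"; the actual
encoding of the C source file is not known from this thread.

```python
word = "não"

# What the parser holds for the token: UTF-8 bytes.
utf8_bytes = word.encode("utf-8")

# What the string literal would contain if the C file were saved as
# Latin-1 (an assumption made only for this illustration).
latin1_bytes = word.encode("latin-1")

# strcmp() compares byte by byte, so these two would never match.
same_bytes = utf8_bytes == latin1_bytes
```

The UTF-8 form is four bytes long and the Latin-1 form three, so a
byte-wise comparison fails even though both spell the same word.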

Regards,
Martin


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-10 Thread Martin v. Löwis
> I'm personally leaning slightly towards strings, putting the burden on
> bytes-users of json to explicitly use the appropriate encoding, even in
> cases where it *must* be utf8.  On the other hand, I'm too lazy to dig
> back through this large thread, but I seem to recall a suggestion that
> using bytes would be significantly faster. 

Not sure whether it would be *significantly* faster, but yes, Bob wrote
an accelerator for parsing out of a byte string to make it really fast;
IIRC, he claims that it is faster than pickling.
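
For reference, the strings-only option from the quoted text would put
roughly this burden on someone holding bytes from the wire. This is a
minimal sketch that assumes a text-based loads/dumps and a made-up
payload; it says nothing about how the accelerator works.

```python
import json

# Bytes as they might arrive from a socket or HTTP response (made-up payload).
wire_data = b'{"user": "mark", "ids": [1, 2, 3]}'

# The caller decodes explicitly, then hands text to the json module.
decoded = json.loads(wire_data.decode("utf-8"))

# Going the other way: serialize to text, then encode for the wire.
encoded = json.dumps(decoded).encode("utf-8")

print(decoded["ids"])
print(type(encoded).__name__)
```

The cost is one extra decode or encode call per message, which is where
a bytes-level parser could save time.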

Regards,
Martin


Re: [Python-Dev] Google Summer of Code/core Python projects - RFC

2009-04-10 Thread Martin v. Löwis
> 2x "keyring package" -- see
> http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/.
> The poorer one of these will probably be axed unless Tarek gives it
> strong support.

I don't think these are good "core" projects. Even if the students come
up with a complete solution, it shouldn't be integrated with the
standard library right away. Instead, it should have a life outside the
standard library, and be considered for inclusion only if the user
community wants it.

I'm also skeptical that this is a good SoC project in the first place.
Coming up with a wrapper for, say, Apple Keychain, could be a good
project. Coming up with a unifying API for all keychains is out of
scope, IMO; various past attempts at unifying APIs have demonstrated
that creating them is difficult, and might require writing a PEP
(whose acceptance then might not happen within a summer).

Regards,
Martin


Re: [Python-Dev] Needing help to change the grammar

2009-04-10 Thread Jack Diederich
On Fri, Apr 10, 2009 at 9:58 PM, Harry (Thiago Leucz Astrizi)
 wrote:
>
> Hello everybody. My name is Thiago and I currently work as a
> teacher at a high school in Brazil. I plan to offer the students a
> programming course, but I had some trouble finding a good
> language. As a Python programmer, I really like the language's
> syntax and I think that Python is very good for teaching
> programming. But there's a small problem: the commands and keywords
> are in English, and this can be an obstacle for the teenagers who
> might take the course.
>
> Because of this, I decided to create a Python version with keywords in
> Portuguese and with some modifications to the grammar to make it more
> Portuguese-like. For this, I'm using the Python 3.0.1 source code.

I love the idea (and I most recently edited PEP 306), so here are a few
suggestions:

Brazil has many python programmers so you might be able to make quick
progress by asking them for volunteer time.

To bug-hunt your technical problem: try changing the "not is"
operator into a single word with an underscore, "not_is".  The Python
LL(1) grammar checker works for Python itself but isn't robust, and it
does miss some grammar ambiguities.  Making the operator a single word
might reveal a bug in the parser.
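
As a rough illustration of what changes, the stock tokenize module shows
the difference between the two spellings. This uses the English
placeholders "not is" and "not_is" and an unmodified interpreter, so it
is only a sketch of the idea; name_tokens is a helper made up for this
example.

```python
import io
import tokenize


def name_tokens(source: str) -> list[str]:
    """Return the NAME tokens the stock tokenizer finds in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME]


# Two-word operator: the comparison is made of two separate NAME tokens.
two_word = name_tokens("a not is b\n")
# Single-word operator: the comparison is a single NAME token.
one_word = name_tokens("a not_is b\n")

print(two_word)  # ['a', 'not', 'is', 'b']
print(one_word)  # ['a', 'not_is', 'b']
```

With a single token the comp_op node has one child instead of two, so
the two-child branch of validate_comp_op is no longer involved, which
helps narrow down where the failure comes from.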

Please consider switching your students to 'real' python part way
through the course.  If they want to use the vast amount of python
code on the internet as examples they will need to know the few
English keywords.

Also - most python core developers are not native English speakers and
do OK :)  PyCon speakers are about 25% non-native English speakers and
EuroPython speakers are about the reverse (my rough estimate - I'd
love to see some hard numbers).

Keep up the Good Work,

-Jack