Re: [Python-Dev] BLOBs in Pg
On 09 avril 14:05, Steve Holden wrote:
> Oleg Broytmann wrote:
> > On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
> > > I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that
> > > PostgreSQL has no real support for BLOBs.
> >
> > I think it has - BYTEA data type.
>
> But the Python DB adapters appear to require some fairly hairy escaping
> of the data to make it usable with the cursor execute() method. IMHO you
> shouldn't have to escape data that is passed for insertion via a
> parameterized query.

Can't you simply use dbmodule.Binary to do the job?

--
Sylvain Thénault                               LOGILAB, Paris (France)
Formations Python, Debian, Méth. Agiles: http://www.logilab.fr/formations
Développement logiciel sur mesure:       http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
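Sylvain's suggestion is the standard DB-API 2.0 answer: wrap the raw bytes in the adapter's Binary() constructor and let the parameterized query do the escaping. A minimal sketch, using sqlite3 only as a stand-in for a PostgreSQL adapter (psycopg2 exposes the same Binary wrapper for BYTEA columns; the table and column names here are made up):

```python
import sqlite3

# Any DB-API 2.0 adapter exposes a Binary() constructor for this purpose;
# sqlite3 is used here only because it needs no server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")

# Bytes that would be painful to escape by hand.
payload = b"\x00\xffbinary 'data' with quotes\x00"

# Binary() wraps the raw bytes; the parameterized query handles all quoting.
conn.execute("INSERT INTO blobs (data) VALUES (?)",
             (sqlite3.Binary(payload),))

(stored,) = conn.execute("SELECT data FROM blobs").fetchone()
assert bytes(stored) == payload
```

No manual escaping is involved at any point; the adapter serializes the wrapped bytes itself.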
Re: [Python-Dev] Dropping bytes "support" in json
gl...@divmod.com wrote:
> On 03:21 am, ncogh...@gmail.com wrote:
>> Given that json is a wire protocol, that sounds like the right approach
>> for json as well. Once bytes-everywhere works, then a text API can be
>> built on top of it, but it is difficult to build a bytes API on top of a
>> text one.
>
> I wish I could agree, but JSON isn't really a wire protocol. According
> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
> serialization of structured data". There are some notes about encoding,
> but it is very clearly described in terms of unicode code points.

Ah, my apologies - if the RFC defines things such that the native format is Unicode, then yes, the appropriate Python 3.x data type for the base implementation would indeed be strings.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Rethinking intern() and its data structure
Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior? IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

This problem (slow application startup times due to too many imports at startup, which in turn can be due to top-level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions.

Providing a clean API and implementation for that functionality is a pretty tough nut to crack though, so I'm not holding my breath...

Cheers,
Nick.

P.S. It's only an occasional fairly idle wish for me though, or I'd have at least tried to come up with something myself by now.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
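The lazy import mechanism Nick wishes for can at least be sketched with today's importlib machinery (importlib.util.LazyLoader, added well after this thread, in Python 3.5). This is an illustration of the idea, not a solution to the deadlock concerns he raises:

```python
import importlib.util
import sys

def lazy_import(name):
    # Create the module object immediately, but defer executing its code
    # until the first attribute access (LazyLoader handles the swap).
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")                 # nothing heavy has run yet
assert json.dumps({"a": 1}) == '{"a": 1}'  # first access triggers the real import
```

The startup cost of `lazy_import("json")` is just the find_spec() lookup; the module body only runs when an attribute is first touched.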
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individual). We have a lazy module loader that allows us to defer loading modules we might not use (though if they are needed we are in fact going to pay for loading them eventually). We monkeypatch the standard library where modules we want are unreasonably expensive to import (for instance by making a regex we wouldn't use be lazily compiled rather than compiled at import time).

> IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:

robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes.)

We've also looked lower - Python does a lot of stat operations searching for imports and determining if the pyc is up to date; these appear to only really matter on cold-cache imports (but they matter a lot then); in hot-cache situations they are insignificant.

Uhm, there's probably more - but I just wanted to note that we have done quite a bit of analysis.

I think a large chunk of our problem is having too much code loaded when only a small fraction will be used in any one operation. Consider importing bzrlib.errors - 10% of the startup time for 'bzr help'. In any operation only a few of those exceptions will be used - and typically 0.

-Rob
Re: [Python-Dev] Dropping bytes "support" in json
divmod.com> writes:
>
> In email's case this is true, but in JSON's case it's not. JSON is a
> format defined as a sequence of code points; MIME is defined as a
> sequence of octets.

Another way to look at it is that JSON is a subset of Javascript, and as such is text rather than bytes.

Regards
Antoine.
Re: [Python-Dev] Rethinking intern() and its data structure
Robert Collins canonical.com> writes:
>
> (errors.py is 3027 lines long with 347 exception classes).

347 exception classes? Perhaps your framework is over-engineered.

Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame Python for that.

Regards
Antoine.
Re: [Python-Dev] Rethinking intern() and its data structure
On Fri, 2009-04-10 at 11:52 +, Antoine Pitrou wrote:
> Robert Collins canonical.com> writes:
> >
> > (errors.py is 3027 lines long with 347 exception classes).
>
> 347 exception classes? Perhaps your framework is over-engineered.
>
> Similarly, when using a heavy Web framework, reloading a Web app can take
> several seconds... but I won't blame Python for that.

Well, we've added exceptions as we needed them. This isn't much different to errno in C programs; the errno range has expanded as people have wanted to signal that specific situations have arisen. The key thing for us is to have both something that can be caught (for library users of bzrlib) and something that can be formatted with variable substitution (for displaying to users).

If there are better ways to approach this in Python than what we've done, that would be great.

-Rob
Re: [Python-Dev] Dropping bytes "support" in json
2009/4/10 Nick Coghlan :
> gl...@divmod.com wrote:
>> On 03:21 am, ncogh...@gmail.com wrote:
>>> Given that json is a wire protocol, that sounds like the right approach
>>> for json as well. Once bytes-everywhere works, then a text API can be
>>> built on top of it, but it is difficult to build a bytes API on top of a
>>> text one.
>>
>> I wish I could agree, but JSON isn't really a wire protocol. According
>> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
>> serialization of structured data". There are some notes about encoding,
>> but it is very clearly described in terms of unicode code points.
>
> Ah, my apologies - if the RFC defines things such that the native format
> is Unicode, then yes, the appropriate Python 3.x data type for the base
> implementation would indeed be strings.

Indeed, the RFC seems to clearly imply that loads should take a Unicode string, dumps should produce one, and load/dump should work in terms of text files (not byte files). On the other hand, further down in the document:

"""
3. Encoding

   JSON text SHALL be encoded in Unicode. The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.
"""

This is at best confused (in my utterly non-expert opinion :-)) as Unicode isn't an encoding... I would guess that what the RFC is trying to say is that JSON is text (Unicode) and where a byte stream purporting to be JSON is encountered without a defined encoding, this is how to guess one.

That implies that loads can/should also allow bytes as input, applying the given algorithm to guess an encoding. And similarly load can/should accept a byte stream, on the same basis.

(There's no need to allow the possibility of accepting bytes plus an encoding - in that case the user should decode the bytes before passing Unicode to the JSON module.)

An alternative might be for the JSON module to register a special encoding ('JSON-guess'?) which captures the rules here. Then there's no need for special bytes parameter handling.

Of course, this is all from a native English speaker, who therefore has no idea of the real life issues involved in Unicode :-)

Paul.
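The null-pattern rule Paul quotes from section 3 can be sketched directly; this is an illustrative implementation of the RFC's heuristic, not code from any json library:

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON byte stream (RFC 4627 sec. 3).

    The first two characters of conforming JSON are ASCII, so the
    positions of NUL bytes in the first four octets identify the UTF.
    """
    head = data[:4]
    if len(head) >= 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"          # 00 00 00 xx
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"          # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"          # 00 xx
        if head[1] == 0:
            return "utf-16-le"          # xx 00
    return "utf-8"

assert detect_json_encoding("{}".encode("utf-8")) == "utf-8"
assert detect_json_encoding("{}".encode("utf-16-be")) == "utf-16-be"
assert detect_json_encoding("{}".encode("utf-32-le")) == "utf-32-le"
```

As the RFC notes, this only distinguishes the UTFs; a stream in some non-Unicode encoding is simply non-conforming JSON.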
Re: [Python-Dev] Dropping bytes "support" in json
>> In email's case this is true, but in JSON's case it's not. JSON is a
>> format defined as a sequence of code points; MIME is defined as a
>> sequence of octets.
>
> Another way to look at it is that JSON is a subset of Javascript, and as
> such is text rather than bytes.

I don't think this can be approached from a theoretical point of view. Instead, what matters is how users want to use it.

Regards,
Martin
Re: [Python-Dev] decorator module in stdlib?
Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato wrote:
>> Then perhaps you misunderstand the goal of the decorator module. The
>> raison d'etre of the module is to PRESERVE the signature:
>> update_wrapper unfortunately *changes* it. When confronted with a
>> library which I do not know, I often run pydoc over it, or sphinx, or a
>> custom made documentation tool, to extract the signature of functions.
>
> Ah, I see. Personally I rarely trust automatically extracted
> documentation -- too often in my experience it is out of date or simply
> absent. Extracting the signatures in theory wouldn't lie, but in
> practice I still wouldn't trust it -- not only because of what
> decorators might or might not do, but because it might still be
> misleading. Call me old-fashioned, but I prefer to read the source code.

If you auto-generate API documentation by introspection (which we do at Resolver Systems) then preserving signatures can also be important. Interactive use (support for help), and more straightforward tracebacks in the event of usage errors, are other reasons to want to preserve signatures and function names.

>> For instance, if I see a method get_user(self, username) I have a good
>> hint about what it is supposed to do. But if the library (say a web
>> framework) uses non signature-preserving decorators, my documentation
>> tool says to me that there is a function get_user(*args, **kwargs)
>> which frankly is not enough [this is the optimistic case, when the
>> author of the decorator has taken care to preserve the name of the
>> original function].
>
> But seeing the decorator is often essential for understanding what goes
> on! Even if the decorator preserves the signature (in truth or
> according to inspect), many decorators *do* something, and it's
> important to know how a function is decorated. For example, I work a
> lot with a small internal framework at Google whose decorators can
> raise exceptions and set instance variables; they also help me
> understand under which conditions a method can be called.

Having methods renamed to 'wrapped' and their signature changed to *args, **kwargs may tell you there *is* a decorator, but doesn't give you any useful information about what it does. If you look at the code then the decorator is obvious (whether or not it mangles the method)... [+1]

>> But I feel strongly about the possibility of being able to preserve
>> (not change!) the function signature.
>
> That could be added to functools if enough people want it.

+1

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
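The renaming Michael describes is exactly what functools.wraps addresses, and (since Python 3.4, well after this thread) inspect.signature also follows the __wrapped__ attribute that wraps sets, so introspection tools see the original parameters. A small sketch (the get_user name echoes Michele's example; the decorator itself is made up):

```python
import functools
import inspect

def logged(func):
    # A do-nothing decorator; functools.wraps copies __name__, __doc__,
    # etc. and records the original function as wrapper.__wrapped__.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@logged
def get_user(self, username):
    """Fetch a user record."""

# The name survives, and inspect.signature() follows __wrapped__, so
# documentation tools see the real parameters, not (*args, **kwargs).
assert get_user.__name__ == "get_user"
assert str(inspect.signature(get_user)) == "(self, username)"
```

Note that this only fixes *reported* metadata; Guido's point stands that the wrapper's behaviour is still invisible without reading the source.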
[Python-Dev] Python 2.6.2 final
I wanted to cut Python 2.6.2 final tonight, but for family reasons I won't be able to do so until Monday. Please be conservative in any commits to the 2.6 branch between now and then.

bugs.python.org is apparently down right now, but I set issue 5724 to release blocker for 2.6.2. This is waiting for input from Mark Dickinson, and it relates to test_cmath failing on Solaris 10. If Mark fixes that, he's welcome to commit it, otherwise I will remove the release blocker tag on the issue and release 2.6.2 anyway.

Plan on me tagging 2.6.2 final Sunday evening.

Cheers,
-Barry
Re: [Python-Dev] Evaluated cmake as an autoconf replacement
Neil Hodgson wrote:
> cmake does not produce relative paths in its generated make and project
> files. There is an option CMAKE_USE_RELATIVE_PATHS which appears to do
> this but the documentation says:
>
> """This option does not work for more complicated projects, and
> relative paths are used when possible. In general, it is not possible
> to move CMake generated makefiles to a different location regardless
> of the value of this variable."""
>
> This means that generated Visual Studio project files will not work for
> other people unless a particular absolute build location is specified
> for everyone, which will not suit most. Each person that wants to build
> Python will have to run cmake before starting Visual Studio, thus
> increasing the prerequisites.

This is true. CMake does not generate stand-alone transferable projects; CMake must be installed on the machine where the compilation is done. CMake will automatically re-run if any of the inputs are changed, and have Visual Studio re-load the project, and CMake can be used for simple cross-platform commands like file copy and other operations so that the build files do not depend on shell commands or anything system specific.

-Bill
Re: [Python-Dev] decorator module in stdlib?
Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato wrote:
>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
>
> That could be added to functools if enough people want it.

No objection in principle here - it's just hard to do cleanly without PEP 362's __signature__ attribute to underpin it. Without that as a basis, I expect you'd end up being forced to do something similar to what Michele does in the decorator module - inspect the function being wrapped and then use exec to generate a wrapper with a matching signature.

Another nice introspection enhancement might be to give class and function objects writable __file__ and __line__ attributes (initially set appropriately by the compiler) and have the inspect module use those when they're available. Then functools.update_wrapper() could be adjusted to copy those attributes, meaning that the wrapper function would point back to the original (decorated) function for the source code, rather than to the definition of the wrapper (note that the actual wrapper code could still be found by looking at the metadata on the function's __code__ attribute).

Unfortunately-ideas-aren't-working-code'ly,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
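The exec-based approach Nick attributes to the decorator module can be sketched in a few lines. The helper names below are illustrative, not the decorator module's real API, and this toy version only handles plain positional parameters:

```python
import inspect

def make_matching_wrapper(func, caller):
    # Build source for a wrapper whose signature textually matches func's,
    # then exec() it so introspection tools report the original parameters.
    params = ", ".join(inspect.signature(func).parameters)
    src = ("def {name}({params}):\n"
           "    return _caller_(_func_, {params})\n").format(
               name=func.__name__, params=params)
    namespace = {"_caller_": caller, "_func_": func}
    exec(src, namespace)
    wrapper = namespace[func.__name__]
    wrapper.__doc__ = func.__doc__
    return wrapper

def trace(func, *args):
    return func(*args)  # stand-in for real decorator behaviour

def add(x, y):
    return x + y

traced_add = make_matching_wrapper(add, trace)
assert str(inspect.signature(traced_add)) == "(x, y)"
assert traced_add(2, 3) == 5
```

Because the wrapper is compiled from source with the real parameter list, pydoc and friends see `(x, y)` rather than `(*args, **kwargs)`, with no __signature__ support required.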
Re: [Python-Dev] Rethinking intern() and its data structure
John Arbash Meinel wrote:
> Not as big of a difference as I thought it would be... But I bet if
> there was a way to put the random shuffle in the inner loop, so you
> weren't accessing the same identical 25k keys internally, you might get
> more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s"from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7),25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective:

$ python -m timeit -s"d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop

Peter
Re: [Python-Dev] Rethinking intern() and its data structure
Robert Collins wrote:
> Certainly, import time is part of it:
> robe...@lifeless-64:~$ python -m timeit -s 'import sys; import
> bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
> 10 loops, best of 3: 18.7 msec per loop
>
> (errors.py is 3027 lines long with 347 exception classes).
>
> We've also looked lower - python does a lot of stat operations searching
> for imports and determining if the pyc is up to date; these appear to
> only really matter on cold-cache imports (but they matter a lot then);
> in hot-cache situations they are insignificant.

Tarek, Georg, and I talked at PyCon about a way to do both multi-version support and a speedup of this exact import problem in the future. I had to leave before the hackfest got started, though, so I don't know where the idea went from there. Tarek, did this idea progress any?

-Toshio
Re: [Python-Dev] Dropping bytes "support" in json
On Apr 9, 2009, at 10:38 PM, Barry Warsaw wrote:
> So, what I'm really asking is this. Let's say you agree that there are
> use cases for accessing a header value as either the raw encoded bytes
> or the decoded unicode.

As I said in the thread having nearly the same exact discussion on web-sig, except about WSGI headers...

> What should this return:
>
> >>> message['Subject']
>
> The raw bytes or the decoded unicode?

Until you write a parser for every header, you simply cannot decode to unicode. The only sane choices are:

1) raw bytes
2) parsed structured data

There's no "decoded to unicode but not parsed" option: that's doing things in the wrong order. If you RFC2047-decode the header before doing tokenization and parsing, you will just have a *broken* implementation.

Here's an example where it matters. If you decode the RFC2047 part before parsing, you'd decide that there are two recipients to the message. There aren't: the encoded word decodes to "<broken@example.com>, ", which is the display-name of "act...@example.com", not a second recipient.

To: =?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?=

Here's a quote from RFC2047:

   NOTE: Decoding and display of encoded-words occurs *after* a
   structured field body is parsed into tokens. It is therefore
   possible to hide 'special' characters in encoded-words which, when
   displayed, will be indistinguishable from 'special' characters in
   the surrounding text. For this and other reasons, it is NOT
   generally possible to translate a message header containing
   'encoded-word's to an unencoded form which can be parsed by an
   RFC 822 mail reader.

And another quote for good measure:

   (2) Any header field not defined as '*text' should be parsed
   according to the syntax rules for that header field. However, any
   'word' that appears within a 'phrase' should be treated as an
   'encoded-word' if it meets the syntax rules in section 2. Otherwise
   it should be treated as an ordinary 'word'.

Now, I suppose there's also a third possibility:

3) US-ASCII-only strings, unmolested except for doing a .decode('ascii').

That'll give you a string all right, but it's really just cheating. It's not actually a text string in any meaningful sense.

(In all this I'm assuming your question is not about the "Subject" header in particular; that is of course just unstructured text, so the parse step doesn't actually do anything...)

James
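James's ordering point can be demonstrated with the stdlib itself. The addresses below are invented for illustration (his original example is constructed the same way):

```python
import base64
from email.header import decode_header
from email.utils import getaddresses

# An RFC 2047 encoded word whose *decoded* text merely looks like a
# second recipient; it is really just a display-name.
display = base64.b64encode(b"<fake@example.invalid>, ").decode("ascii")
raw = "=?UTF-8?B?%s?= <real@example.invalid>" % display

# Wrong order: RFC 2047-decode first, then parse -- two "recipients" appear.
decoded_first = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(raw))
assert len(getaddresses([decoded_first])) == 2

# Right order: tokenize/parse first -- exactly one recipient.
assert len(getaddresses([raw])) == 1
```

The parse-then-decode order treats the encoded word as an opaque atom in the display-name position, which is exactly what RFC 2047 requires.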
Re: [Python-Dev] Dropping bytes "support" in json
Paul Moore writes:

> On the other hand, further down in the document:
>
> """
> 3. Encoding
>
>    JSON text SHALL be encoded in Unicode. The default encoding is
>    UTF-8.
>
>    Since the first two characters of a JSON text will always be ASCII
>    characters [RFC0020], it is possible to determine whether an octet
>    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>    at the pattern of nulls in the first four octets.
> """
>
> This is at best confused (in my utterly non-expert opinion :-)) as
> Unicode isn't an encoding...

The word "encoding" (by itself) does not have a standard definition AFAIK. However, since Unicode *is* a "coded character set" (plus a bunch of hairy usage rules), there's nothing wrong with saying "text is encoded in Unicode". The RFC 2130 and Unicode TR#17 taxonomies are annoyingly verbose and pedantic, to say the least.

So what is being said there (in UTR#17 terminology) is:

(1) JSON is *text*, that is, a sequence of characters.
(2) The abstract repertoire and coded character set are defined by the Unicode standard.
(3) The default transfer encoding syntax is UTF-8.

> That implies that loads can/should also allow bytes as input, applying
> the given algorithm to guess an encoding.

It's not a guess, unless the data stream is corrupt---or nonconforming. But it should not be the JSON package's responsibility to deal with corruption or non-conformance (eg, ISO-8859-15-encoded programs). That's the whole point of specifying the coded character set in the standard in the first place. I think it's a bad idea for any of the core JSON API to accept or produce bytes in any language that provides a Unicode string type.

That doesn't mean Python's module shouldn't provide convenience functions to read and write JSON serialized as UTF-8 (in fact, that *should* be done, IMO) and/or other UTFs (I'm not so happy about that). But those who write programs using them should not report bugs until they've checked out and eliminated the possibility of an encoding screwup!
Re: [Python-Dev] Dropping bytes "support" in json
On Fri, Apr 10, 2009 at 8:38 AM, Stephen J. Turnbull wrote:
> Paul Moore writes:
>
> > On the other hand, further down in the document:
> >
> > """
> > 3. Encoding
> >
> >    JSON text SHALL be encoded in Unicode. The default encoding is
> >    UTF-8.
> >
> >    Since the first two characters of a JSON text will always be ASCII
> >    characters [RFC0020], it is possible to determine whether an octet
> >    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
> >    at the pattern of nulls in the first four octets.
> > """
> >
> > This is at best confused (in my utterly non-expert opinion :-)) as
> > Unicode isn't an encoding...
>
> The word "encoding" (by itself) does not have a standard definition
> AFAIK. However, since Unicode *is* a "coded character set" (plus a
> bunch of hairy usage rules), there's nothing wrong with saying "text
> is encoded in Unicode". The RFC 2130 and Unicode TR#17 taxonomies are
> annoyingly verbose and pedantic, to say the least.
>
> So what is being said there (in UTR#17 terminology) is
>
> (1) JSON is *text*, that is, a sequence of characters.
> (2) The abstract repertoire and coded character set are defined by the
> Unicode standard.
> (3) The default transfer encoding syntax is UTF-8.
>
> > That implies that loads can/should also allow bytes as input, applying
> > the given algorithm to guess an encoding.
>
> It's not a guess, unless the data stream is corrupt---or nonconforming.
> But it should not be the JSON package's responsibility to deal with
> corruption or non-conformance (eg, ISO-8859-15-encoded programs).
> That's the whole point of specifying the coded character set in the
> standard in the first place. I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.
>
> That doesn't mean Python's module shouldn't provide convenience
> functions to read and write JSON serialized as UTF-8 (in fact, that
> *should* be done, IMO) and/or other UTFs (I'm not so happy about
> that). But those who write programs using them should not report bugs
> until they've checked out and eliminated the possibility of an
> encoding screwup!

The current implementation doesn't do any encoding guesswork and I have no intention to allow that as a feature. The input must be unicode, UTF-8 bytes, or an encoding must be specified.

Personally most of my experience with JSON is as a wire protocol and thus bytes, so the obvious function to encode json should do that. There probably should be another function to get unicode output, but nobody has ever asked for that in the Python 2.x version. They either want the default behavior (encoding as ASCII str, which can be used as unicode due to implementation details of Python 2.x) or encoding as a more compact UTF-8 str (without escaping non-ASCII code points). Perhaps Python 3 users would ask for unicode output when decoding, though.

-bob
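In Python 3 terms, the two output styles Bob describes look like this (a sketch of the distinction, not the json module's actual API at the time of this thread):

```python
import json

doc = {"name": "Pyth\u00f6n"}

# Default behaviour: a pure-ASCII str, with non-ASCII escaped.
ascii_text = json.dumps(doc)
assert ascii_text == '{"name": "Pyth\\u00f6n"}'

# More compact form: keep the code points, then encode for the wire.
compact = json.dumps(doc, ensure_ascii=False)
assert compact.encode("utf-8") == b'{"name": "Pyth\xc3\xb6n"}'
```

The first form is safe to treat as bytes on any ASCII-compatible channel; the second is shorter but must be explicitly encoded before transmission.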
Re: [Python-Dev] Dropping bytes "support" in json
> (3) The default transfer encoding syntax is UTF-8.

Notice that the RFC is partially irrelevant. It only applies to the application/json MIME type, and JSON is used in various other protocols, using various other encodings.

> I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.

So how do you integrate the encoding detection that the RFC suggests to be done?

Regards,
Martin
Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
Barry Warsaw wrote:
> In that case, we really need the
> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
> things on top of that.

Yep.

Bill
Re: [Python-Dev] Dropping bytes "support" in json
On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote:
> On 02:38 am, ba...@python.org wrote:
>> So, what I'm really asking is this. Let's say you agree that there
>> are use cases for accessing a header value as either the raw encoded
>> bytes or the decoded unicode. What should this return:
>>
>> >>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> My personal preference would be to just deprecate this API and get rid
> of it, replacing it with a slightly more explicit one:
>
> message.headers['Subject']
> message.bytes_headers['Subject']

This is pretty darn clever, Glyph. Stop that! :)

I'm not 100% sure I like the name .bytes_headers, or that .headers should be the decoded header (rather than having .headers return the bytes thingie and say .decoded_headers return the decoded thingies), but I do like the general approach.

>> Now, setting headers. Sometimes you have some unicode thing and
>> sometimes you have some bytes. You need to end up with bytes in the
>> ASCII range and you'd like to leave the header value unencoded if so.
>> But in both cases, you might have bytes or characters outside that
>> range, so you need an explicit encoding, defaulting to utf-8 probably.
>
> message.headers['Subject'] = 'Some text'
>
> should be equivalent to
>
> message.headers['Subject'] = Header('Some text')

Yes, absolutely. I think we're all in general agreement that header values should be instances of Header, or subclasses thereof.

> My preference would be that
>
> message.headers['Subject'] = b'Some Bytes'
>
> would simply raise an exception. If you've got some bytes, you should
> instead do
>
> message.bytes_headers['Subject'] = b'Some Bytes'
>
> or
>
> message.headers['Subject'] = Header(bytes=b'Some Bytes', encoding='utf-8')
>
> Explicit is better than implicit, right?

Yes. Again, I really like the general idea, if I might quibble about some of the details. Thanks for a great suggestion.

-Barry
Re: [Python-Dev] Dropping bytes "support" in json
On Thu, 2009-04-09 at 22:38 -0400, Barry Warsaw wrote: > On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote: > > > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw wrote: > > Anyway, aside from that decision, I haven't come up with an elegant > > way to allow /output/ in both bytes and strings (input is I think > > theoretically easier by sniffing the arguments). > > > > Won't this work? (assuming dumps() always returns a string) > > > > def dumpb(obj, encoding='utf-8', *args, **kw): > > s = dumps(obj, *args, **kw) > > return s.encode(encoding) > > So, what I'm really asking is this. Let's say you agree that there > are use cases for accessing a header value as either the raw encoded > bytes or the decoded unicode. What should this return: > > >>> message['Subject'] > > The raw bytes or the decoded unicode? > > Okay, so you've picked one. Now how do you spell the other way? > > The Message class probably has these explicit methods: > > >>> Message.get_header_bytes('Subject') > >>> Message.get_header_string('Subject') > > (or better names... it's late and I'm tired ;). One of those maps to > message['Subject'] but which is the more obvious choice? > > Now, setting headers. Sometimes you have some unicode thing and > sometimes you have some bytes. You need to end up with bytes in the > ASCII range and you'd like to leave the header value unencoded if so. > But in both cases, you might have bytes or characters outside that > range, so you need an explicit encoding, defaulting to utf-8 probably. > > >>> Message.set_header('Subject', 'Some text', encoding='utf-8') > >>> Message.set_header('Subject', b'Some bytes') > > One of those maps to > > >>> message['Subject'] = ??? > > I'm open to any suggestions here! 
Syntactically, there's no sense in providing: Message.set_header('Subject', 'Some text', encoding='utf-16') ...since you could more clearly write the same as: Message.set_header('Subject', 'Some text'.encode('utf-16')) The only interesting case is if you provided a *default* encoding, so that: Message.default_header_encoding = 'utf-16' Message.set_header('Subject', 'Some text') ...has the same effect. But it would be far easier to do all the encoding at once in an output() or serialize() method. Do different headers need different encodings? If so, make message['Subject'] a subclass of str and give it an .encoding attribute (with a default). If not, Message.header_encoding should be sufficient. Robert Brewer fuman...@aminus.org
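Robert's str-subclass idea can be sketched as follows (all names hypothetical and purely illustrative; this is not an actual email-package API):

```python
class HeaderValue(str):
    """A str subclass carrying a per-header .encoding attribute, as
    suggested above. Serialization then happens in one place, using
    the stored encoding, rather than at every set_header() call."""

    def __new__(cls, value, encoding='utf-8'):
        self = super().__new__(cls, value)
        self.encoding = encoding
        return self

    def to_bytes(self):
        # Encode on the way out, with this header's own charset.
        return str(self).encode(self.encoding)
```

The value behaves as an ordinary string everywhere else, so existing text-oriented code keeps working.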
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote: At 22:38 -0400 04/09/2009, Barry Warsaw wrote: ... So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return: message['Subject'] The raw bytes or the decoded unicode? That's an easy one: Subject: is an unstructured header, so it must be text, thus Unicode. We're looking at a high-level representation of an email message, with parsed header fields and a MIME message tree. I'm liking Glyph's suggestion here. We'll probably have to support the message['Subject'] API for backward compatibility, but in that case it really should be a bytes API. (or better names... it's late and I'm tired ;). One of those maps to message['Subject'] but which is the more obvious choice? Structured header fields are more of a problem. Any header with addresses should return a list of addresses. I think the default return type should depend on the data type. To get an explicit bytes or string or list of addresses, be explicit; otherwise, for convenience, return the appropriate type for the particular header field name. Yes, structured headers are trickier. In a separate message, James Knight makes some excellent points, which I agree with. However the email package obviously cannot support every type of structured header possible. It must support this through extensibility. The obvious way is through inheritance (i.e. subclasses of Header), but in my experience, using inheritance of the Message class really doesn't work very well. You need to pass around factories to parsing functions and your application tends to have its own hierarchy of subclasses for whatever extra things it needs. ISTM that subclassing is simply not the right pattern to support extensibility in the Message objects or Header objects. Yes, this leads me to think that all the MIME* subclasses are essentially /wrong/.
Having said all that, the email package must support structured headers. Look at the insanity which is the current folding whitespace splitting and the impossibility of the current code to do the right thing for say Subject headers and Received headers, and you begin to see why it must be possible to extend this stuff. Now, setting headers. Sometimes you have some unicode thing and sometimes you have some bytes. You need to end up with bytes in the ASCII range and you'd like to leave the header value unencoded if so. But in both cases, you might have bytes or characters outside that range, so you need an explicit encoding, defaulting to utf-8 probably. Never for header fields. The default is always RFC 2047, unless it isn't, say for params. The Message class should create an object of the appropriate subclass of Header based on the name (or use the existing object, see other discussion), and that should inspect its argument and DTRT or complain. Message.set_header('Subject', 'Some text', encoding='utf-8') Message.set_header('Subject', b'Some bytes') One of those maps to message['Subject'] = ??? The expected data type should depend on the header field. For Subject:, it should be bytes to be parsed or verbatim text. For To:, it should be a list of addresses or bytes or text to be parsed. At a higher level, yes. At the low level, it has to be bytes. The email package should be pythonic, and not require deep understanding of dozens of RFCs to use properly. Users don't need to know about the raw bytes; that's the whole point of MIME and any email package. It should be easy to set header fields with their natural data types, and doing it with bad data should produce an error. This may require a bit more care in the message parser, to always produce a parsed message with defects. I agree that we should have some higher level APIs that make it easy to compose email messages, and probably easy-ish to parse a byte stream into an email message tree. 
But we can't build those without the lower level raw support. I'm also convinced that this lower level will be the domain of those crazy enough to have the RFCs tattooed to the back of their eyelids. -Barry
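The bytes-at-the-bottom design Barry describes might look like this minimal sketch (all names hypothetical, modeled on his proposed get_header_bytes/get_header_string spelling; not the real email.message.Message API):

```python
class Message:
    """Toy model: headers are stored as raw bytes at the bottom, and
    the string view is a decode on the way out."""

    def __init__(self):
        self._headers = {}

    def set_header(self, name, value, encoding='utf-8'):
        # Accept either text (encoded here) or ready-made bytes.
        if isinstance(value, str):
            value = value.encode(encoding)
        self._headers[name] = value

    def get_header_bytes(self, name):
        return self._headers[name]

    def get_header_string(self, name, encoding='utf-8'):
        return self._headers[name].decode(encoding)
```

Both of Barry's set_header() spellings then work, and the two explicit getters answer the "which one does message['Subject'] map to?" question by sidestepping it.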
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote: Thinking about this stuff makes me nostalgic for the sloppy happy days of Python 2.x You now have the opportunity to finally unsnarl that mess. It is not an insurmountable opportunity. No, it's just a full time job. Now where did I put that hack-drink-coffee-twitter clone? -Barry
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On Apr 10, 2009, at 1:22 AM, Stephen J. Turnbull wrote: Those objects have headers and payload. The payload can be of any type, though I think it generally breaks down into "strings" for text/ * types and bytes for anything else (not counting multiparts). *sigh* Why are you back-tracking? I'm not. Sleep deprivation only makes it seem like that. The payload should be of an appropriate *object* type. Atomic object types will have their content stored as string or bytes [nb I use Python 3 terminology throughout]. Composite types (multipart/*) won't need string or bytes attributes AFAICS. Yes, agreed. Start by implementing the application/octet-stream and text/plain;charset=utf-8 object types, of course. Yes. See my lament about using inheritance for this. It does seem to make sense to think about headers as text header names and text header values. I disagree. IMHO, structured header types should have object values, and something like While I agree, there's still a need for a higher level API that make it easy to do the simple things. message['to'] = "Barry 'da FLUFL' Warsaw " should be smart enough to detect that it's a string and attempt to (flexibly) parse it into a fullname and a mailbox adding escapes, etc. Whether these should be structured objects or they can be strings or bytes, I'm not sure (probably bytes, not strings, though -- see next example). OTOH message['to'] = b'''"Barry 'da.FLUFL' Warsaw" ''' should assume that the client knows what they are doing, and should parse it strictly (and I mean "be a real bastard", eg, raise an exception on any non-ASCII octet), merely dividing it into fullname and mailbox, and caching the bytes for later insertion in a wire-format message. I agree that the Message class needs to be strict. A parser needs to be lenient; see the .defects attribute introduced in the current email package. Oh, and this reminds me that we still haven't talked about idempotency.
That's an important principle in the current email package, but do we need to give up on that? In that case, I think you want the values as unicodes, and probably the headers as unicodes containing only ASCII. So your table would be strings in both cases. OTOH, maybe your application cares about the raw underlying encoded data, in which case the header names are probably still strings of ASCII-ish unicodes and the values are bytes. It's this distinction (and I think the competing use cases) that make a true Python 3.x API for email more complicated. I don't see why you can't have the email API be specific, with message['to'] always returning a structured_header object (or maybe even more specifically an address_header object), and methods like message['to'].build_header_as_text() which returns """To: "Barry 'da.FLUFL' Warsaw" """ and message['to'].build_header_in_wire_format() which returns b"""To: "Barry 'da.FLUFL' Warsaw" """ Then have email.textview.Message and email.wireview.Message which provide a simple interface where message['to'] would invoke .build_header_as_text() and .build_header_in_wire_format() respectively. This seems similar to Glyph's basic idea, but with a different spelling. Thinking about this stuff makes me nostalgic for the sloppy happy days of Python 2.x Er, yeah. Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs, Can I have my uucp address back now? -Barry
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On approximately 4/10/2009 9:56 AM, came the following characters from the keyboard of Barry Warsaw: On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote: On 02:38 am, ba...@python.org wrote: So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return: >>> message['Subject'] The raw bytes or the decoded unicode? My personal preference would be to just deprecate this API, and get rid of it, replacing it with a slightly more explicit one. message.headers['Subject'] message.bytes_headers['Subject'] This is pretty darn clever Glyph. Stop that! :) I'm not 100% sure I like the name .bytes_headers or that .headers should be the decoded header (rather than have .headers return the bytes thingie and say .decoded_headers return the decoded thingies), but I do like the general approach. If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes. Of course, one could use message.header and message.bythdr and they'd be the same length. -- Glenn -- http://nevcal.com/ === A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
Glenn Linderman wrote: On approximately 4/10/2009 9:56 AM, came the following characters from the keyboard of Barry Warsaw: On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote: On 02:38 am, ba...@python.org wrote: So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return: >>> message['Subject'] The raw bytes or the decoded unicode? My personal preference would be to just deprecate this API, and get rid of it, replacing it with a slightly more explicit one. message.headers['Subject'] message.bytes_headers['Subject'] This is pretty darn clever Glyph. Stop that! :) I'm not 100% sure I like the name .bytes_headers or that .headers should be the decoded header (rather than have .headers return the bytes thingie and say .decoded_headers return the decoded thingies), but I do like the general approach. If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes. Of course, one could use message.header and message.bythdr and they'd be the same length. Shouldn't headers always be text? Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
Re: [Python-Dev] Dropping bytes "support" in json
"Martin v. Löwis" writes: > > (3) The default transfer encoding syntax is UTF-8. > > Notice that the RFC is partially irrelevant. It only applies > to the application/json mime type, and JSON is used in various > other protocols, using various other encodings. Sure. That's their problem. In Python, Unicode is the native encoding, and we have codecs to deal with the outside world, no? That happens to match very well not only with RFC 4627, but the sidebar on json.org that defines JSON. > > I think it's a bad idea for any of the core JSON API to accept or > > produce bytes in any language that provides a Unicode string type. > > So how do you integrate the encoding detection that the RFC suggests > to be done? I suggest you don't. That's mission creep. Think about writing tests for it, and remember that out in the wild those "various other encodings" almost certainly include Shift JIS, Big5, and KOI8-R. Both those considerations point to "er, let's delegate detection and en/decoding to the nice folks who maintain the codec suite." Where it's embedded in some other protocol which specifies a TES, the TES can be implemented there, too. As I wrote earlier, I don't see anything wrong with providing a wrapper module that deals with some default/common/easy cases. But I'd stick it in the contrib directory.
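For reference, the detection the RFC suggests is mechanical: since the first two characters of a JSON text are always ASCII, the pattern of zero octets in the first four bytes identifies the Unicode encoding (RFC 4627, section 3). A sketch:

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess a JSON text's encoding from its first four octets,
    per RFC 4627 section 3. Falls back to the UTF-8 default."""
    if len(data) >= 4:
        if data[0] == 0 and data[1] == 0 and data[2] == 0:
            return 'utf-32-be'      # 00 00 00 xx
        if data[0] == 0 and data[2] == 0:
            return 'utf-16-be'      # 00 xx 00 xx
        if data[1] == 0 and data[2] == 0 and data[3] == 0:
            return 'utf-32-le'      # xx 00 00 00
        if data[1] == 0 and data[3] == 0:
            return 'utf-16-le'      # xx 00 xx 00
    return 'utf-8'
```

Whether this belongs in the core json API or in a wrapper module is exactly the question being argued above.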
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote: If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes. I'm not sure we know that yet, actually. Nothing written for Python 2 counts, and email is too broken in 3 for any sane person to be writing such code for Python 3. Of course, one could use message.header and message.bythdr and they'd be the same length. I was trying to figure out what a 'thdr' was that we'd want to index 'by' it. :) -Barry
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
On Apr 10, 2009, at 2:06 PM, Michael Foord wrote: Shouldn't headers always be text? /me weeps
Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
Shouldn't this thread move lock stock and .signature to email-sig? Barry Warsaw writes: > >> It does seem to make sense to think about headers as text header > >> names and text header values. > > > > I disagree. IMHO, structured header types should have object values, > > and something like > > While I agree, there's still a need for a higher level API that make > it easy to do the simple things. Sure. I'm suggesting that the way to determine whether something is simple or not is by whether it falls out naturally from correct structure. Ie, no operations that only a Cirque du Soleil juggler can perform are allowed. > I agree that the Message class needs to be strict. A parser needs to > be lenient; Not always. The Postel Principle only applies to stuph coming in off the wire. But we're *also* going to be parsing pseudo-email components that are being handed to us by applications (eg, the perennial control-character-in-the-unremovable-address Mailman bug). Our parser should Just Say No to that crap. > see the .defects attribute introduced in the current email > package. Oh, and this reminds me that we still haven't talked about > idempotency. That's an important principle in the current email > package, but do we need to give up on that? "Idempotency"? I'm not sure what that means in the context of the email package ... multiplication by zero? Do you mean that .parse().to_wire() should be idempotent? Yes, I think that's a good idea, and it shouldn't be too hard to implement by (optionally?) caching the whole original message or individual components (headers with all whitespace including folding cached verbatim, etc). I think caching has to be done, since stuff like "did the original fold with a leading tab or a leading space, and at what column" and so on seems kind of pointless to encode as attributes on Header objects. [Description of MessageTextView and MessageWireView elided.] > This seems similar to Glyph's basic idea, but with a different spelling. Yes. 
I don't much care which way it's done, and Glyph's style of spelling is more explicit. But I was thinking in terms of the number of people who are surely going to sing "Mama don' 'low no Unicodes roun' here" and squeal "codec WTF?! outta mah face, man!"
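The idempotency idea discussed above — cache the verbatim wire form so that parse followed by serialize emits exactly the octets that came in, folding whitespace and all — can be sketched like this (a hypothetical class, not the email package's actual design):

```python
class RawHeader:
    """Keep the original octets alongside the parsed view, so that
    serializing an unmodified header is byte-for-byte idempotent."""

    def __init__(self, raw: bytes):
        self.raw = raw  # verbatim: folding, tab-vs-space, everything
        name, _, value = raw.partition(b':')
        self.name = name.decode('ascii')
        # surrogateescape preserves any non-ASCII octets losslessly
        self.value = value.strip().decode('ascii', 'surrogateescape')

    def to_wire(self) -> bytes:
        return self.raw
```

A real implementation would invalidate the cache when the header is modified; this sketch only shows the round-trip for untouched headers.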
Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
Bill Janssen writes: > Barry Warsaw wrote: > > > In that case, we really need the > > bytes-in-bytes-out-bytes-in-the-chewy- > > center API first, and build things on top of that. > > Yep. Uh, I hate to rain on a parade, but isn't that how we arrived at the *current* email package?
Re: [Python-Dev] Rethinking intern() and its data structure
At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: This problem (slow application startup times due to too many imports at startup, which in turn can be due to top level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions. Have you tried http://pypi.python.org/pypi/Importing ? Or more specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ? It does of course use the import lock, but as long as your top-level module code doesn't acquire locks (directly or indirectly), it shouldn't be possible to deadlock. (Or more precisely, to add any *new* deadlocks that you didn't already have.)
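The Importing package's lazyModule() linked above predates it, but later stdlib versions (Python 3.5+) grew a comparable facility in importlib.util.LazyLoader; this is roughly the documented importlib recipe, not the Importing package itself:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module object whose actual import is deferred until
    the first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

b64 = lazy_import('base64')  # nothing executed yet
# the real import happens on first attribute access, e.g. b64.b64encode
```

As with Importing, this shifts import cost from startup to first use; the same caveat about not holding other locks in top-level module code applies.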
Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote: Bill Janssen writes: Barry Warsaw wrote: In that case, we really need the bytes-in-bytes-out-bytes-in-the-chewy- center API first, and build things on top of that. Yep. Uh, I hate to rain on a parade, but isn't that how we arrived at the *current* email package? Not really. We got here because we were too damn sloppy about the distinction. I'm going to remove python-dev from subsequent follow ups. Please join us at email-sig for further discussion. Barry
Re: [Python-Dev] Dropping bytes "support" in json
On Fri, Apr 10, 2009, Barry Warsaw wrote: > On Apr 10, 2009, at 2:06 PM, Michael Foord wrote: >> >> Shouldn't headers always be text? > > /me weeps /me hands Barry a hankie -- Aahz (a...@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups?
Re: [Python-Dev] Dropping bytes "support" in json
Robert Brewer writes: > Syntactically, there's no sense in providing: > > Message.set_header('Subject', 'Some text', encoding='utf-16') > > ...since you could more clearly write the same as: > > Message.set_header('Subject', 'Some text'.encode('utf-16')) Which you now must *parse* and guess the encoding to determine how to RFC-2047-encode the binary mush. I think the encoding parameter is necessary here. > But it would be far easier to do all the encoding at once in an > output() or serialize() method. Do different headers need different > encodings? You can have multiple encodings within a single header (and a naïve algorithm might very well encode "The price of Gödel-Escher-Bach is €25" as "The price of =?ISO-8859-1?Q?G=F6del-Escher-Bach?= is =?ISO-8859-15?Q?=A425?="). > If so, make message['Subject'] a subclass of str and give it an > .encoding attribute (with a default). But if you've set the .encoding attribute, you don't need to encode 'Some text'; .set_header() can take care of it for you. And what about the possibility that the encoding attributes disagree with the argument you passed to the codec?
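The RFC 2047 dance described above is already automated by the current email package: hand it text plus a charset and it produces the encoded word, rather than requiring the caller to pre-encode bytes of unknown charset. For instance:

```python
from email.header import Header, decode_header

# Let the library RFC-2047-encode a non-ASCII header value.
encoded = Header('Gödel-Escher-Bach', charset='iso-8859-1').encode()
# encoded is now an ASCII-only '=?iso-8859-1?q?...?=' encoded word.

# decode_header() recovers the raw bytes and the declared charset.
(raw, charset), = decode_header(encoded)
original = raw.decode(charset)
```

This is the direction being argued for: the encoding parameter lives in the API, and the wire form is derived, not supplied.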
[Python-Dev] Google Summer of Code/core Python projects - RFC
Hi all, this year we have 10-12 GSoC applications that I've put in the "relevant to core Python development" category. These projects, if mentors etc are found, are *guaranteed* a slot under the PSF GSoC umbrella. As backup GSoC admin and general busybody, I've taken on the work of coordinating these as a special subgroup within the PSF GSoC, and I thought it would be good to mention them to python-dev. Note that all of them have been run by a few different committers, including Martin, Tarek, Benjamin, and Brett, and they've been obliging enough to triage a few of them. Thanks, guys! Here's what's left after that triage. Note that except for the four at the top, these have all received positive support from *someone* who is a committer and I don't think we need to discuss them here -- patches etc. can go through normal "python-dev" channels during the course of the summer. I am looking for feedback on the first four, though. Can these reasonably be considered "core" priorities for Python? Remember, this "costs" us something in the sense of preferring these over Python subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim, etc. --- Questionable "core": 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it to py3k fits with Guido's request that "more stuff get ported". To be clear, I don't think anyone expects all of NumPy to get ported this summer, but these students will work through issues associated with porting big chunks o' code to py3k. One medium/strong proposal, one medium/weak proposal. Comments/thoughts? 2x "improve testing tools for py3k" -- variously focus on improving test coverage and testing wrappers.
One proposes to provide a nice wrapper to make nose and py.test capable of running the regrtests, which (with no change to regrtest) would let people run tests in parallel, distribute or run tests across multiple machines (including Snakebite), tag and run subsets of tests with personal and/or public tags, and otherwise take advantage of many of the nice features of nose and py.test. The other proposes to measure & increase the code coverage of the py3k tests in both Python and C, integrate across multiple machines, and otherwise provide a nice set of integrated reports that anyone can generate on their own machines. This proposal, in particular, could move smoothly towards the effort to produce a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. (This wasn't integrated into the proposal because I only found out about it after the proposals were due.) I personally think that both testing proposals are good, and they grew out of conversations I had with Brett, who thinks that the general ideas are good. So, err, I'm looking for pushback, I guess ;). I can expand on these ideas a bit if people are interested. Both proposals are medium at least, and I've personally been positively impressed with the student interaction. Comments/thoughts? --- Unquestionably "core" by my criteria above: 3to2 tool -- 'nuff said. subprocess improvement -- integrating, testing, and proposing some of the various subprocess improvements that have passed across this list & the bug tracker IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker issues relating to IDLE and Tkinter. roundup VCS integration / build tools to support core development -- a single student proposed both of these and has received some support. See http://slexy.org/view/s2pFgWxufI for details. 
sphinx framework improvement -- support for per-paragraph comments and user/developer interface for submitting/committing fixes 2x "keyring package" -- see http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/. The poorer one of these will probably be axed unless Tarek gives it strong support. -- --titus -- C. Titus Brown, c...@msu.edu
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
On Fri, Apr 10, 2009 at 5:38 PM, C. Titus Brown wrote: > Hi all, > > this year we have 10-12 GSoC applications that I've put in the "relevant > to core Python development" category. These projects, if mentors etc > are found, are *guaranteed* a slot under the PSF GSoC umbrella. As > backup GSoC admin and general busybody, I've taken on the work of > coordinating these as a special subgroup within the PSF GSoC, and I > thought it would be good to mention them to python-dev. > > Note that all of them have been run by a few different committers, > including Martin, Tarek, Benjamin, and Brett, and they've been obliging > enough to triage a few of them. Thanks, guys! > > Here's what's left after that triage. > . > . > > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker > issues relating to IDLE and Tkinter. > Is it important, for the discussion, to mention that it also involves testing this area (idle and tkinter), Titus ? I'm considering this more important than "just" dealing with the tracker issues. > --titus > -- > C. Titus Brown, c...@msu.edu Regards, -- -- Guilherme H. Polo Goncalves
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote: -> > -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker -> > issues relating to IDLE and Tkinter. -> > -> -> Is it important, for the discussion, to mention that it also involves -> testing this area (idle and tkinter), Titus ? I'm considering this -> more important than "just" dealing with the tracker issues. What, I tell you that your app is going to be accepted and we shouldn't argue about it, and you want to argue about it? ;) --titus -- C. Titus Brown, c...@msu.edu
Re: [Python-Dev] Dropping bytes "support" in json
gl...@divmod.com wrote: On 03:21 am, ncogh...@gmail.com wrote: Barry Warsaw wrote: I don't know whether the parameter thing will work or not, but you're probably right that we need to get the bytes-everywhere API first. Given that json is a wire protocol, that sounds like the right approach for json as well. Once bytes-everywhere works, then a text API can be built on top of it, but it is difficult to build a bytes API on top of a text one. I wish I could agree, but JSON isn't really a wire protocol. According to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the serialization of structured data". There are some notes about encoding, but it is very clearly described in terms of unicode code points. So I guess the IO library *is* the right model: bytes at the bottom of the stack, with text as a wrapper around it (mediated by codecs). In email's case this is true, but in JSON's case it's not. JSON is a format defined as a sequence of code points; MIME is defined as a sequence of octets. What is the 'bytes support' issue for json? Is it about content within a json text? Or about the transport format of a json text? Reading rfc4627, a json text is a unicode string representation of an instance of one of 6 classes. In Python terms, they are NoneType, bool, numbers (int, float, decimal?), (unicode) str, list, and [string-keyed] dict. The representation is nearly identical to Python's literals and displays. For transport, the encoding SHALL be one of UTF-8, -16LE/BE, -32LE/BE, with UTF-8 the 'default'. So a json parser (a restricted eval()) tokenizes and parses a stream of unicode chars which in Python could come from either a unicode string or decoded bytes object. The bytes decoding could be either bulk or incremental. Similarly, a json generator (a repr()-like function) produces a stream of unicode chars which again could be optionally encoded to bytes, either incrementally or in bulk.
The standard does not specify any correspondence between representations and domain objects. For Python, making 'null', 'true', and 'false' inter-convert with None, True, False is obvious. Numbers are slightly more problematic. A generator could produce decimal literals from both floats and decimals, but without a non-json extension a parser could only convert back to one, so the other would not round-trip. (Int could be handled by the presence or absence of '.0'.) Similarly, tuples could be represented, like lists, as json square-bracketed arrays, but they would be converted back to lists, not tuples, unless a non-json extension were used. So the two possible bytes-support content issues I see are how to represent them as legal json strings and/or whether some device should be added to make them round-trip. But as indicated above, these two issues are not unique to bytes. Terry Jan Reedy
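Terry's round-trip observations are easy to check against the stdlib json module:

```python
import json

# Tuples serialize as JSON arrays and come back as lists; ints and
# floats are distinguished by the decimal point; null maps to None.
text = json.dumps({"pair": (1, 2), "f": 1.0, "i": 1, "n": None})
back = json.loads(text)
```

After the round trip, back["pair"] is a list rather than a tuple — exactly the non-round-tripping representation he describes.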
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
Well, I think Numpy is of huge importance to a major Python user segment, the scientific community. I don't know if that makes it 'core', but I strongly agree that it's important. Better testing is always useful, and more "core", but IMO less important. -T

On Sat, Apr 11, 2009 at 6:38 AM, C. Titus Brown wrote:
> Hi all,
>
> this year we have 10-12 GSoC applications that I've put in the "relevant
> to core Python development" category. These projects, if mentors etc
> are found, are *guaranteed* a slot under the PSF GSoC umbrella. As
> backup GSoC admin and general busybody, I've taken on the work of
> coordinating these as a special subgroup within the PSF GSoC, and I
> thought it would be good to mention them to python-dev.
>
> Note that all of them have been run by a few different committers,
> including Martin, Tarek, Benjamin, and Brett, and they've been obliging
> enough to triage a few of them. Thanks, guys!
>
> Here's what's left after that triage. Note that except for the four at
> the top, these have all received positive support from *someone* who is
> a committer and I don't think we need to discuss them here -- patches
> etc. can go through normal "python-dev" channels during the course of the
> summer.
>
> I am looking for feedback on the first four, though. Can these
> reasonably be considered "core" priorities for Python? Remember, this
> "costs" us something in the sense of preferring these over Python
> subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim,
> etc.
>
> ---
>
> Questionable "core":
>
> 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it
>    to py3k fits with Guido's request that "more stuff get ported".
>    To be clear, I don't think anyone expects all of NumPy to get
>    ported this summer, but these students will work through issues
>    associated with porting big chunks o' code to py3k.
>
>    One medium/strong proposal, one medium/weak proposal.
>
> Comments/thoughts?
> > 2x "improve testing tools for py3k" -- variously focus on improving test >coverage and testing wrappers. > >One proposes to provide a nice wrapper to make nose and py.test >capable of running the regrtests, which (with no change to >regrtest) would let people run tests in parallel, distribute or >run tests across multiple machines (including Snakebite), tag >and run subsets of tests with personal and/or public tags, and >otherwise take advantage of many of the nice features of nose >and py.test. > >The other proposes to measure & increase the code coverage of >the py3k tests in both Python and C, integrate across multiple >machines, and otherwise provide a nice set of integrated reports >that anyone can generate on their own machines. This proposal, >in particular, could move smoothly towards the effort to produce >a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. >(This wasn't integrated into the proposal because I only found >out about it after the proposals were due.) > >I personally think that both testing proposals are good, and >they grew out of conversations I had with Brett, who thinks that >the general ideas are good. So, err, I'm looking for pushback, >I guess ;). I can expand on these ideas a bit if people are >interested. > >Both proposals are medium at least, and I've personally been >positively impressed with the student interaction. > > Comments/thoughts? > > --- > > Unquestionably "core" by my criteria above: > > 3to2 tool -- 'nuff said. > > subprocess improvement -- integrating, testing, and proposing some of >the various subprocess improvements that have passed across this >list & the bug tracker > > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker >issues relating to IDLE and Tkinter. > > roundup VCS integration / build tools to support core development -- >a single student proposed both of these and has received some >support. See http://slexy.org/view/s2pFgWxufI for details. 
>
> sphinx framework improvement -- support for per-paragraph comments and
>    user/developer interface for submitting/committing fixes
>
> 2x "keyring package" -- see
> http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/
> The poorer one of these will probably be axed unless Tarek gives it
> strong support.
>
> --
>
> --titus
> --
> C. Titus Brown, c...@msu.edu
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com

--
-- Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe ever
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
On Fri, Apr 10, 2009 at 6:02 PM, C. Titus Brown wrote:
> On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote:
> -> >
> -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
> -> >        issues relating to IDLE and Tkinter.
> -> >
> ->
> -> Is it important, for the discussion, to mention that it also involves
> -> testing this area (idle and tkinter), Titus ? I'm considering this
> -> more important than "just" dealing with the tracker issues.
>
> What, I tell you that your app is going to be accepted and we shouldn't
> argue about it, and you want to argue about it? ;)
>

Oh awesome then :) I think I misread part of your original email.

> --titus
> --
> C. Titus Brown, c...@msu.edu

--
-- Guilherme H. Polo Goncalves
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
2009/4/10 C. Titus Brown :
> 2x "improve testing tools for py3k" -- variously focus on improving test
> coverage and testing wrappers.
>
> One proposes to provide a nice wrapper to make nose and py.test
> capable of running the regrtests, which (with no change to
> regrtest) would let people run tests in parallel, distribute or
> run tests across multiple machines (including Snakebite), tag
> and run subsets of tests with personal and/or public tags, and
> otherwise take advantage of many of the nice features of nose
> and py.test.
>
> The other proposes to measure & increase the code coverage of
> the py3k tests in both Python and C, integrate across multiple
> machines, and otherwise provide a nice set of integrated reports
> that anyone can generate on their own machines. This proposal,
> in particular, could move smoothly towards the effort to produce
> a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
> (This wasn't integrated into the proposal because I only found
> out about it after the proposals were due.)
>
> I personally think that both testing proposals are good, and
> they grew out of conversations I had with Brett, who thinks that
> the general ideas are good. So, err, I'm looking for pushback,
> I guess ;). I can expand on these ideas a bit if people are
> interested.
>
> Both proposals are medium at least, and I've personally been
> positively impressed with the student interaction.

To me, both of those proposals seem to say "measure and improve test coverage" or "nose integration" with a severe lack of specific details. Especially the nose plugin one seems like very little work. (Running default nose in the test directory in fact works fairly well.) Another small nit is that they should address Python 2.x, too.

--
Regards,
Benjamin
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
On Fri, Apr 10, 2009 at 06:05:02PM -0500, Benjamin Peterson wrote:
-> 2009/4/10 C. Titus Brown :
-> > 2x "improve testing tools for py3k" -- variously focus on improving test
-> >        coverage and testing wrappers.
-> >
-> >        One proposes to provide a nice wrapper to make nose and py.test
-> >        capable of running the regrtests, which (with no change to
-> >        regrtest) would let people run tests in parallel, distribute or
-> >        run tests across multiple machines (including Snakebite), tag
-> >        and run subsets of tests with personal and/or public tags, and
-> >        otherwise take advantage of many of the nice features of nose
-> >        and py.test.
-> >
-> >        The other proposes to measure & increase the code coverage of
-> >        the py3k tests in both Python and C, integrate across multiple
-> >        machines, and otherwise provide a nice set of integrated reports
-> >        that anyone can generate on their own machines.  This proposal,
-> >        in particular, could move smoothly towards the effort to produce
-> >        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
-> >        (This wasn't integrated into the proposal because I only found
-> >        out about it after the proposals were due.)
-> >
-> >        I personally think that both testing proposals are good, and
-> >        they grew out of conversations I had with Brett, who thinks that
-> >        the general ideas are good.  So, err, I'm looking for pushback,
-> >        I guess ;).  I can expand on these ideas a bit if people are
-> >        interested.
-> >
-> >        Both proposals are medium at least, and I've personally been
-> >        positively impressed with the student interaction.
->
-> To me, both of those proposals seem to say "measure and improve test
-> coverage" or "nose integration" with a severe lack of specific details.
-> Especially the nose plugin one seems like very little work. (Running
-> default nose in the test directory in fact works fairly well.)

...fairly, yes ;). But not perfectly. And certainly not with equivalent guarantees to regrtest, which is really what Python developers need. Tracking down the corner cases, writing up examples, setting up tags, getting multiprocess to work properly, making sure that coverage recording works properly, and then getting people to try it out on THEIR machines, is likely to be a lot of work. The plugin ecosystem for nose is growing daily and supporting that for core would be fantastic; extending it to py.test (whose plugin interface is now mostly compatible with nose) would be even better.

The lack of detail on the code coverage is intentional, IMO. It's non-trivial to get a full handle on C code coverage integrated with Python code coverage -- or at least it has been for me -- so I supported the student focusing on first writing robust coverage analysis tools, and only then deciding what to "hit" with more tests. I will encourage the student to talk to this list (or the "tests" list in the stdlib sig) in order to target areas that are more relevant to people.

I have had a hard time getting a good sense of what core code is well tested and what is not well tested, across various platforms. While Walter's C/Python integrated code coverage site is nice, it would be even nicer to have a way to generate all that information within any particular checkout on a real-time basis. Doing so in the context of Snakebite would be icing... and I think it's worth supporting in core, especially if it can be done without any changes *to* core.

-> Another small nit is that they should address Python 2.x, too.

I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an equally valid criticism. Certainly 3.x is the future, so I thought focusing on increasing code coverage, and especially C code coverage, could best be applied to 3.x.
cheers, --titus -- C. Titus Brown, c...@msu.edu ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown wrote: [megasnip]
> roundup VCS integration / build tools to support core development --
> a single student proposed both of these and has received some
> support. See http://slexy.org/view/s2pFgWxufI for details.

From the listed webpage I have no idea what he is promising (a combination of very high level and very low level tasks). If he is offering all the same magic for Hg that Trac does for SVN (autolinking "r2001" text to patches, for example) then I'm +1. That should be cake even for a student project.

He says vague things about patches too, but I'm not sure what. If he wanted to make that into a 'patchbot' that just applied every patch in isolation and ran 'make && make test' and posted results in the tracker I'd be a happy camper. But maybe those are goals for next year, because I'm not quite sure what the proposal is.

-Jack
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Lazy importing (was Rethinking intern() and its data structure)
Nick Coghlan wrote: I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions. I created an ad-hoc one of these for PyGUI recently. I can send you the code if you're interested. I didn't have any problems with deadlocks, but I did find one rather annoying problem. It seems that an exception occurring at certain times during the import process gets swallowed and turned into a generic ImportError. I had to resort to catching exceptions and printing my own traceback in order to diagnose missing auto-imported names. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Dropping bytes "support" in json
Paul Moore wrote:
>> 3. Encoding
>>
>> JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
>
> This is at best confused (in my utterly non-expert opinion :-)) as
> Unicode isn't an encoding...

I'm inclined to agree. I'd go further and say that if JSON is really meant to be a text format, the standard has no business mentioning encodings at all.

The reason you use a text format in the first place is that you have some way of transmitting text, and you want to send something that isn't text. In that situation, the encoding is already determined by whatever means you're using to send the text.

--
Greg
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rethinking intern() and its data structure
On Friday, 10 April 2009 at 15:05, P.J. Eby wrote:
> At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:
>> This problem (slow application startup times due to too many imports at
>> startup, which can in turn be due to top level imports for library
>> or framework functionality that a given application doesn't actually
>> use) is actually the main reason I sometimes wish for a nice, solid lazy
>> module import mechanism that manages to avoid the potential deadlock
>> problems created by using import statements inside functions.

I'd love to see that too. I imagine it would be beneficial for many python applications.

> Have you tried http://pypi.python.org/pypi/Importing ? Or more
> specifically,
> http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?

Here's what we do in Mercurial, which is a little more user-friendly, but possibly too magical for general use (but provides us a very nice speedup):

http://www.selenic.com/repo/index.cgi/hg/file/tip/mercurial/demandimport.py#l1

It's nice and small, and it is invisible to the rest of the code, but it's probably too aggressive for all users. The biggest problem is probably that ImportErrors are deferred until first access, which trips up modules that do things like:

    try:
        import foo
    except ImportError:
        import fallback as foo

of which there are a few. The mercurial module maintains a blacklist as a bandaid, but it'd be great to have a real fix.
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json)
On Fri, Apr 10, 2009 at 12:04 PM, Barry Warsaw wrote:
> On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:
>> Bill Janssen writes:
>>> Barry Warsaw wrote:
>>>> In that case, we really need the
>>>> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
>>>> things on top of that.
>>> Yep.
>> Uh, I hate to rain on a parade, but isn't that how we arrived at the
>> *current* email package?
> Not really. We got here because we were too damn sloppy about
> the distinction.

Agreed. I take full responsibility -- the str/unicode approach we introduced in 2.0 seemed like the best thing we could do at the time, but in retrospect it would've been better if we'd left str alone and introduced a unicode type that was truly distinct -- like str in 3.0. The email package is not the only system that ended up with a muddled distinction between the two as a result.

> I'm going to remove python-dev from subsequent follow ups. Please join us
> at email-sig for further discussion.
>
> Barry

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Going off-line for a week
Folks, I'm going off-line for a week to enjoy a family vacation. When I come back I'll probably just archive most email unread, so now's your chance to add braces to the language. :-) Not-yet-retiring-ly y'rs, -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Dropping bytes "support" in json
>> In email's case this is true, but in JSON's case it's not. JSON is a
>> format defined as a sequence of code points; MIME is defined as a
>> sequence of octets.
>
> What is the 'bytes support' issue for json? Is it about content within
> a json text? Or about the transport format of a json text?

The question is whether the json parsing should take bytes or str as input, and whether the json marshalling should produce bytes or str. More specifically, the question is whether it is ok to drop bytes.

I personally think that it needs to support bytes, and that perhaps str support is optional (as you could always explicitly encode the str as UTF-8 before passing it to the JSON parser, if you somehow managed to get a str of JSON to parse). However, I really think that this question cannot be answered by reading the RFC. It should be answered by verifying how people use the json library in 2.x.

> The standard does not specify any correspondence between representations
> and domain objects.

And that is not the issue at all; nobody is debating what output the parsing should produce.

Regards,
Martin
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
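For what it's worth, the stdlib json module eventually ended up supporting both sides of this debate: since Python 3.6, loads() accepts str as well as UTF-8/16/32-encoded bytes. A sketch of the explicit-boundary discipline Martin describes (encode once at the wire boundary, work in one type everywhere else):

```python
import json

text = '{"id": 1, "tags": ["a", "b"]}'    # a str of JSON

# Martin's scenario: a bytes-oriented parser can always be fed str
# by encoding explicitly at the boundary:
raw = text.encode("utf-8")
obj = json.loads(raw)       # Python 3.6+ json.loads accepts UTF-8 bytes
assert obj == {"id": 1, "tags": ["a", "b"]}

# The str path gives the same result:
assert json.loads(text) == obj
```

Either way, the conversion happens exactly once, at the boundary between the wire and the program.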
[Python-Dev] Needing help to change the grammar
Hello everybody. My name is Thiago and currently I'm working as a teacher in a high school in Brazil. I have plans to offer a programming course to the students at the school, but I had some problems finding a good language. As a Python programmer, I really like the language's syntax and I think that Python is very good for teaching programming. But there's a little problem: the commands and keywords are in English, and this can be an obstacle to the teenagers who might enter the course.

Because of this, I decided to create a Python version with keywords in Portuguese and with some modifications in the grammar to make it more Portuguese-like. For this, I'm using the Python 3.0.1 source code. I have already read PEP 306 (How to Change Python's Grammar) and changed the suggested files. My changes currently work properly except for one thing: the "comp_op". The code that in English Python is written as "is not" shall be "não é" in Portuguese Python. Besides translating the words "is" and "not", I'm also changing the order in which they appear, putting "not" before "is".

It appears to be a simple change, but strangely, I'm not able to make it work. I already made the corresponding modifications in the Grammar/Grammar file, the new keywords already appear in Lib/keyword.py, and I also changed the function validate_comp_op in Modules/parsermodule.c:

    static int
    validate_comp_op(node *tree)
    {
        (...)
        else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
            res = (validate_ntype(CHILD(tree, 0), NAME)
                   && validate_ntype(CHILD(tree, 1), NAME)
                   && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                        && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
                       || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                           && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
            if (!res && !PyErr_Occurred())
                err_string("operador de comparação desconhecido");
        }
        return (res);
    }

I also looked at the other files mentioned in the PEP but I didn't find anything in them that I recognized as needing changes.
But when I type "make" to compile the new language, the following error appears in Lib/encodings/__init__.py (which I already translated to the Portuguese Python):

    ha...@skynet:~/Python-3.0.1$ make
    Fatal Python error: Py_Initialize: can't initialize sys standard streams
      File "/home/harry/Python-3.0.1/Lib/encodings/__init__.py", line 73
        se entry não é _unknown:
                    ^
    SyntaxError: invalid syntax

The comp_op doesn't work! I don't know what else to change. Perhaps there's some file that I should modify, but I didn't pay enough attention to it... Please, does anybody have some idea of what I should do?

Thanks a lot.
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Dropping bytes "support" in json
[Dropping email sig]

On 11/04/2009 1:06 PM, "Martin v. Löwis" wrote:
> However, I really think that this question cannot be answered by
> reading the RFC. It should be answered by verifying how people use the
> json library in 2.x.

In the absence of anything more formal, here are 2 anecdotes:

* The python-twitter package seems to:

  - Use dumps() mainly to get string objects. It uses it both for
    __str__, and for an API called 'AsJsonString' - the intent of this
    seems to be to provide strings for the consumer of the twitter API -
    it's not clear how such consumers would use them. Note that this API
    doesn't seem to need to 'write' json objects, else I suspect they
    would then be expecting dumps to return bytes to put on the wire.

  - Expect loads to accept the bytes they are reading directly off the
    wire.

* couchdb's wrappers use these functions purely as bytes - they are either decoding an application/json object from the bits they read, or they are encoding it to use directly in the body of a request (or even directly in the URL of the request!)

I find myself conflicted. On one hand I believe the most common use of json will be to exchange data with something inherently byte-based. On the other hand though, json itself seems to be naturally "stringy" and the most natural interface for a casual user would be strings.

I'm personally leaning slightly towards strings, putting the burden on bytes-users of json to explicitly use the appropriate encoding, even in cases where it *must* be utf8. On the other hand, I'm too lazy to dig back through this large thread, but I seem to recall a suggestion that using bytes would be significantly faster. If that is true, I'd be happy to settle for bytes, as I believe the most common *actual* use of json will be via things like the twitter and couch libraries - and may even be a key bottleneck for such libraries - so people will not be directly exposed to its interface...
Cheers,
Mark
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Needing help to change the grammar
> It appears to be a simple change, but strangely, I'm not being able to
> perform it. I already made correct modifications in Grammar/Grammar
> file, the new keywords already appear in Lib/keyword.py and I also
> changed the function validate_comp_op in Modules/parsermodule.c:
>
> static int
> validate_comp_op(node *tree)
> {
>     (...)
>     else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
>         res = (validate_ntype(CHILD(tree, 0), NAME)
>                && validate_ntype(CHILD(tree, 1), NAME)
>                && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                     && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
>                    || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                        && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
>         if (!res && !PyErr_Occurred())
>             err_string("operador de comparação desconhecido");
>     }
>     return (res);
> }

Notice that Python source is represented in UTF-8 in the parser. It might be that the C source code has a different encoding, which would cause the strcmp to fail.

Regards,
Martin
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
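Martin's diagnosis is easy to check at the byte level (a sketch from the Python side; the parser itself compares C strings with strcmp, but the byte sequences are what matter):

```python
# Bytes of the keyword as the tokenizer sees Python source (UTF-8):
utf8 = "não".encode("utf-8")       # b'n\xc3\xa3o' -- 4 bytes

# Bytes of the same literal if the C file were saved as Latin-1:
latin1 = "não".encode("latin-1")   # b'n\xe3o' -- 3 bytes

# strcmp() compares raw bytes, so these can never test equal:
assert utf8 != latin1
```

So if Modules/parsermodule.c was saved in Latin-1 (or any non-UTF-8 encoding), the "não" literal in the C source will never match the UTF-8 "não" coming from the tokenizer, and validate_comp_op will always fail.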
Re: [Python-Dev] Dropping bytes "support" in json
> I'm personally leaning slightly towards strings, putting the burden on > bytes-users of json to explicitly use the appropriate encoding, even in > cases where it *must* be utf8. On the other hand, I'm too lazy to dig > back through this large thread, but I seem to recall a suggestion that > using bytes would be significantly faster. Not sure whether it would be *significantly* faster, but yes, Bob wrote an accelerator for parsing out of a byte string to make it really fast; IIRC, he claims that it is faster than pickling. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Google Summer of Code/core Python projects - RFC
> 2x "keyring package" -- see > http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/. > The poorer one of these will probably be axed unless Tarek gives it > strong support. I don't think these are good "core" projects. Even if the students come up with a complete solution, it shouldn't be integrated with the standard library right away. Instead, it should have a life outside the standard library, and be considered for inclusion only if the user community wants it. I'm also skeptical that this is a good SoC project in the first place. Coming up with a wrapper for, say, Apple Keychain, could be a good project. Coming up with a unifying API for all keychains is out of scope, IMO; various past attempts at unifying APIs have demonstrated that creating them is difficult, and might require writing a PEP (whose acceptance then might not happen within a summer). Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Needing help to change the grammar
On Fri, Apr 10, 2009 at 9:58 PM, Harry (Thiago Leucz Astrizi) wrote: > > Hello everybody. My name is Thiago and currently I'm working as a > teacher in a high school in Brazil. I have plans to offer in the > school a programming course to the students, but I had some problems > to find a good langüage. As a Python programmer, I really like the > language's syntax and I think that Python is very good to teach > programming. But there's a little problem: the commands and keywords > are in english and this can be an obstacle to the teenagers that could > enter in the course. > > Because of this, I decided to create a Python version with keywords in > portuguese and with some modifications in the grammar to be more > portuguese-like. To this, I'm using Python 3.0.1 source code. I love the idea (and most recently edited PEP 306) so here are a few suggestions; Brazil has many python programmers so you might be able to make quick progress by asking them for volunteer time. To bug-hunt your technical problem: try switching the "not is" operator to include an underscore "not_is." The python LL(1) grammar checker works for python but isn't robust, and does miss some grammar ambiguities. Making the operator a single word might reveal a bug in the parser. Please consider switching your students to 'real' python part way through the course. If they want to use the vast amount of python code on the internet as examples they will need to know the few English keywords. Also - most python core developers are not native English speakers and do OK :) PyCon speakers are about 25% non-native English speakers and EuroPython speakers are about the reverse (my rough estimate - I'd love to see some hard numbers). Keep up the Good Work, -Jack ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com