Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz
On Aug 29, 2014, at 7:44 PM, Alex Gaynor  wrote:
>  Disabling verification entirely externally to the program, through a CLI flag
>  or environment variable. I'm pretty down on this idea, the problem you hit is
>  that it's a pretty blunt instrument to swing, and it's almost impossible to
>  imagine it not hitting things it shouldn't; it's far too likely to be used in
>  applications that make two sets of outbound connections: 1) to some internal
>  service which you want to disable verification on, and 2) some external
>  service which needs strong validation. A global flag causes the latter to
>  fail silently when subjected to a MITM attack, and that's exactly what we're
>  trying to avoid. It also makes things much harder for library authors: I
>  write an API client for some API, and make TLS connections to it. I want
>  those to be verified by default. I can't even rely on the httplib defaults,
>  because someone might disable them from the outside.


I would strongly recommend against such a mechanism.

For what it's worth, Twisted simply unconditionally started verifying 
certificates in 14.0 with no "disable" switch, and (to my knowledge) literally 
no users have complained.

Twisted has a very, very strict backwards compatibility policy.  For example, I 
once refused to accept the deletion of a class that raised an exception upon 
construction, on the grounds that someone might have been inadvertently 
importing that class, and they shouldn't see an exception until they've seen a 
deprecation for one release.

Despite that, we classified failing to verify certificates as a security bug, 
and fixed it with no deprecation period.  When users type the 's' after the 'p' 
and before the ':' in a URL, they implicitly expect browser-like certificate 
verification.

The lack of complaints is despite the fact that 14.0 has been out for several 
months now, and, thanks to the aforementioned strict policy, users tend to 
upgrade fairly often (since they know they can almost always do so without fear 
of application-breaking consequences).  According to PyPI metadata, 14.0.0 has 
had 273,283 downloads so far.

Furthermore, "disable verification" is a nonsensical thing to do with TLS.  
"select a trust root" is a valid configuration option, and OpenSSL already 
provides it via the SSL_CERT_DIR environment variable, so there's no need for 
Python to provide anything beyond that.
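A sketch of what "select a trust root" looks like from Python today (the environment-variable names come from OpenSSL; the `ssl` module reports which ones it honours, and the `cafile`/`capath` arguments are the in-code equivalent):

```python
import ssl

# OpenSSL already lets an operator choose a trust root from outside the
# program; the ssl module reports which environment variables it honours:
paths = ssl.get_default_verify_paths()
assert paths.openssl_capath_env == "SSL_CERT_DIR"
assert paths.openssl_cafile_env == "SSL_CERT_FILE"

# In code, the same decision is an explicit, per-context option: pass a PEM
# bundle via cafile= or a hashed certificate directory via capath=.
ctx = ssl.create_default_context()  # no arguments: the system trust root
```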

-glyph

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz

On Sep 2, 2014, at 4:01 PM, Nick Coghlan  wrote:

> 
> On 3 Sep 2014 08:18, "Alex Gaynor"  wrote:
> >
> > Antoine Pitrou  pitrou.net> writes:
> >
> > >
> > > And how many people are using Twisted as an HTTPS client?
> > > (compared to e.g. Python's httplib, and all the third-party libraries
> > > building on it?)
> > >
> >
> > I don't think anyone could give an honest estimate of these counts; however,
> > there are two factors to bear in mind: a) It's extremely strongly
> > recommended to use requests to make any HTTP requests, precisely because
> > httplib is negligent in certificate and hostname checking by default;
> > b) We're talking about Python 3, which has fewer users than Python 2.
> 
> Creating *new* incompatibilities between Python 2 & Python 3 is a major point 
> of concern. One key focus of 3.5 is *reducing* barriers to migration, and 
> this PEP would be raising a new one.
> 
No.  Providing the security that the user originally asked for is not a 
"backwards incompatible change".  It is a bug fix.  And believe me: I care a 
_LOT_ about reducing barriers to migration.  This would not be on my list of 
the top 1000 things that make migration difficult.
> It's a change worth making, but we have time to ensure there are easy ways to 
> do things like skipping cert validation, or tolerate expired certificates.
> 

The API already supports both of these things.  What I believe you're 
implicitly saying is that there needs to be a way to do this without editing 
code, and... no, there really doesn't.  Not to mention the fact that you could 
already craft a horrific monkeypatch to allow operators to cause the ssl module 
to malfunction by 'pip install'ing a separate package, which is about as 
supported as this should be.
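For what it's worth, the escape hatch that PEP 476 ultimately documented is exactly this kind of module-level monkeypatch. A sketch using the underscore-prefixed (i.e. deliberately non-guaranteed) hooks, which did not yet exist at the time of this thread:

```python
import ssl

# The "horrific monkeypatch": replace the factory httplib consults when it
# builds a default HTTPS context.  Underscore names are not a stable API.
ssl._create_default_https_context = ssl._create_unverified_context

ctx = ssl._create_default_https_context()
assert ctx.verify_mode == ssl.CERT_NONE   # verification is now disabled
assert ctx.check_hostname is False        # hostname checking too
```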

-glyph



Re: [Python-Dev] PEP 476: Enabling certificate validation by default!

2014-09-02 Thread Glyph Lefkowitz

On Sep 2, 2014, at 4:28 PM, Nick Coghlan  wrote:

> On 3 Sep 2014 09:08, "David Reid"  wrote:
> >
> > Nick Coghlan  gmail.com> writes:
> >
> > > Creating *new* incompatibilities between Python 2 & Python 3 is a major 
> > > point
> > > of concern.
> >
> > Clearly this change should be backported to Python2.
> 
> Proposing to break backwards compatibility in a maintenance release (...)
> 

As we keep saying, this is not a break in backwards compatibility; it's a bug 
fix.  Yes, systems might break, but that breakage represents an increase in 
security which may well be operationally important.  Not everyone with a 
"working" application has the relevant understanding and expertise to know that 
Python's HTTP client is exposing them to surveillance.  These applications 
should break.  That is the very nature of the fix.  It is not a "compatibility 
break" that the system starts correctly rejecting invalid connections.

By way of analogy, here's another kind of breach in security: an arbitrary 
remote code execution vulnerability in XML-RPC.  I think we all agree that any 
0day RCE vulnerabilities in Python really ought to be fixed and could be 
legitimately included without worrying about backwards compatibility breaks.  
(At least... gosh, I hope so.)

Perhaps this arbitrary remote execution looks harmless; the use of an eval() 
instead of an int() someplace.  Perhaps someone discovered that they can do "3 
+ 4" in their XML-RPC and the server does the computation for them.  Great!  
They start relying on this in their applications to use symbolic values in 
their requests instead of having explicit enumerations.  This can save you 
quite a bit of code!
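With hypothetical names, the bug and its fix look like:

```python
# Hypothetical names for the analogy above: a numeric request field parsed
# with eval() instead of int().
def parse_value_unsafe(field):
    return eval(field)   # accepts "3 + 4" -- and arbitrary code

def parse_value_fixed(field):
    return int(field)    # the fix: "3 + 4" now raises ValueError

assert parse_value_unsafe("3 + 4") == 7   # the accidental "feature"
assert parse_value_fixed("7") == 7        # legitimate inputs still work
```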

When the RCE is fixed, this application will break, and that's fine.  In fact 
that's the whole point of issuing the fix, that people will no longer be able 
to make arbitrary computation requests of your server any more.  If that 
server's maintainer has the relevant context and actually wants the XML-RPC 
endpoint to enable arbitrary RCE, they can easily modify their application to 
start doing eval() on the data that they received, just as someone can easily 
modify their application to intentionally disable all connection security.  
(Let's stop calling it "certificate verification" because that sounds like some 
kind of clerical detail: if you disable certificate verification, TLS 
connections are unauthenticated and unidentified and therefore insecure.)
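Concretely, on a modern Python, the difference between a default context and a "verification disabled" one is two attributes (note the order: `check_hostname` must be cleared before `verify_mode` can be set to `CERT_NONE`):

```python
import ssl

# What the 's' in 'https' is supposed to buy you: the default context
# authenticates the peer certificate and checks the hostname.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True

# "Disabling certificate verification" discards both properties; the
# resulting TLS connection is encrypted but unauthenticated.
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
```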

For what it's worth, on the equivalent Twisted change, I originally had just 
these concerns, but my mind was changed when I considered what exactly the 
user-interface ramifications were for people typing that 's' for 'secure' in 
URLs.  I was convinced, and we made the change, and there have been no ill 
effects that I'm aware of as a result.  In fact, there has been a renewed 
interest in Twisted for HTTP client work, because we finally made security work 
more or less like it's supposed to, and the standard library is so broken.

I care about the health of the broader Python community, so I will passionately 
argue that this change should be made.  But for me personally, it's a lot easier 
to justify that everyone should use Twisted (at least 14.0+), because transport 
security in the stdlib is such a wreck; even if it gets fixed, it will have easy 
options to turn it off unilaterally, so your application can never really be 
sure it's getting transport security when it requests transport security.

-glyph



[Python-Dev] Language Summit Follow-Up

2014-05-28 Thread Glyph Lefkowitz
At the language summit, Alex and I volunteered to put together some 
recommendations on what changes could be made to Python (the language) in order 
to facilitate a smoother transition from Python 2 to Python 3.  One of the 
things that motivated this was the (surprising, to us) consideration that 
features like ensurepip might be added to the future versions of the 2.7 
installers from python.org.

The specific motivations for writing this are:

Library maintainers have a rapidly expanding matrix that requires an increasing 
number of branches to satisfy.
People with large corporate codebases absolutely cannot port all at once.

If you don't have perfect test coverage then you can't make any progress.  So 
these changes are intended to make porting from Python 2 to Python 3 more 
guided and incremental.  We believe that these attributes are necessary.

We would like to stress that we don't believe anything on this list is as 
important as the continuing efforts that everyone in the broader ecosystem is 
making.  If you just want to ease the transition by working on anything at all, 
the best use of your time right now is porting 
https://warehouse.python.org/project/MySQL-python/ to Python 3. :)

Nevertheless there are some things that the language and CPython could do.

Unfortunately we had to reject any proposal that involved new __future__ 
imports, since unknown __future__ imports are un-catchable SyntaxErrors.

Here are some ideas for Python 2.7+.

Add ensurepip to the installers.  Having pip reliably available increases the 
availability of libraries that help with porting, and will generally strengthen 
the broader ecosystem in the (increasingly long) transition period.
Add some warnings about python 3 compatibility.
It should at least be possible to get a warning for every single implicit 
string coercion.
Old-style classes.
Old-style division.
Print statements.
Old-style exception syntax.
buffer().
bytes(memoryview(b'abc'))
Importing old locations from the stdlib (see point 4.)
Long integer syntax.
Use of variables beyond the lifetime of an 'except Exception as e' block or a 
list comprehension.
Backport 'yield from' to allow people to use Tulip and Tulip-compatible code, 
and to facilitate the development of Tulip-friendly libraries and a Tulip 
ecosystem.  A robust Tulip ecosystem requires the participation of people who 
are not yet using Python 3.
Add aliases for the renamed modules in the stdlib.  This will allow people to 
"just write python 3" in a lot more circumstances.
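Absent such aliases, the shim library authors write today looks something like this (`urlrequest` is just a local name chosen for illustration):

```python
# Compatibility shim for a renamed stdlib module: try the Python 3 location
# first, and fall back to the Python 2 name.
try:
    import urllib.request as urlrequest   # Python 3 location
except ImportError:
    import urllib2 as urlrequest          # Python 2 location

assert hasattr(urlrequest, "urlopen")     # same API either way
```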
(re-)Enable warnings by default, including enabling -3 warnings.  Right now all 
warnings are silent by default, which greatly reduces discoverability of future 
compatibility issues.  I hope it's not controversial to say that most new 
Python code is still being written against Python 2.7 today; if people are 
writing that code in such a way that it's not 3-friendly, it should be a more 
immediately noticeable issue.
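The discoverability point can be sketched with the warnings module (a minimal illustration on a modern Python; the -3 warnings in question were issued as DeprecationWarnings):

```python
import warnings

# Silenced by default: the porting hint never reaches the developer.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")   # today's default for DeprecationWarning
    warnings.warn("implicit str coercion", DeprecationWarning)
assert caught == []

# What (re-)enabling warnings would restore:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    warnings.warn("implicit str coercion", DeprecationWarning)
assert len(caught) == 1
assert caught[0].category is DeprecationWarning
```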
Get rid of 2to3. Particularly, of any discussion of using 2to3 in the 
documentation.  More than one very experienced, well-known Python developer in 
this discussion has told me that they thought 2to3 was the blessed way to port 
their code, and it's no surprise that they think so, given that the first 
technique  mentions is still 
2to3.  We should replace 2to3 with something like 
. 2to3 breaks your code on 
python 2, and doesn't necessarily get it running on python 3.  A more 
conservative approach that reduced the amount of work to get your code 2/3 
compatible but was careful to leave everything working would be a lot more 
effective.
Add a new 'bytes' type that actually behaves like the Python 3 bytes type 
(bytes(5)).
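A minimal illustration of the difference, runnable on Python 3 (on Python 2, bytes is simply an alias for str, so bytes(5) is the one-character string '5'):

```python
# Python 3 semantics: bytes(n) is n zero bytes, and bytes accepts an
# iterable of integers -- neither of which Python 2's str-alias provides.
assert bytes(5) == b"\x00" * 5
assert bytes([65, 66, 67]) == b"ABC"
```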

We have rejected any changes for Python 3.5, simply because of the extremely 
long time to get those features into users hands.  Any changes for Python 3 
that we're proposing would need to get into a 3.4.x release, so that, for 
example, they can make their way into Ubuntu 14.04 LTS.

Here are some ideas for Python 3.4.x:

Usage of Python2 style syntax (for example, a print statement) or stdlib module 
names (for example, 'import urllib2') should result in a specific, informative 
warning, not a generic SyntaxError/ImportError.  This will really help new 
users.
Add 'unicode' back as an alias for 'str'.  Just today I was writing some 
documentation where I had to resort to some awkward encoding tricks just to get 
a bytes object out without explaining the whole 2/3 dichotomy in some unrelated 
prose.
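The workaround such an alias would make unnecessary is usually a one-line shim (the classic form assumes it runs at module scope):

```python
# Classic 2/3 compatibility shim: on Python 3 the builtin 'unicode' is
# gone, so rebind the name to str.
try:
    unicode            # Python 2: the builtin exists
except NameError:
    unicode = str      # Python 3: alias it back

assert unicode("flat is better than nested") == "flat is better than nested"
```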

We'd like to thank all the individuals who gave input and feedback in creating 
this list.

-glyph & Alex Gaynor



Re: [Python-Dev] command line attachable debugger

2009-07-24 Thread Glyph Lefkowitz
On Fri, Jul 24, 2009 at 9:43 PM, Edward Peschko  wrote:


> There should be a standard mechanism for debuggers to talk to a python
> process; not one-offs for each debugger, probably at the level of the
> python executable (the same way that gcc lets gdb attach with a pid..
>

Sounds like this is moving into hypothetical territory better-suited to
python-ideas.  (Although I'm sure that if you wanted to contribute polished,
tested code for a standard remote debugger interface, few people would
complain.)


Re: [Python-Dev] Implementing File Modes

2009-07-27 Thread Glyph Lefkowitz
On Mon, Jul 27, 2009 at 3:04 PM, Paul Moore  wrote:

> I like MRAB's idea of using a (non-standard) "e" flag to include
> stderr. So "r" reads from stdout, "re" reads from stdout+stderr.
>
> Anything more complicated probably should just use "raw" Popen
> objects. Don't overcomplicate the interface.
>

In my opinion, mangling stderr and stdout together is already an
overcomplication.  It shouldn't be implemented.

It *seems* like a good idea, until you realize that subtle changes to your
OS, environment, or buffering behavior may result in arbitrary, unparseable
output.

For example, let's say you've got a program whose output is a list of lines,
each one containing a number.  Sometimes it tries to import gtk, and fails
to open its display.

That's fine, and you can still deal with it, as long as the interleaved
output looks like this:

100
200
Gtk-WARNING **: cannot open display:
300
400

but of course the output *might* (although unlikely with such small chunks
of output) end up looking like this, instead:

100
2Gtk-WAR0NING0 **:
can30not 0open display:

400

This is the sort of thing that is much more likely to happen once you start
dealing with large volumes of data, where there are more page boundaries for
your buffers to get confused around, and you are playing with buffering
options to improve performance.  In other words, it's something that fails
only at scale or under load, and is therefore extremely difficult to debug.

This option *might* be okay if it were allowed only on subprocesses opened
in a *text* mode, and if the buffering logic involved forced stderr and
stdout to be line-delimited, and interleave only lines, rather than
arbitrary chunks of bytes.  Of course then if you use this flag with a
program that outputs binary data with no newlines it will buffer forever and
crash your program with a MemoryError, but at least that's easy to debug
when it happens.
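A sketch of the safe alternative, keeping the streams separate (the child process here simulates the Gtk warning scenario; the output values are illustrative):

```python
import subprocess
import sys

# Separate pipes: stderr noise can never corrupt the stdout being parsed.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\n"
     "print(100); print(200)\n"
     "sys.stderr.write('Gtk-WARNING **: cannot open display:\\n')\n"
     "print(300); print(400)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    universal_newlines=True)
out, err = proc.communicate()

numbers = [int(line) for line in out.split()]
assert numbers == [100, 200, 300, 400]   # parsing is unaffected...
assert "Gtk-WARNING" in err              # ...and the warning is still visible
```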


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Glyph Lefkowitz
On Sun, Aug 2, 2009 at 4:17 PM, Jacob Rus  wrote:

> Robert Lehmann wrote:
> > Jacob Rus wrote:
> >> Here is a somewhat more substantively changed version. This one does
> >> away with the 'inited' flag and the 'init' function, which might be
> >> impossible given that their documented (though I would be extremely
> >> surprised if anyone calls them in third-party code)
> > [snip]
> >
> > There seem to be quite a bunch of high-profile third-party modules
> > relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See
> > http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a
> > more thorough listing.
> >
> > Given that most of them aren't ported to Python 3 yet, I guess, changing
> > the semantics in 3.x seems not-too-bad to me.
>

No, it's bad.  If I may quote Guido:
http://www.artima.com/weblogs/viewpost.jsp?thread=227041

> So, once more for emphasis: *Don't change your APIs at the same time as
> porting to Py3k!*

Please follow this policy as much as possible in the standard library; the
language transition is going to be hard enough.

Put a different way: please don't change the library unless you're *also*
going to write a 2to3 fixer that somehow updates all calling code, too.

> Ooh, okay.  Well I guess we can’t get rid of those then!

Indeed not.


Re: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library

2009-08-19 Thread Glyph Lefkowitz
On Wed, Aug 19, 2009 at 2:20 PM, Eric Smith  wrote:


> I think using .network and .broadcast are pretty well understood to be the
> [0] and [-1] of the network address block. I don't think we want to start
> creating new terms or access patterns here.
>
> +1 on leaving .network and .broadcast as-is (including returning a
> IPvXAddress object).
>

-1.  I think 'network.number' or 'network.zero' is a lot clearer than
'network.network'.  Maybe '.broadcast' would be okay, as long as it *can* be
adjusted for those unusual, or maybe even only hypothetical, networks where
it is not the [-1].
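For reference, the module that eventually shipped in the stdlib as `ipaddress` kept names close to this proposal (`.network` became `.network_address`), and both remain the [0] and [-1] of the block:

```python
import ipaddress

# In the stdlib ipaddress module, the network and broadcast addresses are
# exactly the first and last addresses of the block.
net = ipaddress.ip_network("192.0.2.0/24")
assert net.network_address == net[0]
assert net.broadcast_address == net[-1]
assert str(net.broadcast_address) == "192.0.2.255"
```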


Re: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library

2009-08-19 Thread Glyph Lefkowitz
On Wed, Aug 19, 2009 at 4:45 PM, "Martin v. Löwis" wrote:

> > No, I just said its conventionally used as that but its not definition
> > of a broadcast (in fact you can have any valid host address defined
> > as broadcast as long as all members of the network agree on that)
>
> You could, but then you are violating existing protocol specifications.
>
> RFC 1122 mandates, in sections 3.2.1.3 and 3.3.6, that certain addresses
> MUST be understood as broadcast addresses, by all nodes (independent of
> configuration).
>
> I think a Python IP address library should conform to all relevant RFCs.
>

Yes, but section 3.3.6 also states:

There is a class of hosts (4.2BSD Unix and its derivatives, but not 4.3BSD)
that use non-standard broadcast address forms, substituting 0 for -1. All
hosts SHOULD recognize and accept any of these non-standard broadcast
addresses as the destination address of an incoming datagram. A host MAY
optionally have a configuration option to choose the 0 or the -1 form of
broadcast address, for each physical interface, but this option SHOULD
default to the standard (-1) form.

So it sounds like doing what I suggested earlier (default to [-1], allow for
customization) is actually required by the RFC :-).  Although the RFC only
requires that you be able to customize to [0], not to an arbitrary address.
In practical terms, though, I believe it is possible to do as Tino suggests
and configure any crazy address you want to be the broadcast address (or
addresses, even) for a network.

> I think setting the broadcast address to something else just does not need
> to be supported.


It is unusual, but frankly, needing to actually do operations on broadcast
addresses at all is also a pretty unusual task.  Broadcast itself is a
somewhat obscure corner of networking.  I suspect that in many deployments
that need to write significant code to deal with broadcast addresses, rather
than the usual default stuff, funky configurations will actually be quite
common.

I would not be surprised to find that there are still some 4.2BSD VAXes
somewhere doing something important, and some Python may one day be called
upon to manage their networks.


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Glyph Lefkowitz
On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum  wrote:

> On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote:
> > Right; the code object would think it was loaded from the original
> > location it was created at instead of where it actually is. Now why
> > someone would want to move their .pyc files around instead of
> > recompiling I don't know short of not wanting to send someone source.
>
> I already mentioned replication; it could also just be a matter of
> downloading a tarball with .py and .pyc files.


Also, if you're using Python in an embedded context, bytecode compilation
(or even filesystem access!) can be prohibitively slow, so an uncompressed
.zip file full of compiled .pyc files is really the way to go.

I did this a long time ago on an XScale machine, but recent inspection of
the Android Python scripting stuff shows a similar style of deployment (c.f.
/data/data/com.google.ase/python/lib/python26.zip).


Re: [Python-Dev] PEP 3144 review.

2009-09-29 Thread Glyph Lefkowitz
On Tue, Sep 29, 2009 at 1:00 PM, Guido van Rossum  wrote:

> On Tue, Sep 29, 2009 at 9:03 AM, Antoine Pitrou 
> wrote:
>
> You say it yourself : it describes "the ip address/prefix of a NIC".
> > It isn't the job of a Network class. A Network shouldn't describe a
> > host, or a particular NIC.
>
> Hey Antoine,
>
> Can we drop the pedantic discussion about what "should" or "shouldn't"
> be the job of a "Network" class, and just proceed to a pragmatic
> compromise? Peter has already posted that he is okay with __eq__ and
> friends ignoring the .ip attribute, which sounds good enough to me.
> His use case (which he mentioned to me off-list) is simply that if the
> denormalized .ip attribute weren't saved as part of the IPNetwork
> class, in many cases he'd have to keep track of it separately, which
> just feels clumsy.
>

I apologize in advance for missing a message that answers my question; I've
done my best to read all the related traffic in the various threads that
discuss this, but I'm sure I missed something.

I don't see what's particularly "pragmatic", in terms of functionality,
about confusing the responsibility of the Network class.  Networks don't
have addresses, in the sense that is being discussed here.  Allowing them to
have an IP presents a misleading model, and will encourage applications to
be passing around networks where they should be passing around hosts.  And
yes, the discussion is pedantic, in that some people are certain to learn
about the model of IP networking by reading the documentation of this module
if it gets into the stdlib.  I personally learned all about async networking
from reading the asyncore, select, and socket modules in python 1.5.2, lo
these many years past.

The discussion seems to be centered around the inconvenience of adding a
separate IPNetworkWithHost class that can encapsulate both of these things.
Peter seems to think that this is hugely inconvenient, but classes are
cheap.  If we were talking about IPNetwork.from_string() instead of
IPNetwork(), it seems to me that it would even be acceptable for it to
return an IPNetwork subclass if the address were not canonical (i.e. without
the bits already masked off and zeroed).  Perhaps there should be such a
method, or even just a free function, parse_mask(), as that would allow for
dealing with other user-input use-cases that have been brought up in this
thread.  I don't understand why we can't just add that class and make
everybody happy.  IPNetwork could even have a .canonicalize() method which
would return itself, and the subclass implementation in IPNetworkWithHost
return the corresponding IPNetwork.  (I wish I could come up with a better
name, but I certainly agree that there are cases where an IPNetworkWithHost
is what I would want.)
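As it happens, the design that later shipped as the stdlib `ipaddress` module resolves this much as suggested here: `ip_interface` plays the role of the IPNetworkWithHost sketched above, while `ip_network` either rejects host bits (strict mode) or masks them off:

```python
import ipaddress

# ip_interface pairs a host address with its network -- the two concerns
# this thread wanted kept distinct from a plain Network object.
iface = ipaddress.ip_interface("192.0.2.5/24")
assert iface.ip == ipaddress.ip_address("192.0.2.5")
assert iface.network == ipaddress.ip_network("192.0.2.0/24")

# The "canonicalize" operation: strict=False masks off the host bits.
net = ipaddress.ip_network("192.0.2.5/24", strict=False)
assert net == iface.network
```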

In addition to the somebody-must-have-mentioned-this-already feeling that I
got, I hesitated to post this message because it doesn't actually seem that
important to me.  While I'm firmly convinced that Network.ip is a design
mistake, it's not like the rest of Python, or for that matter any software,
is completely perfect.  In fact I think this mistake is significantly less
bad than some of the others already present in Python.  If Peter remains
unconvinced, I do think that we should put it in the stdlib, move on, and
get to fixing some of the other stuff we agree needs fixing rather than
continuing to re-hash this.  Primarily because, as far as I can tell, if
hashing and equality are defined the way that everyone seems to be agreeing
they be defined (ignoring the .ip attribute) then those of us who think .ip
is a design error can use the library and safely ignore it completely.

So, I promise not to contribute further to the problem; I won't post again
in this thread unless someone who is actually going to do some work here
wants to solicit a clarification of my opinion or some more ideas :).


Re: [Python-Dev] Backport new float repr to Python 2.7?

2009-10-11 Thread Glyph Lefkowitz
On Sun, Oct 11, 2009 at 3:48 PM, Guido van Rossum  wrote:


> I'm -0 -- mostly because of the 3rd party doctests and perhaps also
> because I'd like 3.x to have some carrots. (I've heard from at least
> one author who is very happy with 3.x for the next edition of his
> "programming for beginners" book.)
>

This reasoning definitely makes sense to me; with all the
dependency-migration issues, 3.x could definitely use some carrots.  However,
I don't think I agree with it, because this doesn't feel like a big new
feature, just some behavior which has changed.  The carrots I'm interested
in as a user are new possibilities, like new standard library features, a
better debugger/profiler, or everybody's favorite bugaboo, multicore
parallelism.  (Although, to be fair, the removal of old-style classes
qualifies.)

I'd much rather have my doctests and float-repr'ing code break on 2.7 so I
can deal with it as part of a minor-version upgrade than have it break on
3.x and have to deal with this at the same time as the unicode->str
explosion.  It feels like a backport of this behavior would make the 2->3
transition itself a little easier.
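For context, the change under discussion is the shortest-round-tripping float repr (new in 3.1, and ultimately backported to 2.7 as proposed here); a quick illustration on a post-change Python:

```python
# Shortest repr that still round-trips: the old repr printed 17 significant
# digits, e.g. '0.10000000000000001' for 0.1.
assert repr(0.1) == "0.1"
for x in (0.1, 1.0 / 3.0, 2.5e-10):
    assert float(repr(x)) == x   # round-tripping stays exact
```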


Re: [Python-Dev] Backport new float repr to Python 2.7?

2009-10-11 Thread Glyph Lefkowitz
On Sun, Oct 11, 2009 at 5:16 PM, Brett Cannon  wrote:

> On Sun, Oct 11, 2009 at 13:00, Glyph Lefkowitz wrote:
>
>> The carrots I'm interested in as a user are new possibilties, like new
>> standard library features, a better debugger/profiler, or everybody's
>> favorate bugaboo, multicore parallelism.  (Although, to be fair, the removal
>> of old-style classes qualifies.)
>>
> Sure, but if people like Mark are having to spend their time backporting
> every bit of behaviour like this then we won't have the time and energy to
> add the bigger carrots to 3.x to help entice people to switch.
>

Okay, call me +0 then.  Not one of the migration issues I'm really sweating
about :).


Re: [Python-Dev] Better module shutdown procedure

2009-10-14 Thread Glyph Lefkowitz
On Wed, Oct 14, 2009 at 2:45 PM, Neil Schemenauer  wrote:

> On Wed, Oct 14, 2009 at 08:35:28PM -, exar...@twistedmatrix.com wrote:
> > I notice that the patch doesn't include any unit tests for the feature
> > being provided
>
> That's hard to do although I suppose not impossible. We would be
> trying to test the interpreter shutdown procedure. I suppose on
> platforms that support it we could fork a subprocess and then inspect
> the output from __del__ methods.


Why not just expose the module-teardown procedure and call it on a module,
then inspect the state there as it's being torn down?


Re: [Python-Dev] Possible language summit topic: buildbots

2009-10-25 Thread Glyph Lefkowitz

On Oct 25, 2009, at 3:06 PM, Martin v. Löwis wrote:


(*) it may help if Buildbot would create a Win32 job object, and
then use TerminateJobObject. Contributions are welcome.


Some work has already been done on this, but it needs help.  At the
root it's a Twisted issue:

http://twistedmatrix.com/trac/ticket/2726




Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-03 Thread Glyph Lefkowitz

On Nov 3, 2009, at 5:16 PM, Paul Moore wrote:

> 2009/11/3 Brett Cannon :
>> I'm afraid there is some FUD going around here, which is
>> understandable since no one wants to burn a ton of time on something
>> that will be difficult or take a lot of time. But I have not heard
>> anyone in this email thread (or anywhere for that matter) say that
>> they tried a port in earnest and it turned out to be difficult.
>
> FWIW, I did a quick survey of some packages (a sampling of packages
> I've used or considered using in the past):
>
> Twisted - no plans yet for Python 3



Speaking of FUD, we've had a plan for Python 3 support for some time:

http://twistedmatrix.com/trac/ticket/2484

http://stackoverflow.com/questions/172306/how-are-you-planning-on-handling-the-migration-to-python-3/214601#214601

Not only that, but progress is actually being made on that plan, as it  
is being slowly executed by contributors from the community, a  
sampling of which you can see on these tickets, linked from the bottom  
of the "master plan" ticket I mentioned above (#2484):


http://twistedmatrix.com/trac/ticket/4053
http://twistedmatrix.com/trac/ticket/4065
http://twistedmatrix.com/trac/ticket/4066

If you're interested in helping, our core team has not had much  
time for Twisted lately, and we need volunteers who are interested in  
doing code reviews and becoming committers to help shepherd these  
tickets through the process.



Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-03 Thread Glyph Lefkowitz

On Nov 3, 2009, at 6:23 AM, Paul Moore wrote:


2009/11/3 Raymond Hettinger :
In all these matters, I think the users should get a vote.  And  
that vote
should be cast with their decision to stay with 2.x, or switch to  
3.x, or
try to support both.  We should not muck with their rational  
decision making

by putting "carrots" in one pile and abandoning the other.



The biggest issue to my mind is that adoption by the ultimate end
users is significantly hampered by the fact that big projects like
Twisted, numpy and the like, have no current plans to move to Python
3. Even end users with a reasonable level of coding expertise don't
have the time or resources to offer much in the way of help with a
port, when the project as a whole isn't interested in starting the
process.


For what it's worth, the official position of the Twisted project is  
not that we have "no plan" to move to Python 3.  It's that our plan is  
to do exactly as Raymond suggests, and give the users a vote - in this  
case, you vote with your patches :).


We are actively and frequently encouraging our users to contribute  
patches that clean up warnings, which is the biggest impediment to a  
py3 port of Twisted.  Some of you are probably expecting me to whinge  
about how people never contribute anything, but actually, users *have*  
shown up and started doing this.  Our biggest problem at the moment is  
that we don't have enough people doing code reviews so the  
contributions are starting to pile up.  As I said in my other message,  
if someone would like to help, signing up to do code reviews would be  
a good way.


Despite this progress, my hope is that there will be a robust 2.x  
series up through 2.9.


For one thing, we have a very long row to hoe here.  The migration to  
3.0 is a long, tedious process with little tangible benefit.  I hope  
that sometime in the next decade Twisted can accelerate the process of  
dropping old 2.x versions, but I seriously doubt we could do a feature- 
complete 3.1/2.6 version.  I get the general impression that a 3.2/2.7  
port would be more feasible; hopefully a 3.3/2.8 would be even more so.


Also, the benefits of migrating to python 3.x are still negligible, as  
far as I can tell.  On the one hand, you've got a Python with no old- 
style classes and a clear unicode/bytes situation, and that's great.   
On the other hand, you've got NumPy, PyGTK, Unladen Swallow, PyPy,  
Jython, IronPython, and so on and so forth.  Since I started using it,  
the strength of Python has been in its ecosystem, and the 3.x  
ecosystem is not yet viable.


As long as we're tossing out modest proposals here, I still think that  
(as I believe James Knight already proposed) abandoning the current  
3.x branch, backporting everything to 2.7, and continuing forward with  
a migration strategy that introduces individual deprecations every  
major version until 2.x == 3.x is the way to go.  For example, 2.8  
could emit a deprecation warning for every old-style class that was  
defined, 2.9 could emit a deprecation warning for every string  
constant declared without a 'b' or 'u' prefix unless the module in  
question were in "3.x mode" (i.e. no-prefix == 'u').  (I leave the  
determination of whether the parser should be in 3.x mode for a  
particular module as an exercise for the reader, but a 'from  
__future__' import would suffice.)
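There is real precedent for this kind of per-module mode switch: __future__ imports already flip parser behaviour one module at a time, and compile() accepts the same flags programmatically. A small sketch (on 3.x the unicode_literals flag is a no-op, but the mechanism is the same one a hypothetical "3.x mode" flag would use):

```python
import __future__

# Per-module parser switches already exist: each __future__ feature exposes
# a compiler flag that compile() understands, exactly the hook a
# "3.x mode" directive would need.
flags = __future__.unicode_literals.compiler_flag
ns = {}
exec(compile('s = "no prefix"', "<sketch>", "exec", flags), ns)

# Under Python 2.6+ this flag makes the unprefixed literal a unicode
# object; on 3.x it is (already) str.
print(type(ns["s"]).__name__)
```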


I realize that there are other issues here, like the C ABI changes  
some NumPy folks have raised.  Also, I'm not planning to actually do  
any *work* on this suggestion, so you can take it for what it is,  
which is to say, armchair quarterbacking.


There have been some other comments in this thread indicating that  
this was not the case because some users indicated that they'd rather  
deal with lots of changes "all at once".  My understanding is that it  
was done this way so that the *developers* of Python could make a  
clean break, and design and implement a new version of Python without  
being constrained by compatibility concerns.  If you can show me an  
actual application or library developer in Python who wanted this one- 
big-jump migration, I will show you a crazy person.


The main reason I want a long 2.x series is that I believe it would  
more easily allow us infrastructure folks to drop support for *older*  
versions.  With this big 2.x->3.x chasm, I can't really see an end in  
sight for Twisted using Python 2.x as its _source_ language,  
translating with 2to3.  Some projects which depend on Twisted and want  
new versions (and security fixes, etc) are going to want Python 2.x  
for a really long time.  Maybe they're just really conservative, maybe  
they don't have a lot of maintenance energy, or maybe they have other  
dependencies which haven't got a port; it doesn't really matter,  
empirically speaking people want older versions of Python.


Keep in mind also that the 2.x translation process is extremely slow  
and results in a clunky development pro

Re: [Python-Dev] Status of the Buildbot fleet and related bugs

2009-11-07 Thread Glyph Lefkowitz

On Nov 6, 2009, at 6:34 PM, Georg Brandl wrote:


R. David Murray schrieb:


So, overall I think the buildbot fleet is in good shape, and if
we can nail issue 6748 I think it will be back to being an
important resource for sanity checking our checkins.


Yay! Thanks to all of you!


Indeed!  It's great to see so much work going into build and test  
maintenance.  Thanks a lot!




Re: [Python-Dev] PEP 3003 - Python Language Moratorium

2009-11-07 Thread Glyph Lefkowitz


On Nov 6, 2009, at 4:52 PM, exar...@twistedmatrix.com wrote:


On 09:48 pm, rdmur...@bitdance.com wrote:

On Fri, 6 Nov 2009 at 15:48, Glyph Lefkowitz wrote:



Documentation would be great, but then you have to get people to  
read the documentation and that's kind of tricky.  Better would be  
for every project on PyPI to have a score which listed warnings  
emitted with each version of Python.  People love optimizing for  
stuff like that and comparing it.


I suspect that even if all warnings were completely silent by  
default, developers would suddenly become keenly interested in  
fixing them if there were a metric like that publicly posted  
somewhere :).


+1, but somebody needs to write the code...


How would you collect this information?  Would you run the test  
suite for each project?  This would reward projects with small or  
absent test suites. ;)




*I* would not collect this information, as I am far enough behind on  
other projects ;-) but I if I were to advise someone *else* as to how  
to do it, I'd probably add a feature to the 'warnings' module where  
users could opt-in (sort of like popcon.debian.org) to report warnings  
encountered during normal invocations of any of their Python programs.
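A minimal sketch of that opt-in reporting hook (all names invented; a real popcon-style reporter would batch and upload the tallies rather than keep them in a list):

```python
import warnings

# Sketch: wrap warnings.showwarning so every warning shown during a normal
# program run is also recorded for later reporting.
collected = []
_original_showwarning = warnings.showwarning

def _recording_showwarning(message, category, filename, lineno,
                           file=None, line=None):
    collected.append((category.__name__, filename, lineno, str(message)))
    _original_showwarning(message, category, filename, lineno, file, line)

warnings.simplefilter("always")  # ensure warnings actually reach the hook
warnings.showwarning = _recording_showwarning

warnings.warn("flagrant API misuse", UserWarning)
print(collected[0][0])  # UserWarning
```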


I would also advise such a hypothetical data-gathering project to  
start with a buildbot doing coverage runs; any warning during the test  
suite would be 1 demerit, any warning during an actual end-user run of  
the application *not* caught by the test suite would be 1000  
demerits :).


And actually it would make more sense if this were part of an overall  
quality metric, like http://pycheesecake.org/ proposes (although I  
think that cheesecake's current metric is not really that great, the  
idea is wonderful).




Re: [Python-Dev] [issue1644818] Allow importing built-in submodules

2009-12-19 Thread Glyph Lefkowitz

On Dec 19, 2009, at 5:29 AM, Julien Danjou wrote:

> Well, that's disappointing. I work on several other free software
> projects, and my time is really scarce too.
> 
> I understand blackmailing me to close a bug can be seen as a nice game.
> Honestly, if I had more time to get involved in that area, I'd
> take it as a game and would do it with pleasure.
> 
> But in my current position and with "I-do-software-development-too",
> you are just pissing me off for not fixing a bug in your program with a
> 10-line patch written by someone else 3 years ago.
> 
> Something that should take 5 minutes, probably the time we both lost by
> writing our respective emails. Or give me commit access, I'll do it for
> you.

I think you're missing the point here.  This one particular patch is 10 lines 
long, but the problem is that there are thousands of patches in the Python 
tracker, many of which are buggy or incorrect, all of which need to be 
reviewed.  All of which *are* being reviewed, as people have time.  Nothing is 
particularly special about your patch.

In other words, Martin is asking you to review only 5 patches, but you're 
asking him to review tens of thousands.  I think the 5-for-1 deal is a great 
idea, because it takes peoples' impatience and turns it into a resource to deal 
with other peoples' impatience :).




Re: [Python-Dev] First draft of "sysconfig"

2009-12-23 Thread Glyph Lefkowitz

On Dec 23, 2009, at 10:00 AM, Frank Wierzbicki wrote:

> On Mon, Dec 14, 2009 at 5:58 PM, Tarek Ziadé  wrote:
>> and for Linux and al, I am not sure but maybe a prefix for
>> Jython/etc.. could be used
>> for all paths.
>> 
>> ~/.locale/lib/python/2.6/site-packages/...
>> ~/.locale/jython/lib/python/2.6/site-packages/...
>> 
>> (I didn't digg on how Jython organizes things yet, any hint would be
>> appreciated)
> Jython does not yet support user site-packages, but I think the above
> looks like a fine structure for us when we get to implementing it.

Two minor points:

1. It's "~/.local", not "~/.locale" ;-)

2. I think it would be a better idea to do 
"~/.local/lib/jython/2.6/site-packages".

The idea with ~/.local is that it's a mirror of the directory structure 
convention in /, /usr/, /opt/ or /usr/local/.  In other words it's got "bin", 
"lib", "share", "etc", etc..  ~/.local/jython/lib suggests that the parallel 
scripts directory would be ~/.local/jython/bin, which means I need 2 entries on 
my $PATH instead of one, which means yet more setup for people who use both 
Python and Jython per-user installation.



Re: [Python-Dev] Suggestion: new 3 release with backwards compatibility

2010-01-05 Thread Glyph Lefkowitz
On Jan 5, 2010, at 2:00 PM, Ian Bicking wrote:

> It's not even that easy -- libraries can't apply patches for Python 3 
> compatibility as they usually break Python 2 compatibility.  Potentially 
> libraries could apply patches that make a codebase 2to3 ready, but from what 
> I've seen that's more black magic than straight forward updating, as such 
> patches have to trick 2to3 producing the output that is desired.

It seems like this is a problem to be addressed, then.  Let's get the "black 
magic" to be better specified and documented. 
 is an interesting start on 
this, but it would be better if this work could be put into 2to3 fixers as well.

> The only workable workflow I've seen people propose for maintaining a single 
> codebase with compatibility across both 2 and 3 is to use such tricks, with 
> aliases to suppress some 2to3 updates when they are inappropriate, so that 
> you can run 2to3 on install and have a single canonical Python 2 source.  
> Python 2.7 won't help much (even though it is trying) as the introduction of 
> non-ambiguous constructions like b"" aren't compatible with previous versions 
> of Python and so can't be used in many libraries (support at least back to 
> Python 2.5 is the norm for most libraries, I think).

No-op constructions like 'bytes("")' could help for older versions of Python, 
though.  A very, very small runtime shim could provide support for these, if 
2to3 could be told about it somehow.
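A hedged sketch of such a "very, very small runtime shim" (the helper name `b` is an assumption here; the same convention later appeared as `six.b`):

```python
import sys

# A version-neutral byte-string helper: usable on Python 2.5, which lacks
# b"" literals entirely, as well as on 2.6+ and 3.x.
if sys.version_info[0] >= 3:
    def b(s):
        # on 3.x, spell out the encode that a b"" literal would imply
        return s.encode("latin-1")
else:
    def b(s):
        # on 2.x, str literals are already byte strings
        return s

print(b("abc") == "abc".encode("latin-1"))  # True
```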

> Also, running 2to3 on installation is kind of annoying, as you get source 
> that isn't itself the canonical source, so to fix bugs you have to look at 
> the installed source and trace it back to the bug in the original source.

Given the way tracebacks are built, i.e. from filenames stored in .pycs rather 
than based on where the code was actually loaded in the filesystem, couldn't 
2to3 do .pyc rewriting to point at the original source?  Sort of like our 
own version of the #line directive? :)
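The premise is easy to check: the traceback machinery reports the filename baked into the code object, not the path the file was loaded from, so retargeting generated code at the original source is possible in principle (filename below is invented for illustration):

```python
import sys
import traceback

# Tracebacks come from the code object's recorded filename, not from the
# on-disk location of the compiled code.
code = compile("x = 1 / 0", "original_2x_source.py", "exec")
try:
    exec(code, {})
except ZeroDivisionError:
    frames = traceback.extract_tb(sys.exc_info()[2])
    print(frames[-1].filename)  # original_2x_source.py
```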

Seriously though, I find it hard to believe that this is a big problem.  The 
3.x source looks pretty similar to the 2.x source, and it's good to look at 
both if you're dealing with a 3.x issue.

> I suspect a reasonable workflow might be possible with hg and maybe patch 
> queues, but I don't feel familiar enough with those tools to map that out.

This is almost certainly more of a pain than trying to trick 2to3 into doing 
the right thing.


Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Glyph Lefkowitz


On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:

> On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner
>  wrote:
>> Hi,
>> 
>> Builtin open() function is unable to open an UTF-16/32 file starting with a
>> BOM if the encoding is not specified (raise an unicode error). For an UTF-8
>> file starting with a BOM, read()/readline() returns also the BOM whereas the
>> BOM should be "ignored".

> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
> talk. And for the other two, perhaps it would make more sense to have
> a separate encoding-guessing function that takes a binary stream and
> returns a text stream wrapping it with the proper encoding?

It *is* crazy, but unfortunately rather common.  Wikipedia has a good 
description of the issues: 
.  Basically, some Windows 
text APIs will emit a UTF-8 "BOM" in order to identify the file as being UTF-8, 
so it's become a convention to do that.  That's not good enough, so you need to 
guess the encoding as well to make sure, but if there is a BOM and you can 
otherwise verify that the file is probably UTF-8 encoded, you should discard it.
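A minimal sketch of the "separate encoding-guessing function" Guido describes, taking a binary stream and returning a text stream (function name invented; only BOM-based detection, with no statistical guessing and no verification step):

```python
import codecs
import io

# BOMs must be checked longest-first: the UTF-32-LE BOM starts with the
# UTF-16-LE BOM, so the order below matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def open_guessing_encoding(binary_stream, default="utf-8"):
    """Wrap a seekable binary stream in a text stream, consuming any BOM."""
    head = binary_stream.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            binary_stream.seek(len(bom))  # skip the BOM itself
            return io.TextIOWrapper(binary_stream, encoding=name)
    binary_stream.seek(0)
    return io.TextIOWrapper(binary_stream, encoding=default)

raw = io.BytesIO(codecs.BOM_UTF8 + "hello".encode("utf-8"))
print(open_guessing_encoding(raw).read())  # hello
```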



Re: [Python-Dev] Improve open() to support reading file starting with an unicode BOM

2010-01-07 Thread Glyph Lefkowitz

On Jan 7, 2010, at 11:21 PM, Guido van Rossum wrote:

> On Thu, Jan 7, 2010 at 7:34 PM, Glyph Lefkowitz  
> wrote:
>> 
>> On Jan 7, 2010, at 7:52 PM, Guido van Rossum wrote:
>>> 
>>> I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy
>>> talk. And for the other two, perhaps it would make more sense to have
>>> a separate encoding-guessing function that takes a binary stream and
>>> returns a text stream wrapping it with the proper encoding?
>> 
>> It *is* crazy, but unfortunately rather common.  Wikipedia has a good
>> description of the issues:
>> <http://en.wikipedia.org/wiki/UTF-8#Byte-order_mark>.  Basically, some
>> Windows text APIs will emit a UTF-8 "BOM" in order to identify the file as
>> being UTF-8, so it's become a convention to do that.  That's not good
>> enough, so you need to guess the encoding as well to make sure, but if there
>> is a BOM and you can otherwise verify that the file is probably UTF-8
>> encoded, you should discard it.
> 
> That doesn't make sense. If the file isn't UTF-8 you can't see the
> BOM, because the BOM itself is UTF-8-encoded.

I'm saying that the BOM itself isn't enough to detect that the file is actually 
UTF-8.  If (for whatever reason: explicitly specified, guessed in some other 
way) the file's encoding is determined to be something else, the bytes 
comprising the BOM should be decoded as normal.  It's just that the UTF-8 
decoding of the BOM at the start of a file should be "".

> (And yes, I know this happens. Doesn't mean we need to auto-guess by
> default; there are lots of issues e.g. what should happen after
> seeking to offset 0?)

I think it's pretty clear that the BOM should still be skipped in that case ...



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-21 Thread Glyph Lefkowitz

On Jan 21, 2010, at 7:25 AM, Antoine Pitrou wrote:

>> We seek guidance from the community on
>> an acceptable level of increased memory usage.
> 
> I think a 10-20% increase would be acceptable.

It would be hard for me to put an exact number on what I would find acceptable, 
but I was really hoping that we could get a *reduced* memory footprint in the 
long term.

My real concern here is not absolute memory usage, but usage for each 
additional Python process on a system; even if Python supported fast, GIL-free 
multithreading, I'd still prefer the additional isolation of multiprocess 
concurrency.  As it currently stands, starting cores+1 Python processes can 
start to really hurt, especially in many-core-low-RAM environments like the 
PlayStation 3.

So, if memory usage went up by 20%, but per-interpreter overhead were decreased 
by more than that, I'd personally be happy.



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-21 Thread Glyph Lefkowitz

On Jan 21, 2010, at 6:48 PM, Collin Winter wrote:

> Hey Glyph,

> There's been a recent thread on our mailing list about a patch that
> dramatically reduces the memory footprint of multiprocess concurrency
> by separating reference counts from objects. We're looking at possibly
> incorporating this work into Unladen Swallow, though I think it should
> really go into upstream CPython first (since it's largely orthogonal
> to the JIT work). You can see the thread here:
> http://groups.google.com/group/unladen-swallow/browse_thread/thread/21d7248e8279b328/2343816abd1bd669

AWESOME.

Thanks for the pointer.  I read through both of the threads but I didn't see 
any numbers on savings-per-multi-process.  Do you have any?

Keep up the good work,

-glyph



Re: [Python-Dev] Executing zipfiles and directories (was Re: PyCon Keynote)

2010-01-26 Thread Glyph Lefkowitz

On Jan 26, 2010, at 3:20 PM, Ian Bicking wrote:
> Sadly you can't then do:
> 
>   chmod +x mz.py
>   ./mz.py

Unless I missed some subtlety earlier in the conversation, yes you can :).

> because it doesn't have "#!/usr/bin/env python" like typical executable 
> Python scripts have.  You can put the shebang line at the beginning of the 
> zip file, and zip will complain about it but will still unpack the file, but 
> it won't be runnable as Python won't recognize it as a zip anymore.

python 2.6's zipfile module can cope with a shebang line in a zip file just 
fine, and since this is the first version of Python which supports this 
feature, that means the following works just fine (tested on OS X and Linux):

$ echo '#!/usr/bin/python
> ' > header.txt
$ echo 'import sys; sys.stdout.write("Hello, world!\n")' > __main__.py
$ zip go.zip __main__.py 
  adding: __main__.py (deflated 2%)
$ cat header.txt go.zip > go.py
$ chmod a+x go.py 
$ ./go.py 
Hello, world!

This use-case was specifically mentioned on 
, too.
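The same shell recipe, as a Python sketch using the stdlib zipfile module (paths are temporary and invented; the zip format tolerates prepended data, which is why the shebang line doesn't break it):

```python
import os
import stat
import tempfile
import zipfile

# Build an executable "zip with a shebang": write the shebang bytes first,
# then let zipfile append the archive after them.
path = os.path.join(tempfile.mkdtemp(), "go.py")
with open(path, "wb") as f:
    f.write(b"#!/usr/bin/env python\n")
    with zipfile.ZipFile(f, "w") as z:
        z.writestr("__main__.py",
                   'import sys; sys.stdout.write("Hello, world!\\n")')
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Despite the leading shebang bytes, the result is still a valid zip:
print(zipfile.is_zipfile(path))  # True
```

(Python 3.5 later packaged exactly this trick as the `zipapp` module.)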



Re: [Python-Dev] setUpClass and setUpModule in unittest

2010-02-12 Thread Glyph Lefkowitz
On Feb 11, 2010, at 1:11 PM, Guido van Rossum wrote:

> I have skimmed this thread (hence this reply to the first rather than
> the last message), but in general I am baffled by the hostility of
> testing framework developers towards their users. The arguments
> against class- and module-level seUp/tearDown functions seems to be
> inspired by religion or ideology more than by the zen of Python. What
> happened to Practicality Beats Purity?

My sentiments tend to echo Jean-Paul Calderone's in this regard, but I think 
what he's saying bears a lot of repeating.  We really screwed up this feature 
in Twisted and I'd like to make sure that the stdlib doesn't repeat the 
mistake.  (Granted, we screwed it up extra bad 
, but I do think many of the 
problems we encountered are inherent.)

The issue is not that we test-framework developers don't like our users, or 
want to protect them from themselves.  It is that our users - ourselves chief 
among them - desire features like "I want my tests to be transparently 
optimized across N cores and N disks".

I can understand how resistance to setUp/tearDown*Class/Module comes across as 
user-hostility, but I can assure you this is not the case.  It's subtle and 
difficult to explain how incompatible the *apparently* straightforward  
semantics of setting up and tearing down classes and modules are with  
these advanced features.  Each question of semantics seems resolvable with a  
simple decision, and it's not obvious at the time how that decision will  
interfere with other features.

In Twisted's implementation of setUpClass and tearDownClass, everything seemed 
like it worked right up until the point where it didn't.  The test writer 
thinks that they're writing "simple" setUpClass and tearDownClass methods to 
optimize things, except almost by definition a setUpClass method needs to 
manipulate global state, shared across tests.  Which means that said state 
starts getting confused when it is set up and torn down concurrently across 
multiple processes.  These methods seem simple, but do they touch the 
filesystem?  Do they touch a shared database, even a little?  How do they 
determine a unique location to do that?  Without generally available tools to 
allow test writers to mess with the order and execution environment of their 
tests, one tends to write tests that rely on these implementation and ordering 
accidents, which means that when such a tool does arrive, things start breaking 
in unpredictable ways.

> The argument that a unittest framework shouldn't be "abused" for
> regression tests (or integration tests, or whatever) is also bizarre
> to my mind. Surely if a testing framework applies to multiple kinds of
> testing that's a good thing, not something to be frowned upon?

For what it's worth, I am a big fan of abusing test frameworks in general,  
and pyunit specifically, to perform every possible kind of testing.  In fact, I 
find setUpClass more hostile to *other* kinds of testing, because this 
convenience for simple integration tests makes more involved, 
performance-intensive integration tests harder to write and manage.

> On the other hand, I think we should be careful to extend unittest in
> a consistent way. I shuddered at earlier proposals (on python-ideas)
> to name the new functions (variations of) set_up and tear_down "to
> conform with PEP 8" (this would actually have violated that PEP, which
> explicitly prefers local consistency over global consistency).

This is a very important point.  But, it's important not only to extend 
unittest itself in a consistent way, but to clearly describe the points of 
extensibility so that third-party things can continue to extend unittest 
themselves, and cooperate with each other using some defined protocol so that 
you can combine those tools.

I tried to write about this problem a while ago 
 - the current extensibility API (which 
is mostly just composing "run()") is sub-optimal in many ways, but it's 
important not to break it.

And setUpClass does inevitably start to break those integration points down, 
because it implies certain things, like the fact that classes and modules are 
suites, or are otherwise grouped together in test ordering.  This makes it 
difficult to create custom suites, to do custom ordering, custom per-test 
behavior (like metrics collection before and after run(), or gc.collect() after 
each test, or looking for newly-opened-but-not-cleaned-up external resources 
like file descriptors after each tearDown).

Again: these are all concrete features that *users* of test frameworks want, 
not just idle architectural fantasy of us framework hackers.

I haven't had the opportunity to read the entire thread, so I don't know if 
this discussion has come to fruition, but I can see that some attention has 
been paid to these difficulties.  I have no problem with setUpClass or 
tearDownClass hooks *per se*, as long as they can be impl

Re: [Python-Dev] setUpClass and setUpModule in unittest

2010-02-15 Thread Glyph Lefkowitz

On Feb 13, 2010, at 12:46 PM, Guido van Rossum wrote:

> On Fri, Feb 12, 2010 at 8:01 PM, Glyph Lefkowitz
>  wrote:
>> On Feb 11, 2010, at 1:11 PM, Guido van Rossum wrote:
>> 
>> For what it's worth, I am a big fan of abusing test frameworks in general, 
>> and pyunit specifically, to perform every possible kind of testing.  In 
>> fact, I find setUpClass more hostile to *other* kinds of testing, because 
>> this convenience for simple integration tests makes more involved, 
>> performance-intensive integration tests harder to write and manage.
> 
> That sounds odd, as if the presence of this convenience would prohibit
> you from also implement other features.

Well, that is the main point I'm trying to make.  There are ways to implement 
setUpClass that *do* make the implementation of other features effectively 
impossible, by breaking the integration mechanisms between tests and framework, 
and between multiple testing frameworks.

And I am pretty sure this is not just my over-reaction; Michael still appears 
to be wrestling with the problems I'm describing.  In a recent message he was 
talking about either breaking compatibility with TestSuite implementations that 
override run(), or test-reordering - both of which I consider important, core 
features of the unittest module.

>> I tried to write about this problem a while ago 
>> <http://glyf.livejournal.com/72505.html> - the current extensibility API 
>> (which is mostly just composing "run()") is sub-optimal in many ways, but 
>> it's important not to break it.
> 
> I expect that *eventually* something will come along that is so much
> better than unittest that, once matured, we'll want it in the stdlib.

I'm not sure what point you're trying to make here.  I was saying "it's not 
perfect, but we should be careful not to break it, because it's all we've got". 
 Are you saying that we shouldn't worry about unittest's composition API, 
because it's just a stopgap until something better comes along?

> (Or, alternatively, eventually stdlib inclusion won't be such a big
> deal any more since distros mix and match. But then inclusion in a
> distro would become every package developer's goal -- and then the
> circle would be round, since distros hardly move faster than Python
> releases...)
> 
> But in the mean time I believe evolving unittest is the right thing to
> do. Adding new methods is relatively easy. Adding whole new paradigms
> (like testresources) is a lot harder, eventually in the light of the
> latter's relative immaturity.

I disagree with your classification of the solutions.

First and foremost: setUpClass is not a "new method", it's a pile of new code 
to call that method, to deal with ordering that method, etc.  Code which has 
not yet been written or tested or tried in the real world. It is beyond simply 
immature, it's hypothetical.  We do have an implementation of this code in 
Twisted, but as I have said, it's an albatross we are struggling to divest 
ourselves of, not something we'd like to propose for inclusion in the standard 
library.  (Nose has this feature as well, but I doubt their implementation 
would be usable, since their idea of a 'test' isn't really TestCase based.)

testresources, by contrast, is a tested, existing package, which people are 
already using, using a long-standing integration mechanism that has been part 
of unittest since its first implementation.  Granted, I would not contest that 
it is "immature"; it is still fairly new, and doesn't have a huge number of 
uses, but it's odd to criticize it on grounds of maturity when it's so much 
*more* mature than the alternative.

While superficially the programming interface to testresources is slightly more 
unusual, this is only because programmers don't think too hard about what 
unittest actually does with their code, and testresources requires a little more 
familiarity with that.

>> And setUpClass does inevitably start to break those integration points down, 
>> because it implies certain things, like the fact that classes and modules 
>> are suites, or are otherwise grouped together in test ordering.

> I expect that is what the majority of unittest users already believe.

Yes, but they're wrong, and enforcing this misconception doesn't help anyone.  
There are all kinds of assumptions that most python developers have about how 
Python works which are vaguely incorrect abstractions over the actual behavior.

>> This makes it difficult to create custom suites, to do custom ordering, 
>> custom per-test behavior (like metrics collection before and after run(), or 
>> gc.collect() after each test, or looking for n

Re: [Python-Dev] setUpClass and setUpModule in unittest

2010-02-15 Thread Glyph Lefkowitz

On Feb 15, 2010, at 3:50 PM, Michael Foord wrote:

> On 15/02/2010 20:27, Glyph Lefkowitz wrote:
>> 
>> 
>> On Feb 13, 2010, at 12:46 PM, Guido van Rossum wrote:
>> 
>>> On Fri, Feb 12, 2010 at 8:01 PM, Glyph Lefkowitz
>>>  wrote:
>>>> I find setUpClass more hostile to *other* kinds of testing, because this 
>>>> convenience for simple integration tests makes more involved, 
>>>> performance-intensive integration tests harder to write and manage.
>>> 
>>> That sounds odd, as if the presence of this convenience would prohibit
>>> you from also implement other features.
>> 
>> And I am pretty sure this is not just my over-reaction; Michael still 
>> appears to be wrestling with the problems I'm describing.
> And I appreciate your input.

Thanks :).

>>  In a recent message he was talking about either breaking compatibility with 
>> TestSuite implementations that override run(), or test-reordering - both of 
>> which I consider important, core features of the unittest module.
> 
> Well, by "breaking compatibility with custom TestSuite implementations that 
> override run" I mean that is one possible place to put the functionality. 
> Code that does override it will *not* stop working, it just won't support the 
> new features.

Ah, I see.  This doesn't sound *too* bad, but I'd personally prefer it if the 
distinction were a bit more clearly drawn.  I'd like frameworks to be able to 
implement extension functionality without having to first stub out 
functionality.  In other words, if I want a test suite without setUpClass, I'd 
prefer to avoid having an abstraction inversion.

Practically speaking this could be implemented by having a very spare, basic 
TestSuite base class and ClassSuite/ModuleSuite subclasses which implement the 
setUpXXX functionality.

> If we chose this implementation strategy there would be no compatibility 
> issues for existing tests / frameworks that don't use the new features.

That's very good to hear.

> If tests do want to use the new features then the framework authors will need 
> to ensure they are compatible with them. This seems like a reasonable 
> trade-off to me. We can ensure that it is easy to write custom TestSuite 
> objects that work with earlier versions of unittest but are also compatible 
> with setUpClass in 2.7 (and document the recipe - although I expect it will 
> just mean that TestSuite.run should call a single method if it exists).

This is something that I hope Jonathan Lange or Robert Collins will chime in to 
comment on: expanding the protocol between suite and test is an area which is 
fraught with peril, but it seems like it's something that test framework 
authors always want to do.  (Personally, *I* really want to do it because I 
want to be able to run things asynchronously, so the semantics of 'run()' need 
to change pretty dramatically to support that...)  It might be good to 
eventually develop a general mechanism for this, rather than building up an 
ad-hoc list of test-feature compatibility recipes which involve a list of if 
hasattr(...): foo(); checks in every suite implementation.
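The recipe being complained about looks roughly like this (the hook names are
made up purely for illustration):

```python
class CompatSuite(object):
    """Sketch of the ad-hoc compatibility pattern: a suite that probes
    each test for optional extended-protocol methods before calling
    them, so it still works with frameworks that lack them."""

    def __init__(self, tests):
        self._tests = list(tests)

    def run(self, result):
        for test in self._tests:
            # Feature negotiation by introspection: only call the
            # hypothetical hook if this test actually provides it.
            if hasattr(test, 'setUpShared'):
                test.setUpShared()
            test(result)
            if hasattr(test, 'tearDownShared'):
                test.tearDownShared()
        return result
```

Every new optional feature adds another hasattr() check to every suite
implementation, which is exactly why a general mechanism would be preferable.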

> Perhaps a better idea might be to also add startTest and stopTest methods to 
> TestSuite so that frameworks can build in features like timing tests (etc) 
> without having to override run itself. This is already possible in the 
> TestResult of course, which is a more common extensibility point in *my* 
> experience.

I think timing and monitoring tests can mostly be done in the TestResult class; 
those were bad examples.  There's stuff like synthesizing arguments for test 
methods, or deciding to repeat a potentially flaky test method before reporting 
a failure, which are not possible to do from the result.  I'm not sure that 
startTest and stopTest hooks help with those features, the ones which really 
need suites; they seem mostly to give you a hook to do stuff that could 
already be done in TestResult anyway.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Platform extension for distutils on other interpreters than CPython

2010-02-24 Thread Glyph Lefkowitz

On Feb 23, 2010, at 2:10 PM, Tarek Ziadé wrote:

> On Tue, Feb 23, 2010 at 1:50 PM, Maciej Fijalkowski  wrote:
>> Hello.
>> 
>> I would like to have a feature on platform module (or sys or
>> somewhere) that can tell distutils or distutils2 that this platform
>> (be it PyPy or Jython) is not able to compile any C module. The
>> purpose of this is to make distutils bail out in more reasonable
>> manner than a compilation error in case this module is not going to
>> work on anything but CPython.
>> 
>> What do you think?
> 
> +1
> 
> I think we could have a global variable in sys, called "dont_compile",
> distutils would look at
> before it tries to compile stuff, exactly like how it does for pyc file
> (sys.dont_write_bytecode)

Every time somebody says "let's have a global variable", God kills a kitten.

If it's in sys, He bludgeons the kitten to death *with another kitten*.

sys.dont_write_bytecode really ought to have been an API of an importer object 
somewhere; hopefully, when Brett has the time to finish the refactoring which 
he alluded to at the language summit, it will be.

Similarly, functionally speaking this API is a good idea, but running the C 
compiler is distutils' job.   Therefore any API which describes this 
functionality should be close to distutils itself.
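To make the "close to distutils" shape concrete: here is a minimal sketch.
can_compile_c is a hypothetical name, not an existing distutils API, and it
can only make a best-effort guess based on the configured compiler.

```python
import shutil
import sysconfig

def can_compile_c():
    """Hypothetical capability query: is a C compiler configured for
    this interpreter, and actually present on this system?  (Best
    effort only; e.g. the CC config var may be unset on Windows.)"""
    cc = sysconfig.get_config_var('CC') or ''
    return bool(cc) and shutil.which(cc.split()[0]) is not None

# A build tool could then bail out gracefully instead of dying with a
# compiler error on an interpreter that can't build C extensions:
if not can_compile_c():
    print("C extensions unavailable; skipping compilation")
```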



Re: [Python-Dev] 'languishing' status for the tracker

2010-02-24 Thread Glyph Lefkowitz

On Feb 22, 2010, at 12:17 AM, R. David Murray wrote:

> To expand on this: the desire for this arises from the observation
> that we have a lot of bugs in the tracker that we don't want to close,
> because they are real bugs or non-crazy enhancement requests, but for
> one reason or another (lack of an interested party, lack of a good,
> non-controversial solution, lack of a test platform on which to test the
> bug fix, the fix is hard but the bug is not of a commensurate priority,
> etc) the issue just isn't getting dealt with, and won't get dealt with
> until the blocking factor changes.

In my opinion, the problem is not so much that tickets are left open for a long 
time, as that there's no distinction between triaged and un-triaged tickets.  I 
think it's perfectly fine for tickets to languish as "open", in no special 
state, as long as it's easy to find out whether someone has gotten back to the 
original patch-submitter or bug-reporter to clarify the status at least once.

Of course, then the submitter needs to be able to put it back into the 
un-triaged state by making a counterproposal, or attaching a new patch.

To the extent that people are frustrated with the Python development process, 
it's generally not that their bugs don't get fixed (they understand that 
they're depending on volunteer labor); it's that they went to the trouble to 
diagnose the bug, specify the feature, and possibly even develop a complete fix 
or implementation, only to never hear *anything* about what the likelihood is 
that it will be incorporated.

In the Twisted tracker, whenever we provide feedback or do a code review that 
includes critical feedback that needs to be dealt with before it's merged, we 
re-assign the ticket to its original submitter.  I feel that this is pretty 
clear: it means "the ticket is open, it's valid, but it's also not my problem; 
if you want it fixed, fix it yourself".



Re: [Python-Dev] __file__

2010-02-28 Thread Glyph Lefkowitz
On Feb 27, 2010, at 9:38 AM, Nick Coghlan wrote:

> I do like the idea of pulling .pyc only imports out into a separate
> importer, but would go so far as to suggest keeping them as a command
> line option rather than as a separately distributed module.

One advantage of doing this as a separately distributed module is that it can 
have its own ecosystem and momentum.  Most projects that want this sort of 
bundling or packaging really want to be shipped with something like py2exe, and 
I think the folks who want such facilities would be better served by a nice 
project website for "python sealer" or "python bundler" rather than obscure 
directions for triggering the behavior via options or configuration.

Making bytecode loading a feature of interpreter startup, whether it's a config 
file, a command-line option or an environment variable, is not a great idea.  
For folks that want to ship a self-contained application, any of these would 
require an additional customization step, where they need to somehow tell their 
bundled interpreter to load bytecode.  For people trying to ship a 
self-contained and tamper-unfriendly (since even "tamper-resistant" would be 
overstating things) library to relatively non-technical programmers, it opens 
the door to a whole universe of confusion and FAQs about why the code didn't 
load.

However bytecode-only code loading is facilitated, it should be possible to 
bootstrap from a vanilla python interpreter running normally, as you may not 
know you need to load a bytecode-only package at startup.  In the stand-alone 
case there are already plenty of options, and in the library case, shipping a 
zip file should be fine, since the __init__.py of your package should be 
plain-text and also able to trigger the activation of the bytecode-only 
importer.
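For what it's worth, modern importlib makes the "bootstrap from a vanilla
interpreter" part easy to sketch.  Python 3's default path hooks already
include a sourceless loader, so this wiring is only needed on an interpreter
configured without one; it is what a plain-text __init__.py of a bytecode-only
package could do:

```python
import sys
from importlib.machinery import (
    FileFinder, SourceFileLoader, SourcelessFileLoader,
    SOURCE_SUFFIXES, BYTECODE_SUFFIXES,
)

# A path hook whose finders will load .pyc files even when no matching
# .py source exists alongside them.
bytecode_friendly_hook = FileFinder.path_hook(
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
)
sys.path_hooks.insert(0, bytecode_friendly_hook)
sys.path_importer_cache.clear()  # force per-directory finders to be rebuilt
```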

There are already so many ways to ship bytecode that it doesn't seem too 
important to support this one particular configuration (files in a 
directory, compiled by just importing them, in the same place as ".py" files).  
The real problem is providing a seamless transition path for *build* processes, 
not the Python code itself.  Do any of the folks who are currently using this 
feature have a good idea as to how your build and distribute scripts might 
easily be updated, perhaps by a 2to3 fixer?


Re: [Python-Dev] __file__ and bytecode-only

2010-03-03 Thread Glyph Lefkowitz

On Mar 3, 2010, at 10:22 PM, Greg Ewing wrote:

> Glenn Linderman wrote:
> 
>> In this scenario, the .pyc files would still live in __pycache__ ?  Complete 
>> with the foo..pyc naming ?
> 
> It might be neater to have a separate cache directory
> for each bytecode version, named __cache.__ or
> some such.

Okay, this is probably some pretty silly bikeshedding, but: if we're going to 
have it be something.something-else, can we please make sure that 
.something-else is a common extension that means "python bytecode cache"?  It 
would be good to keep the file-manager and shell operations required to say 
"blow away bytecode cache directories" as simple as possible.
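With a single predictable name, that cleanup stays a shell one-liner or a few
lines of Python.  (This sketch assumes the __pycache__ name that was
eventually chosen by PEP 3147.)

```python
import pathlib
import shutil

def purge_bytecode_caches(root):
    """Delete every bytecode cache directory under *root*."""
    # Materialize the list first so we never try to descend into a
    # tree that has already been removed.
    for cache in list(pathlib.Path(root).rglob('__pycache__')):
        if cache.is_dir():
            shutil.rmtree(cache)
```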



Re: [Python-Dev] doctest, unicode repr, and 2to3

2010-03-04 Thread Glyph Lefkowitz

On Mar 4, 2010, at 11:30 PM, Barry Warsaw wrote:

> If you really want to test that it's a unicode, shouldn't you actually test
> its type?  (I'm not sure what would happen with that under 2to3.)

Presumably 2to3 will be smart enough to translate 'unicode' to 'str' and 
'bytes' to... 'bytes'.  Just don't use 'str' in 2.x and you should be okay :).



Re: [Python-Dev] Catch SIGINT at Python startup

2010-03-08 Thread Glyph Lefkowitz

On Mar 8, 2010, at 4:06 PM, Guido van Rossum wrote:

> I am trying to remember why I made site.py failures non-fatal in the
> first place. I don't have any specific recollection but it must've
> been either from before the separation between site.py (part of the
> stdlib) and sitecustomize.py (site-specific) or out of a worry that if
> some external cause broke site.py (which does a lot of I/O) it would
> be a fatal breakdown of all Python execution.


The thing that occurs to me is that one might want to write an administrative 
tool in Python to manipulate site.py, or even just some data that something in 
site.py would load.  If exceptions from site.py were fatal, then bugs in such a 
tool would be completely unrecoverable; in trying to run it to un-do the buggy 
operation, it would crash immediately.

On the other hand, such a tool should *really* be invoked with the -S option 
anyway, so... maybe not that pressing of a concern.



Re: [Python-Dev] Proposing PEP 376

2010-04-01 Thread Glyph Lefkowitz
First: thank you distutils-sig, and especially Tarek, for spearheading this 
effort!

I'm particularly excited about the "Distribution" object that this PEP 
specifies.  I've been waiting for a long time to be able to load an object 
describing a distribution, rather than running setup.py and hoping that it 
mutated the right state!

On Apr 1, 2010, at 5:51 PM, Tarek Ziadé wrote:

> - to provide a basic *uninstaller* feature in the distutils2 project.

Second: It seems to me that a major missing feature in the PEP is the ability 
to run some code during installation and uninstallation, especially since it is 
so easy to run ad-hoc code in setup.py with no way of un-doing what it did.

Twisted's plugin system needs this in order to update a plugin index so that 
plugins can be quickly scanned without being loaded.  However, since this is 
arguably a design flaw in Twisted that should be fixed, I should point out 
there are other systems that have similar requirements, which are considerably 
less mutable: COM registration, other registry keys, adding / removing crontab 
entries, windows services, start menu items, XDG desktop / menu entries, login 
items, edits to the user's shell configuration, etc.  The list goes on and on.

I appreciate the "installer marker" feature, since that will at least allow 
easy_install or pip or something like them to implement this feature with 
minimal risk of being broken by built-in package management tools, but it seems 
like such a simple addition that it would be a shame to leave it out.  If we 
could get rid of setup.py entirely so that it wasn't so easy to run ad-hoc 
stuff during install, I would be happy to leave it to them :).

I realize that there are a lot of potential complexities that might creep into 
the process of determining the execution environment for the code in question, 
but I personally think it would be good enough to say "You'd better be darn 
sure to encode all of the run-time state that you need into your own script, or 
it might break."

Third: The PEP is silent on what happens to files whose hash _has_ changed from 
its install-time value.  I guess the implied plan would be to leave them in 
place.  However, this may have nasty side-effects; for example, if the files 
are test files, then they might be loaded during test discovery, and report 
exceptions since the code that they're testing has been removed.  My suggestion 
would be to have a specific "quarantine" area where the distutils uninstaller 
can put modified files that would have been removed as part of a specific 
distribution, so they aren't still present on PYTHONPATH.  I can also think of 
reasons why you might not want to do this, but either way, the consequence of 
changing an installed file should be made explicitly clear in the PEP: if they 
are to be left in place, it should emphasize that point.

Finally, one minor bit of bikeshedding, of which I promise to say nothing more 
if there is not unanimous agreement: I dislike the use of "get_" in function 
names, since it adds more characters without really adding more information.  
get_file_users is particularly bad, since it makes me think that it's going to 
return a list of processes with a file open, or a list of UIDs or something 
like that.  I suggest these names instead:

get_distributions() -> active_distributions()
get_distribution(name) -> active_distribution_named(name)
get_file_users(path) -> distributions_using_path(path)

where "active" means "on the current sys.path and thereby accessible by 
'import'".  This naming would also make the behavior of get_file_users a bit 
clearer; if the intention is to return only active, loadable distributions (you 
don't want to be able to use get_file_users to inspect other Python 
installations or virtualenvs) it could be called 
active_distributions_using_path.

Thanks again to the PEP's author and many contributors,

-glyph



Re: [Python-Dev] patch for review: __import__ documentation

2010-04-14 Thread Glyph Lefkowitz
On Apr 14, 2010, at 5:19 PM, Brett Cannon wrote:

> I see the confusion. I think Martin meant more about open issues that 
> required discussion, not simply issues that had a patch ready to go.

I'm curious - if one isn't supposed to ping the mailing list every time, how 
does one ask the tracker "please show me all the issues which have a patch 
ready to go that hasn't been reviewed / responded to / rejected or applied"?  
It seems like patches sometimes linger for quite a while and often their 
workflow-state is highly unclear (to me, at least).


Re: [Python-Dev] patch for review: __import__ documentation

2010-04-14 Thread Glyph Lefkowitz

On Apr 14, 2010, at 5:19 PM, Brett Cannon wrote:

> I see the confusion. I think Martin meant more about open issues that 
> required discussion, not simply issues that had a patch ready to go.

Ach.  I hit 'send' too soon.  I also wanted to say: it seemed quite clear to me 
that Martin specifically meant "simply issues that had a patch ready to go".  
Quoting him exactly:
>> Please understand that setting the state of an issue to "review" may *not* 
>> be the best way to trigger a review - it may be more effective to post to 
>> python-dev if you truly believe that the patch can be committed as-is.
It seems that perhaps the core developers have slightly different opinions 
about this? :)



Re: [Python-Dev] configuring the buildbot to skip some tests?

2010-05-13 Thread Glyph Lefkowitz

On May 13, 2010, at 9:41 AM, exar...@twistedmatrix.com wrote:

> On 03:17 am, jans...@parc.com wrote:
>> I've got parc-tiger-1 up and running again.  It's failing on test_tk,
>> which makes sense, because it's running as a background twisted process,
>> and thus can't access the window server.  I should configure that out.
> 
> You can run it in an xvfb.

See 
:
 this isn't an X server that he's talking about, it's "WindowServer", the OS X 
windowing system, so Xvfb won't help.


Re: [Python-Dev] PEP 3148 ready for pronouncement

2010-05-22 Thread Glyph Lefkowitz

On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:

> Jesse, the designated pronouncer for this PEP, has decided to keep discussion 
> open for a few more days.
> 
> So fire away!

As you wish!

The PEP should be consistent in its usage of terminology about callables.  It 
alternately calls them "callables", "functions", and "functions or methods".  
It would be nice to clean this up and be consistent about what can be called 
where.  I personally like "callables".

The execution context of callable code is not made clear.  Implicitly, submit() 
or map() would run the code in threads or processes as defined by the executor, 
but that's not spelled out clearly.

More relevant to my own interests, the execution context of the callables 
passed to add_done_callback and remove_done_callback is left almost completely 
to the imagination.  If I'm reading the sample implementation correctly, 
<http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>,
 it looks like in the multiprocessing implementation, the done callbacks are 
invoked in a random local thread.  The fact that they are passed the future 
itself *sort* of implies that this is the case, but the multiprocessing module 
plays fast and loose with object identity all over the place, so it would be 
good to be explicit and say that it's *not* a pickled copy of the future 
sitting in some arbitrary process (or even on some arbitrary machine).

This is really minor, I know, but why does it say "NOTE: This method can be 
used to create adapters from Futures to Twisted Deferreds"?  First of all, 
what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it 
doesn't seem to add anything.  This sentence would read exactly the same if 
that word were deleted.  Without more clarity on the required execution context 
of the callbacks, this claim might not actually be true anyway; Deferred 
callbacks can only be invoked in the main reactor thread in Twisted.  But even 
if it is perfectly possible, why leave so much of the adapter implementation up 
to the imagination?  If it's important enough to mention, why not have a 
reference to such an adapter in the reference Futures implementation, since it 
*should* be fairly trivial to write?

The fact that add_done_callback is implemented using a set is weird, since it 
means you can't add the same callback more than once.  The set implementation 
also means that the callbacks get called in a semi-random order, potentially 
creating even _more_ hard-to-debug order of execution issues than you'd 
normally have with futures.  And I think that this documentation will be 
unclear to a lot of novice developers: many people have trouble with the idea 
that "a = Foo(); b = Foo(); a.bar_method != b.bar_method", but "import 
foo_module; foo_module.bar_function == foo_module.bar_function".

It's also weird that you can remove callbacks - what's the use case?  Deferreds 
have no callback-removal mechanism and nobody has ever complained of the need 
for one, as far as I know.  (But lots of people do add the same callback 
multiple times.)

I suggest keeping add_done_callback, implementing it with a list so that 
callbacks are always invoked in the order that they're added, and getting rid 
of remove_done_callback.
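A minimal stand-in (not PEP 3148's reference implementation) showing what
list-based, ordered, duplicate-friendly done callbacks look like:

```python
class OrderedFuture:
    """Sketch of the suggested semantics: callbacks stored in a list,
    so the same callable can be added twice and invocation order is
    exactly addition order."""

    def __init__(self):
        self._callbacks = []
        self._result = None
        self._done = False

    def add_done_callback(self, fn):
        if self._done:
            fn(self)                    # already finished: run immediately
        else:
            self._callbacks.append(fn)  # a list, not a set

    def set_result(self, result):
        self._result = result
        self._done = True
        for fn in self._callbacks:      # strictly in addition order
            fn(self)

    def result(self):
        return self._result
```

With a set, the second add_done_callback of the same bound method or module
function would silently be a no-op, and invocation order would be arbitrary.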

futures._base.Executor isn't exposed publicly, but it needs to be.  The PEP 
kinda makes it sound like it is ("Executor is an abstract class...").  Plus, a 
third-party library wanting to implement an executor of its own shouldn't have 
to copy and paste the implementation of Executor.map.

One minor suggestion on the "internal future methods" bit - something I wish 
we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on 
separate objects, so that it was very explicit whether you were on the emitting 
side of a Deferred or the consuming side.  That seems to be the case with these 
internal methods - they are not so much "internal" as they are for the producer 
of the Future (whether a unit test or executor) so you might want to put them 
on a different object that it's easy for the thing creating a Future() to get 
at but hard for any subsequent application code to fiddle with by accident.  
Off the top of my head, I suggest naming it "Invoker()".  A good way to do this 
would be to have an Invoker class which can't be instantiated (raises an 
exception from __init__ or somesuch), then a Future.create() method which 
returns an Invoker, which itself has a '.future' attribute.
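A sketch of that shape; Invoker and Future.create() are the names proposed in
this message, not anything in the PEP:

```python
class Future:
    """Consumer side: can read the result, but has no way to set it."""

    def __init__(self):
        self._result = None
        self._done = False

    def result(self):
        if not self._done:
            raise RuntimeError("not finished yet")
        return self._result

    @classmethod
    def create(cls):
        """Producer entry point: returns an Invoker wrapping a fresh
        Future, bypassing Invoker's guarded constructor."""
        invoker = Invoker.__new__(Invoker)
        invoker.future = cls()
        return invoker


class Invoker:
    """Producer side: the only object that may complete the Future."""

    def __init__(self):
        raise TypeError("use Future.create() to obtain an Invoker")

    def set_result(self, value):
        self.future._result = value
        self.future._done = True
```

The executor (or a unit test) holds the Invoker and hands out invoker.future;
application code that only ever sees the Future has no set_result to call by
accident.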

Finally, why isn't this just a module on PyPI?  It doesn't seem like there's 
any particular benefit to making this a stdlib module and going through the 
whole PEP process - except maybe to prompt feedback like this :).  Issues like 
the ones I'm bringing up could be fixed pretty straightforwardly if it were 
just a matter of filing a bug on a small package, but fixing a stdlib module is 
a major undertaking.


Re: [Python-Dev] PEP 3148 ready for pronouncement

2010-05-23 Thread Glyph Lefkowitz

On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:

> On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
> 
>> 
>> On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
>> 
>>> Jesse, the designated pronouncer for this PEP, has decided to keep 
>>> discussion open for a few more days.
>>> 
>>> So fire away!
>> 
>> As you wish!
> 
> I retract my request ;-)

May you get what you wish for, may you find what you are seeking :).

>> The PEP should be consistent in its usage of terminology about callables.  
>> It alternately calls them "callables", "functions", and "functions or 
>> methods".  It would be nice to clean this up and be consistent about what 
>> can be called where.  I personally like "callables".
> 
> Did you find the terminology confusing? If not then I propose not changing it.

Yes, actually.  Whenever I see references to the multiprocessing module, I 
picture a giant "HERE BE (serialization) DRAGONS" sign.  When I saw that some 
things were documented as being "functions", I thought that maybe there was 
intended to be a restriction like the "these can only be top-level functions so 
they're easy for different executors to locate and serialize".  I didn't 
realize that the intent was "arbitrary callables" until I carefully re-read the 
document and noticed that the terminology was inconsistent.

> But changing it in the user docs is probably a good idea. I like "callables" 
> too.

Great.  Still, users will inevitably find the PEP and use it as documentation 
too.

>> The execution context of callable code is not made clear.  Implicitly, 
>> submit() or map() would run the code in threads or processes as defined by 
>> the executor, but that's not spelled out clearly.

Any response to this bit?  Did I miss something in the PEP?

>> More relevant to my own interests, the execution context of the callables 
>> passed to add_done_callback and remove_done_callback is left almost 
>> completely to the imagination.  If I'm reading the sample implementation 
>> correctly, 
>> <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>,
>>  it looks like in the multiprocessing implementation, the done callbacks are 
>> invoked in a random local thread.  The fact that they are passed the future 
>> itself *sort* of implies that this is the case, but the multiprocessing 
>> module plays fast and loose with object identity all over the place, so it 
>> would be good to be explicit and say that it's *not* a pickled copy of the 
>> future sitting in some arbitrary process (or even on some arbitrary machine).
> 
> The callbacks will always be called in a thread other than the main thread in 
> the process that created the executor. Is that a strong enough contract?

Sure.  Really, almost any contract would work, it just needs to be spelled out. 
 It might be nice to know whether the thread invoking the callbacks is a daemon 
thread or not, but I suppose it's not strictly necessary.

>> This is really minor, I know, but why does it say "NOTE: This method can be 
>> used to create adapters from Futures to Twisted Deferreds"?  First of all, 
>> what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it 
>> doesn't seem to add anything.  This sentence would read exactly the same if 
>> that word were deleted.  Without more clarity on the required execution 
>> context of the callbacks, this claim might not actually be true anyway; 
>> Deferred callbacks can only be invoked in the main reactor thread in 
>> Twisted.  But even if it is perfectly possible, why leave so much of the 
>> adapter implementation up to the imagination?  If it's important enough to 
>> mention, why not have a reference to such an adapter in the reference 
>> Futures implementation, since it *should* be fairly trivial to write?
> 
> I'm a bit surprised that this doesn't allow for better interoperability with 
> Deferreds given this discussion:

> 

I did not communicate that well.  As implemented, it's quite possible to 
implement a translation layer which turns a Future into a Deferred.  What I 
meant by that comment was, the specification in the PEP was too loose to be sure 
that such a layer would work with arbitrary executors.

For what it's worth, the Deferred translator would look like this, if you want 
to include it in the PEP (untested though, you may want to run it first):

from twisted.internet.defer import Deferred
from twisted.internet.reactor import callFromThread

def future2deferre

Re: [Python-Dev] PEP 3148 ready for pronouncement

2010-05-26 Thread Glyph Lefkowitz

On May 24, 2010, at 5:36 AM, Brian Quinlan wrote:
> On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:
>> On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:
>>> On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:

> ProcessPoolExecutor has the same serialization perils that multiprocessing 
> does. My original plan was to link to the multiprocessing docs to explain 
> them but I couldn't find them listed.

Linking to the pickle documentation might be a good start.

> Yes, the execution context is Executor-dependent. The section under 
> ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think.

I suppose so.  I guess I'm just looking for more precise usage of terminology. 
(This is a PEP, after all.  It's a specification that multiple VMs may have to 
follow, not just some user documentation for a package, even if they'll 
*probably* be using your code in all cases.)  I'd be happier if there were a 
clearer term than "calls" for the things being scheduled ("submissions"?), 
since the done callbacks aren't called in the subprocess for 
ProcessPoolExecutor, as we just discussed.

>> Sure.  Really, almost any contract would work, it just needs to be spelled 
>> out.  It might be nice to know whether the thread invoking the callbacks is 
>> a daemon thread or not, but I suppose it's not strictly necessary.
> 
> Your concerns is that the thread will be killed when the interpreter exits? 
> It won't be.

Good to know.  Tell it to the PEP though, not me ;).

>> No reaction on [invoker vs. future]?  I think you'll wish you did this in a 
>> couple of years when you start bumping into application code that calls 
>> "set_result" :).
> 
> My reactions are mixed ;-)

Well, you are not obliged to take my advice, as long as I am not obliged to 
refrain from mocking you mercilessly if it happens that I was right in a couple 
of years ;-).

> Your proposal is to add a level of indirection to make it harder for people 
> to call implementation methods. The downside is that it makes it a bit harder 
> to write tests and Executors.

Both tests and executors will still create and invoke methods directly on one 
object; the only additional difficulty seems to be the need to type '.future' 
every so often on the executor/testing side of things, and that seems a cost 
well worth paying to avoid confusion over who is allowed to call those methods 
and when.

> I also can't see a big problem in letting people call set_result in client 
> code though it is documented as being only for Executor implementations and 
> tests. 
> 
> On the implementation side, I don't see why an Invoker needs a reference to 
> the future.

Well, uh...

> class Invoker(object):
>   def __init__(self):
> """Should only be called by Executor implementations."""
> self.future = Future()
 ^ this is what I'd call a "reference to the future"



Re: [Python-Dev] PEP 3148 ready for pronouncement

2010-05-26 Thread Glyph Lefkowitz
On May 26, 2010, at 3:37 AM, Paul Moore wrote:

> On 26 May 2010 08:11, Lennart Regebro  wrote:
>> On Wed, May 26, 2010 at 06:22, Nick Coghlan  wrote:
>>> - download a futures module from PyPI and live with the additional
>>> dependency
>> 
>> Why would that be a problem?
> 
> That has been hashed out repeatedly on this and other lists. Can it
> please be stipulated that for *some* people, in *some* cases, it is a
> problem?

Sure, but I for one fully support Lennart asking the question, because while in 
the short term this *is* a problem with packaging tools in the Python 
ecosystem, in the long term (as you do note) it's an organizational dysfunction 
that can be addressed with better tools.

I think it would be bad to ever concede the point that sane factoring of 
dependencies and code re-use aren't worth it because some jerk in Accounting or 
System Operations wants you to fill out a requisition form for a software 
component that's free and liberally licensed anyway.

To support the unfortunate reality that such jerks in such departments really 
do in fact exist, there should be simple tools to glom a set of small, nicely 
factored dependencies into a giant monolithic ball of crud that installs all at 
once, and slap a sticker on the side of it that says "I am only filling out 
your stupid form once, okay".  This should be as distant as possible from the 
actual decision to package things in sensibly-sized chunks.

In other words, while I kinda-sorta buy Brian's argument that having this 
module in easy reach will motivate more people to use a standard, tested idiom 
for parallelization, I *don't* think that the stdlib should be expanded simply 
to accommodate those who just don't want to install additional packages for 
anything.


Re: [Python-Dev] PEP 3148 ready for pronouncement

2010-05-26 Thread Glyph Lefkowitz

On May 26, 2010, at 4:55 AM, Brian Quinlan wrote:

> I said exactly the opposite of what I meant: futures don't need a reference 
> to the invoker.

Indeed they don't, and they really shouldn't have one.  If I wrote that they 
did, then it was an error.

... and that appears to be it!  Thank you for your very gracious handling of a 
pretty huge pile of criticism :).

Good luck with the PEP,

-glyph



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-19 Thread Glyph Lefkowitz
On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote:

> However, I have very little experience with IRC and consequently have little 
> idea what getting a permanent, owned, channel like #python entails. Hence the 
> '?' that follows.
> 
> What do others think?

Sure, this is a good idea.

Technically speaking, this is extremely easy.  Somebody needs to "/msg chanserv 
register #python3" and that's about it.  (In this case, that "someone" may need 
to be Brett Cannon, since he is the official group contact for Freenode 
regarding Python-related channels.)

Practically speaking, you will need a group of at least a dozen contributors, 
each in a different timezone, who sit there all day answering questions :).  
Otherwise the ownership of the channel is just a signpost pointing at an empty 
room.



Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-19 Thread Glyph Lefkowitz
On Jun 19, 2010, at 5:39 PM, geremy condra wrote:

> Bottom line, what I'd really like to do is kick them all off of #python, but
> practically I see very little that can be done to rectify the situation at 
> this
> point.

Here's something you can do: port libraries to python 3 and make the ecosystem 
viable.

It's as simple as that.  Nobody on #python has an ideological axe to grind, 
they just want to tell users to use tools which actually solve their problems.  
(Well, unless you think that "helping users" is ideological axe-grinding, in 
which case I think you may want to re-examine your own premises.)

If Python 3 had all the same features and libraries as Python 2, and ran in all the 
same places (for example, as Stephen Thorne reminded me when I asked him about 
this, the oldest supported version of Red Hat Enterprise Linux...) then it 
would be an equally viable answer on IRC.  It's going to take a lot of work to 
get it to that point.

Even if you write code, of course, it's too much work for one person to fill 
the whole gap.  Have some patience.  The PSF is funding these efforts, and more 
library authors are porting all the time.  Eventually, resistance in forums 
like Freenode's #python will disappear.  But you can't make it go away by 
wishing it away, you have to get rid of the cause.



Re: [Python-Dev] bytes / unicode

2010-06-21 Thread Glyph Lefkowitz
On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote:

> One issue I remember from my "enterprise" days is some of the Asian-language 
> developers at NTT/Verio explaining to me that unicode doesn't actually solve 
> certain issues -- that there are use cases where you really *do* need "bytes 
> plus encoding" in order to properly express something.

The thing that I have heard in passing from a couple of folks with experience 
in this area is that some older software in Asia would present characters 
differently if they were originally encoded in a "Japanese" encoding versus a 
"Chinese" encoding, even though they were really "the same" characters.

I do know that Han Unification is a giant political mess 
( makes for some interesting 
reading), but my understanding is that it has handled enough of the cases by 
now that one can write software to display Asian languages and it will 
basically work with a modern version of Unicode.  (And of course, there's 
always the private use area, as Stephen Turnbull pointed out.)

Regardless, this is another example where keeping around a string isn't really 
enough.  If you need to display a Japanese character in a distinct way because 
you are operating in the Japanese *script*, you need a tag surrounding your 
data that is a hint to its presentation.  The fact that these presentation 
hints were sometimes determined by their encoding is an unfortunate historical 
accident.



Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz
On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote:

> The RFC says that URIs are text, and therefore they can (and IMO
> should) be operated on as text in the stdlib.


No, *blue* is the best color for a shed.

Oops, wait, let me try that again.

While I broadly agree with this statement, it is really an oversimplification.  
An URI is a structured object, with many different parts, which are transformed 
from bytes to ASCII (or something latin1-ish, which is really just bytes with a 
nice face on them) to real, honest-to-goodness text via the IRI specification: 
.

> Note also that the "complete solution" argument cuts both ways.  Eg, a
> "complete" solution should implement UTS 39 "confusables detection"[1]
> and IDNA[2].  Good luck doing that with bytes!

And good luck doing that with just characters, too.  You need a parsed 
representation of the URI that you can encode different parts of in different 
ways.  (My understanding is that you should only really implement confusables 
detection in the netloc... while that may be a bogus example, you're certainly 
only supposed to do IDNA in the netloc!)

You can just call urlsplit() all over the place to emulate this, but this does 
not give you the ability to go back to the original bytes, and thereby preserve 
things like brokenly-encoded segments, which seems to be what a lot of this 
hand-wringing is about.

To put it another way, there is no possible information-preserving string or 
bytes type that will make everyone happy as a result from urljoin().  The only 
return-type that gives you *everything* is "URI".
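The "IDNA only in the netloc" point can be made concrete with the stdlib itself. This is a hedged sketch using `urllib.parse.urlsplit` and Python's built-in `idna` codec (which implements the older IDNA 2003 rules); the example URL is of course made up.

```python
from urllib.parse import urlsplit

# IDNA applies only to the host; the path and query follow different
# (percent-encoding) rules -- which is why you want a parsed
# representation rather than one flat string.
iri = "http://bücher.example/straße?q=1"
parts = urlsplit(iri)

# Encode just the hostname with IDNA.
ascii_host = parts.hostname.encode("idna").decode("ascii")
print(ascii_host)  # → xn--bcher-kva.example
```

Running the whole IRI through one encoding step would either mangle the host or mis-encode the path; only the parsed object lets each part be transformed by its own rules.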

> just using 'latin-1' as the encoding allows you to
> use the (unicode) string operations internally, and then spew your
> mess out into the world for someone else to clean up, just as using
> bytes would.

This is the limitation that everyone seems to keep dancing around.  If you are 
using the stdlib, with functions that operate on sequences like 'str' or 
'bytes', you need to choose from one of three options:

  1. "decode" everything to latin1 (although I prefer to call it "charmap" when 
used in this way) so that you can have some mojibake that will fool a function 
that needs a unicode object, but not lose any information about your input so 
that it can be transformed back into exact bytes (and be very careful to never 
pass it somewhere that it will interact with real text!),
  2. actually decode things to an appropriate encoding to be displayed to the 
user and manipulated with proper text-manipulation tools, and throw away 
information about the bytes,
  3. keep both the bytes and the characters together (perhaps in a data 
structure) so that you can both display the data and encode it in 
situationally-appropriate ways.
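Options 1 and 2 are easy to demonstrate directly. The bytes below are arbitrary sample data chosen to be invalid UTF-8:

```python
# Option 1: latin-1 ("charmap") decoding is lossless for *any* byte
# string -- every byte maps to exactly one code point < 256 -- so the
# original bytes can always be recovered.  But the result is mojibake,
# not text, and must never mix with real text.
raw = b"caf\xc3\xa9 \xff\xfe"          # UTF-8 fragment plus junk bytes
pseudo_text = raw.decode("latin-1")    # never fails
assert pseudo_text.encode("latin-1") == raw  # exact round-trip

# Option 2: a real decode produces proper text but rejects the junk,
# and the original byte sequence is gone.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")  # → not valid UTF-8
```

Option 3 is the one with no stdlib support: a structure holding both `raw` and its decoded form side by side.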

The stdlib as it is today is not going to handle the 3rd case for anyone.  I 
think that's fine; it is not the stdlib's job to solve everyone's problems.  
I've been happy with it providing correctly-functioning pieces that can be used 
to build more elaborate solutions.  This is what I meant when I said I agree 
with Stephen's first point: the stdlib *should* just keep operating entirely on 
strings, because URIs are defined, by the spec, to be sequences of ASCII 
characters.  But that's not the whole story.

PJE's "bstr" and "ebytes" proposals set my teeth on edge.  I can totally 
understand the motivation for them, but I think it would be a big step 
backwards for python 3 to succumb to that temptation, even in the form of a 
third-party library.  It is really trying to cram more information into a pile 
of bytes than truly exists there.  (Also, if we're going to have encodings 
attached to bytes objects, I would very much like to add "JPEG" and "FLAC" to 
the list of possibilities.)

The real tension there is that WSGI is desperately trying to avoid defining any 
data structures (i.e. classes), while still trying to work with structured 
data.  An URI class with a 'child' method could handily solve this problem.  
You could happily call IRI(...).join(some bytes).join(some text) and then just 
say "give me some bytes, it's time to put this on the network", or "give me 
some characters, I have to show something to the user", or even "give me some 
characters appropriate for an 'href=' target in some HTML I'm generating" - 
although that last one could be left to the HTML generator, provided it could 
get enough information from the URI/IRI object's various parts itself.

I don't mean to pick on WSGI, either.  This is a common pain-point for porting 
software to 3.x - you had a string, it kinda worked most of the time before, 
but now you need to keep track of text too and the functions which seemed to 
work on bytes no longer do.



Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz

On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote:

> On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger
>  wrote:
>> 
>> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
>> 
>> This is a common pain-point for porting software to 3.x - you had a
>> string, it kinda worked most of the time before, but now you need to keep
>> track of text too and the functions which seemed to work on bytes no longer
>> do.
>> 
>> Thanks Glyph.  That is a nice summary of one kind of challenge facing
>> programmers.
> 
> Ironically, Glyph also described the pain in 2.x: it only "kinda" worked.

It was not my intention to be ironic about it - that was exactly what I meant 
:).  3.x is forcing you to confront an issue that you _should_ have confronted 
for 2.x anyway. 

(And, I hope, most libraries doing a 3.x migration will take the opportunity to 
make their 2.x APIs unicode-clean while still in 2to3 mode, and jump ship to 
3.x source only _after_ there's a nice transition path for their clients that 
can be taken in 2 steps.)



Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz

On Jun 22, 2010, at 2:07 PM, James Y Knight wrote:

> Yeah. This is a real issue I have with the direction Python3 went: it pushes 
> you into decoding everything to unicode early, even when you don't care -- 
> all you really wanted to do is pass it from one API to another, with some 
> well-defined transformations, which don't actually depend on it having being 
> decoded properly. (For example, extracting the path from the URL and 
> attempting to open it as a file on the filesystem.)

But you _do_ need to decode it in this case.  If you got your URL from some 
funky UTF-32 datasource, b"\x00\x00\x00/" is not a path separator, "/" is.  
Plus, you should really be separating path segments and looking at them 
individually so that you don't fall victim to "%2F" bugs.  And if you want your 
code to be portable, you need a Unicode representation of your pathname anyway 
for Windows; plus, there, you need to care about "\" as well as "/".
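The "%2F" point deserves a concrete illustration. A minimal sketch with `urllib.parse` (the example URL is invented): split the path into segments *before* unquoting, so an encoded slash inside a segment is not mistaken for a separator.

```python
from urllib.parse import urlsplit, unquote

url = "http://example.com/docs/a%2Fb/c"
path = urlsplit(url).path

# Correct: split first, then unquote each segment individually.
segments = [unquote(seg) for seg in path.split("/") if seg]
print(segments)  # → ['docs', 'a/b', 'c']

# The bug: unquoting the whole path first invents a fourth segment.
wrong = [s for s in unquote(path).split("/") if s]
print(wrong)     # → ['docs', 'a', 'b', 'c']
```

Mapping the naive result onto a filesystem would let `a%2Fb` escape into a different directory than intended.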

The fact that your wire-bytes were probably ASCII(-ish) and your filesystem 
probably encodes pathnames as UTF-8 and so everything looks like it lines up is 
no excuse not to be explicit about your expectations there.

You may want to transcode your characters into some other characters later, but 
that shouldn't stop you from treating them as characters of some variety in the 
meanwhile.

> The surrogateescape method is a nice workaround for this, but I can't help 
> thinking that it might've been better to just treat stuff as 
> possibly-invalid-but-probably-utf8 byte-strings from input, through 
> processing, to output. It seems kinda too late for that, though: next time 
> someone designs a language, they can try that. :)

I can think of lots of optimizations that might be interesting for Python (or 
perhaps some other runtime less concerned with cleverness overload, like PyPy) 
to implement, like a UTF-8 combining-characters overlay that would allow for 
fast indexing, lazily populated as random access dictates.  But this could all 
be implemented as smartness inside .encode() and .decode() and the str and 
bytes types without changing the way the API works.

I realize that there are implications at the C level, but as long as you can 
squeeze a function call in to certain places, it could still work.

I can also appreciate what's been said in this thread a bunch of times: to my 
knowledge, nobody has actually shown a profile of an application where encoding 
is significant overhead.  I believe that encoding _will_ be a significant 
overhead for some applications (and actually I think it will be very 
significant for some applications that I work on), but optimizations should 
really be implemented once that's been demonstrated, so that there's a better 
understanding of what the overhead is, exactly.  Is memory a big deal?  Is CPU?
Is it both?  Do you want to tune for the tradeoff?  etc, etc.  Clever 
data-structures seem premature until someone has a good idea of all those 
things.



Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Glyph Lefkowitz

On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote:

> This is a place where bytes+encoding might also have some benefit.  XML is 
> someplace where you might load a bunch of data but only touch a little bit of 
> it, and the amount of data is frequently large enough that the efficiencies 
> are important.

Different encodings have different characteristics, though, which makes them 
amenable to different types of optimizations.  If you've got an ASCII string or 
a latin1 string, the optimizations of unicode are pretty obvious; if you've got 
one in UTF-16 with no multi-code-unit sequences, you could also hypothetically 
cheat for a while if you're on a UCS4 build of Python.

I suspect the practical problem here is that there's no CharacterString ABC in 
the collections module for third-party libraries to provide their own 
peculiarly-optimized implementations that could lazily turn into real 'str's as 
needed.  I'd volunteer to write a PEP if I thought I could actually get it done 
:-\.  If someone else wants to be the primary author though, I'll try to help 
out.
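As a historical footnote to this 2010 exchange: CPython 3.3 later shipped essentially this optimization inside `str` itself (PEP 393, the "flexible string representation"), choosing 1, 2, or 4 bytes per code point based on the widest character present. A quick check on a modern CPython:

```python
import sys

# PEP 393: storage width depends on the widest code point in the string.
ascii_s  = "a" * 1000            # 1 byte per code point
bmp_s    = "\u1234" * 1000       # 2 bytes per code point
astral_s = "\U0001F600" * 1000   # 4 bytes per code point

print(sys.getsizeof(ascii_s)
      < sys.getsizeof(bmp_s)
      < sys.getsizeof(astral_s))  # → True
```

The third-party-ABC route discussed here never happened; the specialization landed in the core type instead.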



Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Glyph Lefkowitz

On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:

> bzr has a cache of decoded strings in it precisely because decode is
> slow. We accept slowness encoding to the users locale because thats
> typically much less data to examine than we've examined while
> generating the commit/diff/whatever. We also face memory pressure on a
> regular basis, and that has been, at least partly, due to UCS4 - our
> translation cache helps there because we have less duplicate UCS4
> strings.

Thanks for setting the record straight - apologies if I missed this earlier in 
the thread.  It does seem vaguely familiar.



Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Glyph Lefkowitz

On Jun 23, 2010, at 8:17 AM, Steve Holden wrote:

> Guido van Rossum wrote:
>> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver  wrote:
>>> Any "turdiness" (which I am *not* arguing for) is a natural consequence
>>> of the kinds of backward incompatibilities which were *not* ruled out
>>> for Python 3, along with the (early, now waning) "build it and they will
>>> come" optimism about adoption rates.
>> 
>> FWIW, my optimisim is *not* waning. I think it's good that we're
>> having this discussion and I expect something useful will come out of
>> it; I also expect in general that the (admittedly serious) problem of
>> having to port all dependencies will be solved in the next few years.
>> Not by magic, but because many people are taking small steps in the
>> right direction, and there will be light eventually. In the mean time
>> I don't blame anyone for sticking with 2.x or being too busy to help
>> port stuff to 3.x. Python 3 has been a long time in the making -- it
>> will be a bit longer still, which was expected.
>> 
> +1
> 
> The important thing is to avoid bigotry and FUD, and deal with things
> the way they are. The #python IRC team have just helped us make a major
> step forward. This won't be a campaign with a victorious charge over
> some imaginary finish line.

For sure.

I don't speak for Tres, but I don't think he was talking about optimism about 
*adoption*, overall, but about optimism about adoption *rates*.  And I don't 
think he was talking about it coming from Guido :).

There has definitely been some "irrational exuberance" from some quarters.  The 
form it usually takes is someone making a blog post which assumes, because the 
author could port their smallish library or application without too much 
hassle, that Python 2.x is already dead and everyone should be off of it in a 
couple of weeks.

I've never heard this position from the core team or any official communication 
or documentation.  Far from it: the realistic attitude that the Python 3 
migration is something that will take a while has significantly reduced my own 
concerns.

Even the aforementioned blog posts have been encouraging in some ways, because 
a lot of people are reporting surprisingly easy transitions.



Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Glyph Lefkowitz

On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:

> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.

For my part, what I want out of a string ABC is simply the ability to do 
application-specific optimizations.

There are many applications where all input and output is text, but _must_ be 
UTF-8.  Even GTK uses UTF-8 as its native text representation, so "output" 
could just be display.

Right now, in Python 3, the only way to be "correct" about this is to copy 
every byte of input into 4 bytes of output, then copy each code point *back* 
into a single byte of output.  If all your application does is rewrite the 
occasional XML attribute, for example, this cost can be significant, if not 
overwhelming.

I'd like a version of 'decode' which would give me a type that was, in every 
respect, unicode, and responded to all protocols exactly as other unicode 
objects (or "str objects", if you prefer py3 nomenclature ;-)) do, but wouldn't 
actually copy any of that memory unless it really needed to (for example, to 
pass to a C API that expected native wide characters), and that would hold on 
to the original bytes so that it could produce them on demand if encoded to the 
same encoding again. So, as others in this thread have mentioned, the 'ABC' 
really implies some stuff about C APIs as well.
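At the pure-Python level, part of this idea can be sketched today. The class below is a hypothetical illustration only: because `str.__new__` must still decode (and copy) up front, it captures the *interface* being asked for, not the memory savings, which is exactly why the proposal implies C-level support.

```python
class UTF8String(str):
    """Hypothetical sketch: a str that remembers its original UTF-8
    bytes and hands them back without re-encoding.  (Illustrative only;
    avoiding the upfront decode/copy needs support in the C core.)"""

    def __new__(cls, raw: bytes):
        self = super().__new__(cls, raw.decode("utf-8"))
        self._raw = raw          # keep the original bytes around
        return self

    def encode(self, encoding="utf-8", errors="strict"):
        # Same encoding back out?  Return the cached bytes, no copy.
        if encoding.lower().replace("-", "") == "utf8" and errors == "strict":
            return self._raw
        return super().encode(encoding, errors)


s = UTF8String("naïve".encode("utf-8"))
assert s == "naïve"                  # behaves as text everywhere
assert s.encode("utf-8") is s._raw   # original bytes, by identity
```

The "lack of `__rcontains__`" complaint below is about cases like `s in some_other_str`, where the left-hand built-in type, not the subclass, controls the protocol.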

I'm not sure about the exact performance impact of such a class, which is why 
I'd like the ability to implement it *outside* of the stdlib and see how it 
works on a project, and return with a proposal along with some data.  There are 
also different ways to implement this, and other optimizations (like ropes) 
which might be better.

You can almost do this today, but the lack of things like the hypothetical 
"__rcontains__" does make it impossible to be totally transparent about it.___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-25 Thread Glyph Lefkowitz

On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote:

> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption.

So, yes, I am mainly worried about memory consumption, but don't underestimate 
the pure CPU cost of doing all the copying.  It's quite a bit faster to simply 
scan through a string than to scan and while you're scanning, keep faulting out 
the L2 cache while you're accessing some other area of memory to store the copy.

Plus, if I am decoding with the surrogateescape error handler (or its effective 
equivalent), then no, I don't need to validate it in advance; interpretation 
can be done lazily as necessary.  I realize that this is just GIGO, but I 
wouldn't be doing this on data that didn't have an explicitly declared or 
required encoding in the first place.
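The surrogateescape behavior referred to here is easy to verify: undecodable bytes are smuggled through `str` as lone surrogates and come back out intact, so validation can be deferred until the data is actually interpreted. The byte string below is arbitrary invalid UTF-8 chosen for illustration.

```python
# surrogateescape (PEP 383): never raises on decode, and the encode
# with the same handler reproduces the original bytes exactly.
raw = b"ok \x80\xff end"                       # not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")  # no exception

# Each bad byte 0xNN became the lone surrogate U+DCNN.
assert text[3] == "\udc80"
assert text[4] == "\udcff"

# Lossless round-trip back to the wire format.
assert text.encode("utf-8", "surrogateescape") == raw
```

The GIGO caveat stands: the lone surrogates will blow up if the string reaches an encoder using strict error handling, such as printing to a UTF-8 terminal.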

> I'd like to see a lot of hard memory profiling data before I got overly 
> worried about that.


I know of several Python applications that are already constrained by memory.  
I don't have a lot of hard memory profiling data, but in an environment where 
you're spawning as many processes as you can in order to consume _all_ the 
physically available RAM for string processing, it stands to reason that 
properly decoding everything and thereby exploding everything out into 4x as 
much data (or 2x, if you're lucky) would result in a commensurate decrease in 
throughput.

I don't think I could even reasonably _propose_ that such a project stop 
treating textual data as bytes, because there's no optimization strategy once 
that sort of architecture has been put into place. If your function says "this 
takes unicode", then you just have to bite the bullet and decode it, or rewrite 
it again to have a different requirement.

So, right now, I don't know where I'd get the data with which to make the argument in 
the first place :).  If there were some abstraction in the core's treatment of 
strings, though, I could decode things and note their encoding without 
immediately paying this cost (or alternately, pay the cost to see if it's so 
bad, but with the option of managing or optimizing it separately).  This is 
why I'm asking for a way for me to implement my own string type, and not for a 
change of behavior or an optimization in the stdlib itself: I could be wrong, I 
don't have a particularly high level of certainty in my performance estimates, 
but I think that my concerns are realistic enough that I don't want to embark 
on a big re-architecture of text-handling only to have it become a performance 
nightmare that needs to be reverted.

As Robert Collins pointed out, they already have performance issues related to 
encoding in Bazaar.  I know they've done a lot of profiling in that area, so I 
hope eventually someone from that project will show up with some data to 
demonstrate it :).  And I've definitely heard many, many anecdotes (some of 
them in this thread) about people distorting their data structures in various 
ways to avoid paying decoding cost in the ASCII/latin1 case, whether it's 
*actually* a significant performance issue or not.  I would very much like to 
tell those people "Just call .decode(), and if it turns out to actually be a 
performance issue, you can always deal with it later, with a custom string 
type."  I'm confident that in *most* cases, it would not be.

Anyway, this may be a serious issue, but I increasingly feel like I'm veering 
into python-ideas territory, so perhaps I'll just have to burn this bridge when 
I come to it.  Hopefully after the moratorium.



Re: [Python-Dev] Can Python implementations reject semantically invalid expressions?

2010-07-01 Thread Glyph Lefkowitz
On Jul 2, 2010, at 12:28 AM, Steven D'Aprano wrote:

> This question was inspired by something asked on #python today. Consider 
> it a hypothetical, not a serious proposal.
> 
> We know that many semantic errors in Python lead to runtime errors, e.g. 
> 1 + "1". If an implementation rejected them at compile time, would it 
> still be Python? E.g. if the keyhole optimizer raised SyntaxError (or 
> some other exception) on seeing this:
> 
> def f():
>return 1 + "1"
> 
> instead of compiling something which can't fail to raise an exception, 
> would that still be a legal Python implementation?

I'd say "no".  Python has defined semantics in this situation: a TypeError is 
raised.

To me, this seems akin to a keyhole optimizer arbitrarily deciding that

raise TypeError()

should cause the compiler to abort.

If this type of expression were common, it would be within the rights of, for 
example, a Python JIT to generate a fast path through 'f' that wouldn't bother 
to actually invoke its 'int' type's '__add__' method, since there is no 
possible way for a Python program to tell the difference, since int.__add__ is 
immutable.
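The defined semantics being appealed to are easy to check: `1 + "1"` is not a compile-time error in CPython; the TypeError appears only when the function actually runs.

```python
# Compiling and defining the function both succeed -- rejecting this
# at compile time would change the language's defined semantics.
code = compile("def f():\n    return 1 + '1'\n", "<example>", "exec")
ns = {}
exec(code, ns)

# The error is only raised at call time.
try:
    ns["f"]()
    raised = False
except TypeError:
    raised = True

print(raised)  # → True
```

A conforming JIT could still skip the actual `int.__add__` dispatch on a fast path, as long as the observable behavior, the TypeError at call time, is preserved.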



Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread Glyph Lefkowitz

On Jul 6, 2010, at 8:09 AM, Steven D'Aprano wrote:

> You've never used Apple's much-missed Hypertalk, have you? :)

on mailingListMessage
  get the message
  put it into aMessage
  if the thread of aMessage contains license wankery then
    put aMessage into the trash
  end if
end mailingListMessage



Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library

2010-07-11 Thread Glyph Lefkowitz

On Jul 11, 2010, at 10:22 AM, Tal Einat wrote:

> Most of the responses up to this point have been strongly against my
> proposal. The main reason given is that it is nice to have a graphical
> IDE supported out-of-the-box with almost any Python installation. This
> is especially important for novice programmers and in teaching
> environments. I understand this sentiment, but I think that supplying
> a quirky IDE with many caveats, lacking documentation, some bugs and a
> partially working debugger ends up causing more confusion than good.

The people who are actually *in* those environments seem to disagree with you 
:).  I think you underestimate the difficulty of getting software installed and 
overestimate the demands of new Python users and students.

While I don't ever use IDLE if there's an alternative available, I have been 
very grateful many times for its presence in environments where it was a 
struggle even to say "install Python".  A workable editor and graphical shell 
is important, whatever its flaws.  (And I think you exaggerate IDLE's flaws 
just a bit.)



Re: [Python-Dev] Removing IDLE from the standard library

2010-07-11 Thread Glyph Lefkowitz

On Jul 11, 2010, at 2:37 PM, Martin v. Löwis wrote:

>> Initially (five years ago!) I tried to overcome these issues by
>> improving IDLE, solving problems and adding a few key features.
>> Without going into details, suffice to say that IDLE hasn't improved
>> much since 2005 despite my efforts. For example, see
>> http://bugs.python.org/issue1529142, where it took nearly 3 years to
>> fix a major issue from the moment I posted the first workaround. For
>> another example, see http://bugs.python.org/issue3068, where I posted
>> a patch for an extension configuration dialog over two years ago, and
>> it hasn't received as much as a sneeze in response.
> 
> I can understand that this is frustrating, but please understand that
> this is not specific to your patches, or to IDLE. Many other patches on
> bugs.python.org remain unreviewed for many years. That's because many of
> the issues are really tricky, and there are very few people who both
> have the time and the expertise to evaluate them.

This problem seems to me to be the root cause here.

Guido proposes to give someone interested in IDLE commit access, and hopefully 
that will help in this particular area.  But, as I recall, at the last language 
summit there was quite a bit of discussion about how to address the broader 
issue of patches falling into a black hole.  Is anybody working on it?

(This seems to me like an area where a judicious application of PSF funds might 
help; if every single bug were actively triaged and responded to, even if it 
weren't reviewed, and patch contributors were directed to take specific steps 
to elicit a response or a review, the fact that patch reviews take a while 
might not be so bad.)

> FWIW, I don't consider a few months as a "long" time for a patch review.

It may not be a long time compared to other patch reviews, but it is a very 
long time for a volunteer to wait for something, especially if that "something" 
is "any indication that the python developers care that this patch was 
submitted at all".

There seems to be at least one thread a month on this list from a disgruntled 
community member complaining (directly or indirectly) about this delay.  I 
think that makes it a big problem.

> At the moment, I'm personally able to perhaps review one issue per week
> (sometimes less); at this rate, it'll take several years until I get
> to everything.


I guess it depends what you mean by "everything", but given that the open bug 
count is actually increasing at a significant rate, I would say that you can 
never possibly get to "everything".



Re: [Python-Dev] Removing IDLE from the standard library

2010-07-11 Thread Glyph Lefkowitz
On Jul 11, 2010, at 3:19 PM, Martin v. Löwis wrote:

> Unfortunately, it's often not clear what the submitter wants: does she
> want to help, or want to get help? For a bug report, I often post a
> message "can you provide a patch?", but sometimes, it isn't that clear.

Perhaps this is the one area where the biggest advance could be made: a 
clarification of the workflow.

My experience with Python issues which have been "triaged" is that everyone who 
triages tickets has a slightly different idea of who is responsible for the 
ticket and what they're supposed to do next at every point in the process.  
Triage, as described on , emphasizes 
making sure "that all fields in the issue tracker are properly set", rather 
than on communicating with the contributor or reporter.

On Twisted, we try to encourage triagers to focus on communicating the workflow 
ramifications of what a particular contributor has done.  We try to provide a 
response to the bug reporter or patch submitter that says "thanks, but in order 
to move this along, you need to go through the following steps" and sometimes 
even attach a link to the workflow document pointing out exactly where in the 
process the ticket is now stuck.  (At least, that's what we're trying to do.)

This involves a lot of repeating ourselves in ticket comments, but it's well 
worth it (and as more of the repetition moves into citing links to documents 
that have been written to describe aspects of the workflow, it's less onerous).

 describes what the steps are, but it's in 
a sort of procedural passive voice that doesn't say who is responsible for 
doing reviews or how to get a list of patches which need to be reviewed or what 
exactly a third-party non-core-committer reviewer should do to remove the 
'Patch review' keyword.

 and 
 meander around a bit, but a 
while ago we re-worked them so that each section has a specific audience 
(authors, reviewers, or external patch submitters) and that helped readers 
understand what they're intended to do.

Plus,  is a useful resource for core 
developers with only a little bit of free time to do a review.

(I'm just offering some suggestions based on what I think has worked, not to 
hold Twisted up as a paragon of a perfect streamlined process.  We still have 
folks complain about stuck patches, these documents are _far_ from perfect, and 
there are still some varying opinions about how certain workflow problems 
should be dealt with and differences in quality of review.  Plus, we have far 
fewer patches to deal with than Python.  Nevertheless, the situation used to be 
worse for us, and these measures seem to have helped.)


Re: [Python-Dev] Removing IDLE from the standard library

2010-07-11 Thread Glyph Lefkowitz
On Jul 11, 2010, at 5:33 PM, Georg Brandl wrote:

> Honestly, how would you feel as a committer to have scores of issues assigned
> to you -- as a consequence of speedy triage -- knowing that you have to invest
> potentially hours of volunteer time into them, while the person doing the
> triaging is done with the bug in a few minutes and paid for it?  I'd feel a
> little bit duped.

That doesn't strike me as a particularly useful type of triage.

The most useful type of triage in this case would be the kind where the bug 
gets re-assigned to the *original contributor*, not a core committer, with a 
message clearly saying "thanks!  but we will not do anything further with this 
ticket until *you* do XYZ."  This may result in some tickets getting left by 
the wayside, but at least it will be clear that they have been left by the wayside, 
and whose responsibility they really are.

Even so, I would certainly feel better having scores of issues assigned to me 
than I would feel having scores of issues that are just hanging out in limbo 
forever.



Re: [Python-Dev] Removing IDLE from the standard library

2010-07-12 Thread Glyph Lefkowitz

On Jul 12, 2010, at 4:34 AM, Éric Araujo wrote:

>> Plus,  is a useful resource
>> for core developers with only a little bit of free time to do a
>> review.
> 
> Title: “Review Tickets, By Order You Should Review Them In”
> I haven’t found a description of this order, can you explain? Thanks.

Part of the reason that the report is worded that way is that we may decide 
that the order should be different, but it will still be the order that you 
should review them in :).

Right now the order is "amount of time since last change, sorted from highest 
to lowest".  In other words, first come, first serve, by last activity.



Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library

2010-07-12 Thread Glyph Lefkowitz

On Jul 12, 2010, at 11:36 AM, Reid Kleckner wrote:

> (Somewhat off-topic):  Another pain point students had was accidentally
> shadowing stdlib modules, like random.  Renaming the file didn't solve
> the problem either, because it left behind .pycs, which I had to help
> them delete.

I feel your pain.  It seems like every third person who starts playing with 
Twisted starts off by making a file called 'twisted.py' and then getting really 
confused by the behavior.  I would love it if this could be fixed, but I 
haven't yet thought of a solution that would be less confusing than the problem 
itself.
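The confusion being described is easy to reproduce.  Here is a minimal sketch 
(using 'json' as the shadowed stdlib module, purely for illustration):

```python
import os
import sys
import tempfile

# A sketch of the failure mode: a file named after a stdlib module,
# sitting first on sys.path, silently wins the import race.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "json.py"), "w") as f:
    f.write("dumps = 'not the stdlib!'\n")

# A script's own directory is put at the front of sys.path, which is why
# a stray twisted.py (or random.py) shadows the real package.
sys.path.insert(0, workdir)
sys.modules.pop("json", None)      # forget any cached stdlib import

import json
shadowed = json.dumps
print(shadowed)                    # the shadow module's attribute, not the stdlib function

# Renaming the file is not enough on older Pythons: a stale json.pyc
# left behind would still shadow the stdlib until deleted.
sys.path.remove(workdir)
sys.modules.pop("json", None)      # undo the damage for this process
```
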


Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library

2010-07-12 Thread Glyph Lefkowitz

On Jul 12, 2010, at 5:47 PM, Fred Drake wrote:

> On Mon, Jul 12, 2010 at 5:42 PM, Michael Foord
>  wrote:
>> I'm sure Brett will love this idea, but if it was impossible to reimport the
>> script being executed as __main__ with a different name it would solve these
>> problems.
> 
> Indeed!  And I'd be quite content with such a solution, since I
> consider scripts and modules to be distinct.

but ... isn't the whole point of 'python -m' to make scripts and modules _not_ 
be distinct?


Re: [Python-Dev] avoiding accidental shadowing of top-level libraries by the main module

2010-07-13 Thread Glyph Lefkowitz
On Jul 13, 2010, at 5:02 PM, Nick Coghlan wrote:

> My concerns aren't about a module reimporting itself directly, they're
> about the case where a utility module is invoked as __main__ but is
> also imported normally somewhere else in a program (e.g. pdb is
> invoked as a top-level debugger, but is also imported directly for
> some reason). Currently that works as a non-circular import and will
> only cause hassles if there is top-level state in the affected module
> that absolutely must be a singleton within a given application. Either
> change (disallowing it completely as you suggest, or making it a
> circular import, as I suggest) runs the risk of breaking code that
> currently appears to work correctly.
> 
> Fred's point about the practice of changing __name__ in the main
> module corrupting generated pickles is one I hadn't thought of before
> though.

It's not just pickle; anything that requires __name__ (or __module__) to be 
accurate for introspection or debugging is also problematic.

I have long considered it a 'best practice' (ugh, I hate that phrase, but I 
can't think of what else to call it) to _always_ do this type of shadowing, and 
avoid defining _any_ names in the __name__ == '__main__' case, so that there's 
no ambiguity:
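The inline example did not survive the archive; what follows is a sketch of the 
idiom being described ('mymodule' is a hypothetical name): the __main__ block 
immediately re-imports the module under its real name and defines nothing 
itself, so all state lives in exactly one module object and __name__ is 
accurate everywhere.

```python
import os
import subprocess
import sys
import tempfile

# A module following the convention: the __main__ branch only re-imports
# and delegates, so functions always see __name__ == 'mymodule'.
SOURCE = '''\
def main():
    print(__name__)          # 'mymodule', even when run as a script

if __name__ == '__main__':
    import mymodule          # re-import under the real name
    mymodule.main()          # define nothing else in this branch
'''

workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "mymodule.py")
with open(script, "w") as f:
    f.write(SOURCE)

# Run it as a script: the printed __name__ is the real module name.
result = subprocess.run([sys.executable, script],
                        capture_output=True, text=True)
print(result.stdout.strip())  # -> mymodule
```
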





Re: [Python-Dev] What to do with languishing patches?

2010-07-18 Thread Glyph Lefkowitz

On Jul 18, 2010, at 1:46 PM, Alexander Belopolsky wrote:

> We already have "posponed" and "remind" resolutions, but these are
> exclusive of "accepted".   I think there should be a clear way to mark
> the issue "accepted and would be applied if X.Y was out already."
> Chances are one of the resolution labels already has such meaning, but
> in this case it should be more prominently documented as such.

This is what branches are for.

When the X.Y release cycle starts, there should be a branch for X.Y.  Any 
"would be applied" patches can simply be applied to trunk without interrupting 
anything; the X.Y release branch can be merged back into trunk as necessary.



Re: [Python-Dev] Python signal processing question

2010-07-21 Thread Glyph Lefkowitz
On Jul 22, 2010, at 12:00 AM, Stephen J. Turnbull wrote:

> My understanding of OSError is that the OS is saying "sorry, what you
> tried to do is perfectly reasonable under some circumstances, but you
> can't do that now."  ENOMEM, EPERM, ENOENT etc fit this model.
> 
> RuntimeError OTOH is basically saying "You should know better than to
> try that!"  EINVAL fits this model.


That is not my understanding of OSError at all, especially given that I have 
seen plenty of OSErrors that have EINVAL set by various things.

OSError's docstring specifically says "OS system call failed.", and that's the 
way I've always understood it: you made a syscall and got some kind of error.  
Python _mostly_ avoids classifying OSErrors into different exception types in 
other APIs.

The selection of RuntimeError in this particular case seems somewhat random and 
ad-hoc, given that out-of-range signal values give ValueError while SIGKILL and 
SIGSTOP give RuntimeError.  The RuntimeError's args start with "22" (which I 
assume is supposed to mean "EINVAL") but it doesn't have an 'errno' attribute 
as an OSError would.  The ValueError doesn't relate to an errno at all.  
Nowhere does the documentation say "raises OSError or ValueError or TypeError 
or RuntimeError whose args[0] may be an errno".
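The inconsistency is easy to observe (POSIX only; note that later CPython 
versions changed the SIGKILL case from RuntimeError to OSError, so the sketch 
below accepts either):

```python
import signal

def handler(signum, frame):
    pass

# An out-of-range signal number gives ValueError...
try:
    signal.signal(9999, handler)
except ValueError as exc:
    print("ValueError:", exc)

# ...while SIGKILL, equally invalid, gave a bare RuntimeError at the time
# of this thread (args starting with 22, i.e. EINVAL, but no errno
# attribute).  Modern CPython raises OSError here instead.
try:
    signal.signal(signal.SIGKILL, handler)
except (RuntimeError, OSError) as exc:
    print(type(exc).__name__, exc.args)
```
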

To be clear, this particular area doesn't bother me.  I've been dealing with 
weird and puzzling signal-handling issues in Python for years and years and 
this dusty corner of the code has never come up.  I did want to reply to this 
particular message, though, because I *would* eventually like the exception 
hierarchy raised by certain stdlib functions to be more thoroughly documented 
and coherent, but a prerequisite to that is to avoid rationalizing the random 
potpourri of exception types that certain parts of the stdlib emit.  I think 
signal.signal is one such part.


Re: [Python-Dev] proto-pep: plugin proposal (for unittest)

2010-08-01 Thread Glyph Lefkowitz

On Aug 1, 2010, at 3:52 PM, Ronald Oussoren wrote:

> 
> On 1 Aug, 2010, at 17:22, Éric Araujo wrote:
> 
>>> Speaking of which... Your documentation says it's named ~/unittest.cfg,
>>> could you make this a file in the user base (that is, the prefix where
>>> 'setup.py install --user' will install files)?
>> 
>> Putting .pydistutils.cfg .pypirc .unittest2.cfg .idlerc and possibly
>> others in the user home directory (or %APPDATA% on win32 and
>> what-have-you on Mac) is unnecessary clutter. However, $PYTHONUSERBASE
>> is not the right directory for configuration files, as pointed in
>> http://bugs.python.org/issue7175
>> 
>> It would be nice to agree on a ~/.python (resp. %APPDATA%/Python) or
>> $XDG_CONFIG_HOME/python directory and put config files there.
> 
> ~/Library/Python would be a good location on OSX, even if the 100% formally 
> correct location would be ~/Preferences/Python (at least of framework builds, 
> unix-style builds may want to follow the unix convention).

"100% formally" speaking, MacOS behaves like UNIX in many ways.  


It's fine to have a mac-pathname-convention-following place for such data, but 
please _also_ respect the UNIX-y version on the Mac.  The only possible outcome 
of Python on the Mac respecting only Mac pathnames is to have automation scripts 
that work fine on BSD and Linux, but then break when you try to run them on a 
Mac.  There is really no benefit to intentionally avoiding honoring the UNIX 
conventions.  (For another example, note that although Python resides in 
/System/Library, on the Mac, the thing that's in your $PATH when you're using a 
terminal is the symlink in /usr/bin/python.)

Also, no, "~/Preferences" isn't the right place for it either; there's no such 
thing.  You probably meant "~/Library/Preferences".  I'd say that since 
~/Library/Python is already used, there's no particular reason to add a new 
~/Library/Preferences/Python location.  After all, if you really care a lot 
about platform conventions, you should put it in 
~/Library/Preferences/org.python.distutils.plist, but I don't see what benefit 
that extra complexity would have for anyone.



Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread Glyph Lefkowitz

On Aug 2, 2010, at 9:53 AM, exar...@twistedmatrix.com wrote:

> On 01:27 pm, m...@egenix.com wrote:
>> exar...@twistedmatrix.com wrote:
>>> On 12:21 pm, m...@egenix.com wrote:
 
 See Zope for an example of how well this simply mechanism works out in
 practice: it simply scans the "Products" namespace for sub-packages and
 then loads each sub-package it finds to have it register itself with
 Zope.
>>> 
>>> This is also roughly how Twisted's plugin system works.  One drawback,
>>> though, is that it means potentially executing a large amount of Python
>>> in order to load plugins.  This can build up to a significant
>>> performance issue as more and more plugins are installed.
>> 
>> I'd say that it's up to the application to deal with this problem.
>> 
>> An application which requires lots and lots of plugins could
>> define a registration protocol that does not require loading
>> all plugins at scanning time.
> 
> It's not fixable at the application level, at least in Twisted's plugin 
> system.  It sounds like Zope's system has the same problem, but all I know of 
> that system is what you wrote above.  The cost increases with the number of 
> plugins installed on the system, not the number of plugins the application 
> wants to load.

We do have a plan to address this in Twisted's plugin system (eventually): 
, although I'm not sure if that's 
relevant to the issue at hand.



Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-03 Thread Glyph Lefkowitz

On Aug 3, 2010, at 4:28 AM, M.-A. Lemburg wrote:

> I don't think that's a problem: the SQLite database would be a cache
> like e.g. a font cache or TCSH command cache, not a replacement of
> the meta files stored in directories.
> 
> Such a database would solve many things at once: faster access to
> the meta-data of installed packages, fewer I/O calls during startup,
> more flexible ways of doing queries on the meta-data, needed for
> introspection and discovery, etc.

This is exactly what Twisted already does with its plugin cache, and the 
previously-cited ticket in this thread should expand the types of metadata 
which can be obtained about plugins.

Packaging systems are perfectly capable of generating and updating such 
metadata caches, but various packagers of Twisted (Debian's especially) didn't 
read our documentation and kept moving around the place where Python source 
files were installed, which routinely broke the post-installation hooks and 
caused all kinds of problems.

I would strongly recommend looping in the Python packaging teams from various 
distros *before* adding another such cache, unless you want to be fielding bugs 
from Launchpad.net for five years :).



Re: [Python-Dev] Fixing #7175: a standard location for Python config files

2010-08-12 Thread Glyph Lefkowitz

On Aug 12, 2010, at 6:30 AM, Tim Golden wrote:

> I don't care how many stats we're doing

You might not, but I certainly do.  And I can guarantee you that the authors of 
command-line tools that have to start up in under ten seconds, for example 
'bzr', care too.



Re: [Python-Dev] 'hasattr' is broken by design

2010-08-24 Thread Glyph Lefkowitz

On Aug 24, 2010, at 8:31 AM, Benjamin Peterson wrote:

> 2010/8/24 Hrvoje Niksic :
>> The __length_hint__ lookup expects either no exception or AttributeError,
>> and will propagate others.  I'm not sure if this is a bug.  On the one hand,
>> throwing anything except AttributeError from __getattr__ is bad style (which
>> is why we fixed the bug by deriving our business exception from
>> AttributeError), but the __length_hint__ check is supposed to be an internal
>> optimization completely invisible to the caller of list().
> 
> __length_hint__ is internal and undocumented, so it can do whatever it wants.

As it happens though, list() is _quite_ public.  Saying "X is internal and 
undocumented, so it can do whatever it wants" is never really realistic, 
especially in response to someone saying "we already saw this problem in 
production, _without_ calling / referring to / knowing about this private API".



Re: [Python-Dev] Internal counter to debug leaking file descriptors

2010-08-31 Thread Glyph Lefkowitz

On Aug 31, 2010, at 10:03 AM, Guido van Rossum wrote:

> On Linux you can look somewhere in /proc, but I don't know that it
> would help you find where a file was opened.

"/dev/fd" is actually a somewhat portable way of getting this information.  I 
don't think it's part of a standard, but on Linux it's usually a symlink to 
"/proc/self/fd", and it's available on MacOS and most BSDs (based on a hasty 
and completely-not-comprehensive investigation).  But it won't help you find 
out when the FDs were originally opened, no.
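A sketch of what /dev/fd gives you (POSIX only): the set of currently open 
descriptors, with no history of where they were opened.

```python
import os

# /dev/fd lists this process's open file descriptors: on Linux it is a
# symlink to /proc/self/fd, and macOS and most BSDs provide it natively.
# It answers "what is open right now?", not "who opened it, and where?".
f = open("/dev/null")
fd = f.fileno()
open_fds = sorted(int(name) for name in os.listdir("/dev/fd"))
print(open_fds)  # typically 0, 1, 2, plus our /dev/null descriptor
f.close()
```
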


Re: [Python-Dev] Garbage announcement printed on interpreter shutdown

2010-09-10 Thread Glyph Lefkowitz

On Sep 10, 2010, at 5:10 PM, Amaury Forgeot d'Arc wrote:

> 2010/9/10 Fred Drake :
>> On Fri, Sep 10, 2010 at 4:32 PM, Georg Brandl  wrote:
>>> IMO this runs contrary to the decision we made when DeprecationWarnings were
>>> made silent by default: it spews messages not only at developers, but also 
>>> at
>>> users, who don't need it and probably are going to be quite confused by it,
>> 
>> Agreed; this should be silent by default.
> 
> +1. I suggest to enable it only when Py_DEBUG (or Py_TRACE_REFS or
> Py_REF_DEBUG?) is defined.

Would it be possible to treat it the same way as a deprecation warning, and 
show it under the same conditions?  It would be nice to know if my Python 
program is leaking uncollectable objects without rebuilding the interpreter.



Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Glyph Lefkowitz
On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:

> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI).  You could provide alternate
serialization methods to get text out on demand... but then what if 
> someone wants to push that text representation back in to email to
> rebuild a model of the message?

You tell them "too bad, make some bytes out of that text."  Leave it up to the 
application.  Period, the end, it's not the library's job.  If you pushed the 
text out to a 'view message source' UI representation, then the vicissitudes of 
the system clipboard and other encoding and decoding things may corrupt it in 
inscrutable ways.  You can't fix it.  Don't try.

> So now we have both a bytes parser and a string parser.

Why do so many messages on this subject take this for granted?  It's wrong for 
the email module just like it's wrong for every other package.

There are plenty of other (better) ways to deal with this problem.  Let the 
application decide how to fudge the encoding of the characters back into bytes 
that can be parsed.  "In the face of ambiguity, refuse the temptation to guess" 
and all that.  The application has more of an idea of what's going on than the 
library here, so let it make encoding decisions.

Put another way, there's nothing wrong with having a text parser, as long as it 
just encodes the text according to some known encoding and then parses the 
bytes :).
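A sketch of the shape being argued for: one real parser that operates on bytes, 
with the "text" entry point just encoding and delegating.  (The 
parse_message_bytes logic here is a toy stand-in, not the actual email API.)

```python
# One real parser, operating only on octets.
def parse_message_bytes(data: bytes) -> dict:
    # toy stand-in: split the header block from the body at the first
    # blank line, as a real message parser would
    head, _, body = data.partition(b"\r\n\r\n")
    return {"head": head, "body": body}

# The "text parser" is a thin wrapper: the *application* chose how the
# characters become octets; the library itself only ever parses octets.
def parse_message_text(text: str, encoding: str = "utf-8") -> dict:
    return parse_message_bytes(text.encode(encoding))

msg = parse_message_text("Subject: hi\r\n\r\nhello world")
print(msg["head"], msg["body"])
```
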


> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings.  If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.

That may be a handy way to deal with some grotty internal implementation 
details, but having a 'decode()' method is broken.  The thing I care about, as 
a consumer of this API, is that there is a clearly defined "Message" interface, 
which gives me a uniform-looking place where I can ask for either characters 
(if I'm displaying them to the user) or bytes (if I'm putting them on the 
wire).  I don't particularly care where those bytes came from.  I don't care 
what decoding tricks were necessary to produce the characters.

Now, it may be worthwhile to have specific normalization / debrokenifying 
methods which deal with specific types of corrupt data from the wire; 
encoding-guessing, replacement-character insertion or whatever else are fine 
things to try.  It may also be helpful to keep around a list of errors in the 
message, for inspection.  But as we know, there are lots of ways that MIME data 
can go bad other than encoding, so that's just one variety of error that we 
might want to keep around.

(Looking at later messages as I'm about to post this, I think this all sounds 
pretty similar to Antoine's suggestions, with respect to keeping the 
implementation within a single class, and not having 
BytesMessage/UnicodeMessage at the same abstraction level.)


Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

2010-09-16 Thread Glyph Lefkowitz

On Sep 16, 2010, at 7:34 PM, Barry Warsaw wrote:

> On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote:
> 
>> That may be a handy way to deal with some grotty internal
>> implementation details, but having a 'decode()' method is broken.  The
>> thing I care about, as a consumer of this API, is that there is a
>> clearly defined "Message" interface, which gives me a uniform-looking
>> place where I can ask for either characters (if I'm displaying them to
>> the user) or bytes (if I'm putting them on the wire).  I don't
>> particularly care where those bytes came from.  I don't care what
>> decoding tricks were necessary to produce the characters.
> 
> But first you have to get to that Message interface.  This is why the current
> email package separates parsing and generating from the representation model.
> You could conceivably have a parser that rot13's all the payload, or just
> parses the headers and leaves the payload as a blob of bytes.  But the parser
> tries to be lenient in what it accepts, so that one bad header doesn't cause
> it to just punt on everything that follows.  Instead, it parses what it can
> and registers a defect on that header, which the application can then reason
> about, because it has a Message object.  If it were to just throw up its hands
> (i.e. raise an exception), you'd basically be left with a blob of useless crap
> that will just get /dev/null'd.

Oh, absolutely.  Please don't interpret anything I say as meaning that the 
email API should not handle broken data.  I'm just saying that you should not 
expect broken data to round-trip through translation to characters and back, 
any more than you should expect a broken PNG to round-trip through a 
translation to a 2d array of pixels and back.

>> Now, it may be worthwhile to have specific normalization /
>> debrokenifying methods which deal with specific types of corrupt data
>> from the wire; encoding-guessing, replacement-character insertion or
>> whatever else are fine things to try.  It may also be helpful to keep
>> around a list of errors in the message, for inspection.  But as we
>> know, there are lots of ways that MIME data can go bad other than
>> encoding, so that's just one variety of error that we might want to
>> keep around.
> 
> Right.  The middle ground IMO is what the current parser does.  It recognizes
> the problem, registers a defect, and tries to recover, but it doesn't fix the
> corrupt data.  So for example, if you had a valid RFC 2047 encoded Subject but
> a broken X-Foo header, you'd at least still end up with a Message object.  The
> value of the good headers would be things from which you can get the unicode
> value, the raw bytes value, parse its parameters, munge it, etc. while the bad
> header might be something you can only get the raw bytes from.


My take on this would be that you should always be able to get bytes or 
characters, but characters are always suspect, in that once you've decoded, if 
you had invalid bytes, then they're replacement characters (or your choice of 
encoding fix).
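The "suspect characters" point in a minimal sketch: once invalid bytes have 
been decoded with replacement characters, the original octets cannot be 
recovered by re-encoding.

```python
# Characters decoded from invalid bytes are "suspect": the replacement
# character marks where information was lost, and re-encoding will not
# round-trip the original octets.
raw = b"caf\xe9 au lait"                  # latin-1 bytes, invalid as UTF-8
text = raw.decode("utf-8", errors="replace")
print(text)                               # 'caf\ufffd au lait'
print(text.encode("utf-8") == raw)        # False: the round trip is lossy
```
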


Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-19 Thread Glyph Lefkowitz
On Sep 18, 2010, at 10:18 PM, Steve Holden wrote:

>> I could probably be persuaded to merge the APIs, but the email6
>> precedent suggests to me that separating the APIs better reflects the
>> mental model we're trying to encourage in programmers manipulating
>> text (i.e. the difference between the raw octet sequence and the text
>> character sequence/parsed data).
>> 
> That sounds pretty sane and coherent to me.

While I don't like the email6 precedent as such (that there would be different 
parsed objects, based on whether you started parsing with bytes or with 
strings), the idea that when you are working directly with bytes or text, you 
should have to know which one you have, is a good one.  +1 for keeping the APIs 
separate with 'urlsplitb' etc.


Re: [Python-Dev] Support for async read/write

2010-10-19 Thread Glyph Lefkowitz

On Oct 19, 2010, at 8:09 PM, James Y Knight wrote:

> There's a difference.
> 
> os._exit is useful. os.open is useful. aio_* are *not* useful. For anything. 
> If there's anything you think you want to use them for, you're wrong. It 
> either won't work properly or it will worse performing than the simpler 
> alternatives.


I'd like to echo this sentiment.  This is not about providing a 'safe' wrapper 
to hide some powerful feature of these APIs: the POSIX aio_* functions are 
really completely useless.

To quote the relevant standard:

APPLICATION USAGE

None.

RATIONALE

None.

FUTURE DIRECTIONS

None.

Not only is the performance usually worse than expected, the behavior of aio_* 
functions requires all kinds of subtle and mysterious coordination with signal 
handling, which I'm not entirely sure Python would even be able to pull off 
without some modifications to the signal module.  (And, as Jean-Paul mentioned, 
if your OS kernel runs out of space in a queue somewhere, completion 
notifications might just never be delivered at all.)

I would love for someone to prove me wrong.  In particular, I would really love 
for there to be a solution to asynchronous filesystem I/O better than "start a 
thread, read until you block".  But, as far as I know, there isn't, and 
wrapping these functions will just confuse and upset anyone who attempts to use 
them in any way.



Re: [Python-Dev] Support for async read/write

2010-10-20 Thread Glyph Lefkowitz

On Oct 19, 2010, at 9:55 PM, exar...@twistedmatrix.com wrote:

>> Not only is the performance usually worse than expected, the behavior of 
>> aio_* functions require all kinds of subtle and mysterious coordination with 
>> signal handling, which I'm not entirely sure Python would even be able to 
>> pull off without some modifications to the signal module.  (And, as 
>> Jean-Paul mentioned, if your OS kernel runs out of space in a queue 
>> somewhere, completion notifications might just never be delivered at all.)
> 
> Just to be clear, James corrected me there.  I thought Jesus was talking 
> about the mostly useless Linux AIO APIs, which have the problems I described. 
>  He was actually talking about the POSIX AIO APIs, which have a different set 
> of problems making them a waste of time.

I know, I'm referring to the behavior of POSIX AIO.

Perhaps I'm overstating the case with 'subtle and mysterious', then, but the 
POSIX 'aiocb' structure still includes an 'aio_sigevent' member which is the 
way to find out about I/O event completion.  If you're writing an application 
that uses AIO, basically all of your logic ends up living in the context of a 
signal handler, and as the standard puts it,

"When signal-catching functions are invoked asynchronously with process 
execution, the behavior of some of the functions defined by this volume of IEEE 
Std 1003.1-2001 is unspecified if they are called from a signal-catching 
function."

Of course, you could try using signalfd(), but that's not in POSIX.

(Or, you could use SIGEV_THREAD, but that would be functionally equivalent to 
running read() in a thread, except much more difficult.)



Re: [Python-Dev] Support for async read/write

2010-10-20 Thread Glyph Lefkowitz

On Oct 20, 2010, at 12:31 AM, Jeffrey Yasskin wrote:

> No comment on the rest of your claim, but this is a silly argument.
> The standard says the same thing about at least fcntl.h, signal.h,
> pthread.h, and ucontext.h, which clearly are useful.

It was meant to be tongue-in-cheek :).  Perhaps I should not have assumed that 
everyone else was as familiar with the POSIX documentation; I figured that most 
readers would know that most pages say that.

But, that was the result of a string of many different searches attempting to 
find someone explaining why this was a good idea or why anyone would want to 
use it.  I think in this case, it's accurate.



Re: [Python-Dev] Continuing 2.x

2010-10-29 Thread Glyph Lefkowitz
On Oct 28, 2010, at 10:51 PM, Brett Cannon wrote:

> I think people need to stop viewing the difference between Python 2.7
> and Python 3.2 as this crazy shift and view it from python-dev's
> perspective; it should be viewed one follows from the other at this
> point. You can view it as Python 3.2 is the next version after Python
> 2.7 just like 2.7 followed to 2.6, which makes the policies we follow
> for releases make total sense and negates this discussion. It just so
> happens people don't plan to switch to the newest release immediately
> as the backward-incompatible changes are more involved than what
> people are used to from past releases.


Brett, with all due respect, this is not a reasonable position.  You are making 
it sound like the popular view of 3.2 is a "crazy shift" is based on a personal 
dislike of python-dev or something.  The fact is that the amount of effort 
required to port to 3.2 is extreme compared to previous upgrades, and most 
people still aren't willing to deal with it.  It is a crazy shift.

Let's take PyPI numbers as a proxy.  There are ~8000 packages with a 
"Programming Language::Python" classifier.  There are ~250 with "Programming 
Language::Python::3".  Roughly speaking, that means about 3% of Python code 
has been ported so far.  Python 3.0 was released at the end of 2008, so 
people have had roughly 2 years to port, which comes out to 1.5% per year.

Let's say that 20% of the code on PyPI is just junk; it's unfair to expect 100% 
of all code ever to get ported.  But, still: with this back-of-the-envelope 
estimate of the rate of porting, it will take over 50 years before a decisive 
majority of Python code is on Python 3.

By contrast, there are 536 packages with ::2.6, and 177 with ::2.7.  (Trying to 
compare apples to apples here, since I assume the '2' tag is much more lightly 
used than '3' to identify supported versions; I figure someone likely to tag 
one micro-version would also tag the other.)

2.7 was released on July 3rd, so let's be generous and say approximately 6 
months.  That's 30% of packages, ported in 6 months, or 60% per year.  This 
means that Python 3 is two orders of magnitude crazier of a shift than 2.7.
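
The back-of-the-envelope arithmetic above can be checked directly, using the
approximate counts quoted in this message:

```python
# Approximate PyPI classifier counts quoted in this message (late 2010).
total, py3 = 8000, 250
py26, py27 = 536, 177

py3_share = py3 / total            # about 3% of packages ported overall
py3_rate = py3_share / 2           # two years since 3.0: ~1.5% per year
years_to_80pct = 0.80 / py3_rate   # "over 50 years" to a decisive majority

py27_share = py27 / py26           # roughly a third, in about six months
py27_rate = py27_share * 2         # roughly 60-66% per year

print(f"Py3: {py3_rate:.1%}/yr, ~{years_to_80pct:.0f} years to 80%; "
      f"2.7: {py27_rate:.0%}/yr")
```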

I know that the methods involved at arriving at these numbers are not 
particularly good. But, I think that if their accuracy differs from that of the 
download stats, it's better: it takes a much more significant commitment to 
actually write some code and upload it than to accidentally download 3.x 
because it's the later version.

Right now, Kristján is burning off his (non-fungible) enthusiasm in this 
discussion rather than addressing more 2.x maintenance issues.  If 3.x adoption 
takes off and makes a nice hockey stick graph, then few people will care about 
this in retrospect.  In the intervening hypothetical half-century while we wait 
to see how it pans out, isn't it better to just have an official Python branch 
for the "maybe 2.8" release?  Nobody from the current core team needs to work 
on it, necessarily; either other, new maintainers will show up or they won't.  
For that matter, Kristján is still talking about porting much of his work to 
3.x anyway.

In the best case (3.x takes over the world in 6 months) a 2.x branch won't be 
needed and nobody will show up to do the work of a release; some small amount 
of this work (the stuff not ported to 3.x) will be lost.  In the medium case 
(3.x adoption is good, but there are still millions of 2.x users in 5 years) it 
will accumulate some helpers that will make migrating to 3.x even smoother than 
with 2.7.  In the worst case (straw man: 3.x adoption actually declines, and 
distros start maintaining their own branches of 2.7) I'm sure everyone will be 
glad that some of this maintenance effort took place and there's some central 
place to continue it.

I'm perfectly willing to admit that I'm still too pessimistic about this and I 
could be wrong.  But given the relatively minimal amount of effort required to 
let 2.x bugs continue to get fixed under the aegis of Python.org rather than 
going through the painful negotiation process of figuring out where else to 
host it (and thereby potentially losing a bunch of maintenance that would not 
otherwise happen), it seems foolhardy to insist that those of us who think 2.x 
is going to necessitate another release must necessarily be wrong.




Re: [Python-Dev] closing files and sockets in a timely manner in the stdlib

2010-10-30 Thread Glyph Lefkowitz

On Oct 30, 2010, at 2:39 PM, Jack Diederich wrote:

> On Fri, Oct 29, 2010 at 8:35 PM, Brett Cannon  wrote:
>> For those of you who have not noticed, Antoine committed a patch that
>> raises a ResourceWarning under a pydebug build if a file or socket is
>> closed through garbage collection instead of being explicitly closed.
> 
> Just yesterday I discovered /proc/<pid>/fd/ which is a list
> of open file descriptors for your PID on *nix and includes all open
> files, pipes, and sockets.  Very handy, I filed some tickets about
> company internal libs that were opening file handles as a side effect
> of import (logging mostly).  I tried to provoke standard python
> imports (non-test) to leave some open handles and came up empty.

That path (and anything below /proc, really) is a list of open file descriptors 
specifically on Linux, not "*nix".  Also on Linux, you can avoid interpolating 
the PID by just using "/proc/self".

A more portable (albeit not standard) path for "what file descriptors do I have 
open" is /dev/fd/.  This is supported via a symlink to /proc/self on all the 
Linuxes I've tested on.  There's no portable standard equivalent for 
not-yourself processes that I'm aware of, though.
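
A best-effort version of "what file descriptors do I have open", trying /dev/fd
first and falling back to /proc/self/fd, might look like this (illustrative
sketch; it returns an empty list on platforms with neither path):

```python
import os

def open_fds():
    """Best-effort list of this process's open file descriptors.

    /dev/fd is the semi-portable spelling; /proc/self/fd is Linux-specific.
    Note the listing itself briefly opens one extra descriptor (the
    directory being read).
    """
    for path in ("/dev/fd", "/proc/self/fd"):
        try:
            return sorted(int(name) for name in os.listdir(path))
        except FileNotFoundError:
            continue
    return []  # e.g. Windows: neither path exists

print(open_fds())
```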




Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/

2010-11-03 Thread Glyph Lefkowitz

On Nov 3, 2010, at 1:04 PM, James Y Knight wrote:

> This is the strongest reason why I recommend to everyone I know that they not 
> use pickle for storage they'd like to keep working after upgrades [not just 
> of stdlib, but other 3rd party software or their own software]. :)

+1.

Twisted actually tried to preserve pickle compatibility in the bad old days, 
but it was impossible.  Pickles should never really be saved to disk unless 
they contain nothing but lists, ints, strings, and dicts.
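
A quick demonstration of why instance pickles are fragile: pickle stores
classes by importable name, so anything that renames, moves, or deletes the
class strands every old pickle.  (The "appmod" module below is a stand-in for
real application code, built in-memory for the sake of a runnable sketch.)

```python
import pickle
import sys
import types

# A throwaway "application" module whose class we pickle.
mod = types.ModuleType("appmod")
exec("class Point:\n    def __init__(self, x):\n        self.x = x",
     mod.__dict__)
sys.modules["appmod"] = mod

blob = pickle.dumps(mod.Point(3))   # stores "appmod.Point" by name, not value

# Plain data round-trips regardless of how your code is laid out:
assert pickle.loads(pickle.dumps({"x": 3})) == {"x": 3}

# ...but rename or move the class (simulated here by dropping the module)
# and the stored instance pickle becomes unreadable.
del sys.modules["appmod"]
try:
    pickle.loads(blob)
except Exception as exc:
    print("unpickle failed:", type(exc).__name__)
```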




Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/

2010-11-03 Thread Glyph Lefkowitz

On Nov 3, 2010, at 11:26 AM, Alexander Belopolsky wrote:

> This may not be a problem for smart tools, but for me and a simple
> editor what used to be:


Maybe this is the real problem?  It's 2010, we should all be far enough beyond 
EDLIN that our editors can jump to the definition of a Python class.  Even Vim 
can be convinced to do this.  Could Python itself make this easier?  Maybe ship 
with a command that says "hey, somewhere on sys.path, there is a class with 
this name.  Please run '$EDITOR file +line' (or the current OS's equivalent) so 
I can look at the source code".
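
Such a command is mostly a few lines of inspect.  A sketch (the edit_command
helper and its output format are hypothetical, not an existing tool):

```python
import inspect
import shlex

def edit_command(obj, editor="$EDITOR"):
    """Return an 'editor +line file' command pointing at an object's source."""
    filename = inspect.getsourcefile(obj)
    _, lineno = inspect.getsourcelines(obj)
    return f"{editor} +{lineno} {shlex.quote(filename)}"

# Example: locate a stdlib class.
import textwrap
print(edit_command(textwrap.TextWrapper))
```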




Re: [Python-Dev] Pickle alternative in stdlib (Was: On breaking modules into packages)

2010-11-04 Thread Glyph Lefkowitz

On Nov 4, 2010, at 12:49 PM, Guido van Rossum wrote:

> What's the attack you're thinking of on marshal? It never executes any
> code while unmarshalling (although it can unmarshal code objects --
> but the receiving program has to do something additionally to execute
> those).

These issues may have been fixed now, but a long time ago I recall seeing some 
nasty segfaults which looked exploitable when feeding marshal malformed data.  
If they still exist, running a fuzzer on some pyc files should reveal them 
pretty quickly.

When I ran across them I didn't think much of them, and probably did not even 
report the bug, since marshal is mostly used to load code anyway, which is 
implicitly trusted.



Re: [Python-Dev] Breaking undocumented API

2010-11-08 Thread Glyph Lefkowitz

On Nov 8, 2010, at 2:35 PM, exar...@twistedmatrix.com wrote:

> On 09:57 pm, br...@python.org wrote:
>> On Mon, Nov 8, 2010 at 13:45,   wrote:
>>> On 09:25 pm, br...@python.org wrote:
 
 On Mon, Nov 8, 2010 at 13:03,   wrote:
> 
> On 07:58 pm, br...@python.org wrote:
>>> 
>>> I don't think a strict don't remove without deprecation policy is
>>> workable.  For example, is trace.rx_blank constant part of the trace
>>> module API that needs to be preserved indefinitely?  I don't even know
>>> if it is possible to add a deprecation warning to it, but
>>> CoverageResults._blank_re would certainly be a better place for it.
>> 
>> The deprecation policy obviously cannot apply to module-level
>> attributes.
> 
> I'm not sure why this is.  Can you elaborate?
 
 There is no way to directly trigger a DeprecationWarning for an
 attribute. We can still document it, but there is just no way to
 programmatically enforce it.
>>> 
>>> What about `deprecatedModuleAttribute`
>>> or zope.deprecation, which inspired it?
>> 
>> Just checked the code and it looks like it substitutes the module for
>> some proxy object? To begin that break subclass checks. After that I
>> don't know the ramifications without really digging into the
>> ModuleType code.
> 
> That could be fixed if ModuleType allowed subclassing. :)
> 
> For what it's worth, no one has complained about problems caused by 
> `deprecatedModuleAttribute`, but we've only been using it for about two and a 
> half years.

This seems like a pretty clear case of "practicality beats purity".  Not only 
has nobody complained about deprecatedModuleAttribute, but there are tons of 
things which show up in sys.modules that aren't modules in the sense of 
'instances of ModuleType'.  The Twisted reactor, for example, is an instance, 
and we've been doing *that* for about 10 years with no complaints.
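
For the record, ModuleType can be subclassed in current CPython, so the
deprecated-attribute trick can be sketched without a full proxy object.  This
is an illustrative sketch, not Twisted's deprecatedModuleAttribute
implementation (and Python 3.7's PEP 562 later added module-level __getattr__
natively, making even the subclass unnecessary):

```python
import sys
import types
import warnings

class _DeprecatingModule(types.ModuleType):
    """Module subclass that warns when deprecated attributes are read."""

    _deprecated = {}  # name -> value; overridden per instance below

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self._deprecated:
            warnings.warn(f"{self.__name__}.{name} is deprecated",
                          DeprecationWarning, stacklevel=2)
            return self._deprecated[name]
        raise AttributeError(name)

# Build a demo module with one deprecated attribute and install it.
mod = _DeprecatingModule("demo")
mod._deprecated = {"OLD_CONSTANT": 42}
sys.modules["demo"] = mod

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = sys.modules["demo"].OLD_CONSTANT

print(value, len(caught))
```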



Re: [Python-Dev] Breaking undocumented API

2010-11-09 Thread Glyph Lefkowitz

On Nov 8, 2010, at 4:50 PM, Guido van Rossum wrote:
> On Mon, Nov 8, 2010 at 3:55 PM, Glyph Lefkowitz  
> wrote:
>> This seems like a pretty clear case of "practicality beats purity".  Not 
>> only has nobody complained about deprecatedModuleAttribute, but there are 
>> tons of things which show up in sys.modules that aren't modules in the sense 
>> of 'instances of ModuleType'.  The Twisted reactor, for example, is an 
>> instance, and we've been doing *that* for about 10 years with no complaints.
> 
> But the Twisted universe is only a subset of the Python universe. The
> Python stdlib needs to move more carefully.

While this is true, I think the Twisted universe generally represents a 
particularly conservative, compatibility-conscious area within the Python 
universe (multiverse?).  I know of several Twisted users who regularly upgrade 
to the most recent version of Twisted without incident, but can't move from 
Python 2.4->2.5 because of compatibility issues.

That's not to say that there are no areas within the larger Python ecosystem 
that I'm unaware of where putting non-module-objects into sys.modules would 
cause issues.  But if it were a practice that were at all common, I suspect 
that we would have bumped into it by now.



Re: [Python-Dev] Breaking undocumented API

2010-11-10 Thread Glyph Lefkowitz

On Nov 10, 2010, at 2:21 PM, James Y Knight wrote:

> On the other hand, if you make the primary mechanism to indicate privateness 
> be a leading underscore, that's obvious to everyone.

+1.

One of the best features of Python is the ability to make a conscious decision 
to break the interface of a library and just get on with your work, even if 
your use-case is not really supported, because nothing can stop you calling its 
private functionality.

But, IMHO the worst problem with Python is the fact that you can do this 
_without realizing it_ and pay a steep maintenance price later when an upgrade 
of something springs the trap that you had unwittingly set for yourself.

The leading-underscore convention is the only thing I've found that even 
mitigates this problem.



Re: [Python-Dev] Breaking undocumented API

2010-11-16 Thread Glyph Lefkowitz

On Nov 16, 2010, at 4:49 PM, Guido van Rossum wrote:

>> PEP 8 isn't nearly visible enough, either.  Whatever the rule is, it needs
>> to be presented with the information itself.  If the rule is that things not
>> documented in the library manual have no compatibility guarantees, then all
>> of the means of getting documentation *other* than looking at the library
>> manual need to indicate this somehow (alternatively, the information
>> shouldn't be duplicated, but I doubt I'll convince anyone of that).
> 
> Assuming people actually read the disclaimers.

I don't think it necessarily needs to be presented as a disclaimer.  There will 
always be people who just ignore part of the information presented, but the 
message could be something along the lines of "Here's some basic documentation, 
but it might be out-of-date or incomplete.  You can find a better reference at 
this link."  If it's easy to click on the link, I 
think a lot of people will click on it.  Especially since the library reference 
really _is_ more helpful than the docstrings, for the standard library.

(IMHO, dir()'s semantics are so weird that it should emit a warning too, like 
"looking for docs?  please use help()".)



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-22 Thread Glyph Lefkowitz
On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto 
<ocean-c...@m2.ccsnet.ne.jp> wrote:

> Hello. Does this affect python? Thank you.
>
> http://www.openssl.org/news/secadv_20101116.txt
>

No.


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:37 AM, ben.cottr...@nominum.com wrote:

> I'd prefer not to think of the number of times I've made the following 
> mistake:
> 
> s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)

If it's any consolation, it's fewer than the number of times I have :).

(More fun, actually, is where you pass a file descriptor to the wrong argument 
of 'fromfd'...)
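
The swapped call is especially insidious because on common platforms AF_INET
and SOCK_DGRAM are both 2, so the mistake usually goes unnoticed at runtime.
An illustrative sketch (keyword arguments rule the mistake out entirely):

```python
import socket

# Correct order: family first, then type.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.close()

# Why the swapped call often "works": on common platforms both
# constants happen to have the same numeric value.
print(int(socket.AF_INET), int(socket.SOCK_DGRAM))

# Keyword arguments make the argument order irrelevant:
s = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
s.close()
```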



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote:

> Well, it is easy to assign range(N) to a tuple of names when desired. I
> don't think an automatically-enumerating constant generator is needed.

I don't think that numerical enumerations are the only kind of constants we're 
talking about.  Others have already mentioned strings, and there are other 
use-cases besides.  Since this isn't coming to 2.x, 
we're probably going to do our own thing anyway (unless it turns out that 
flufl.enum is so great that we want to add another dependency...) but I'm 
hoping that the outcome of this discussion will point to something we can be 
compatible with.
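
For illustration only (this is not flufl.enum's API, just a sketch of what a
dedicated constant type buys over bare range(N) values):

```python
class NamedConstant:
    """Minimal named-constant sketch: identity comparison, readable repr."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

RED, GREEN, BLUE = (NamedConstant(n) for n in ("RED", "GREEN", "BLUE"))

# Unlike range(N) values, these never silently compare equal to ints,
# so passing one where a plain number is expected fails loudly.
print(RED, RED == 0)
```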



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 7:22 PM, James Y Knight wrote:

> On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
>> Maybe Python should have used UTF-8 as its internal unicode
>> representation. Then people who were foolish enough to assume
>> one character per string item would have their programs break
>> rather soon under only light unicode testing. :-)
> 
> You put a smiley, but, in all seriousness, I think that's actually the right 
> thing to do if anyone writes a new programming language. It is clearly the 
> right thing if you don't have to be concerned with backwards-compatibility: 
> nobody really needs to be able to access the Nth codepoint in a string in 
> constant time, so there's not really any point in storing a vector of 
> codepoints.
> 
> Instead, provide bidirectional iterators which can traverse the string by 
> byte, codepoint, or by grapheme (that is: the set of combining characters + 
> base character that go together, making up one thing which a human would 
> think of as a character).


I really hope that this idea is not just for new programming languages.  If you 
switch from doing unicode "wrong" to doing unicode "right" in Python, you 
quadruple the memory footprint of programs which primarily store and manipulate 
large amounts of text.

This is especially ridiculous in PyGTK applications, where the internal 
representation required by the GUI is UTF-8 anyway, so the round-tripping of 
string data back and forth to the exploded UTF-32 representation wastes gobs 
of memory and time.  It at least makes sense when your C library's idea 
about character width and your Python build match up.

But, in a desktop app this is unlikely to be a performance concern; in servers, 
it's a big deal; measurably so.  I am pretty sure that in the server apps that 
I work on, we are eventually going to need our own string type and UTF-8 logic 
that does exactly what James suggested - certainly if we ever hope to support 
Py3.

(I dimly recall that both James and I have made this point before, but it's 
pretty important, so it bears repeating.)



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:

> On Tue, 23 Nov 2010 00:07:09 -0500
> Glyph Lefkowitz  wrote:
>> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
>> ocean-c...@m2.ccsnet.ne.jp> wrote:
>> 
>>> Hello. Does this affect python? Thank you.
>>> 
>>> http://www.openssl.org/news/secadv_20101116.txt
>>> 
>> 
>> No.
> 
> Well, actually it does, but Python links against the system OpenSSL on
> most platforms (except Windows), so it's up to the OS vendor to apply
> the patch.


It does?  If so, I must have misunderstood the vulnerability.  Can you explain 
how it affects Python?





Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote:

> James Y Knight writes:
> 
>> You put a smiley, but, in all seriousness, I think that's actually
>> the right thing to do if anyone writes a new programming
>> language. It is clearly the right thing if you don't have to be
>> concerned with backwards-compatibility: nobody really needs to be
>> able to access the Nth codepoint in a string in constant time, so
>> there's not really any point in storing a vector of codepoints.
> 
> A sad commentary on the state of Emacs usage, "nobody".
> 
> The theory is that accessing the first character of a region in a
> string often occurs as a primitive operation in O(N) or worse
> algorithms, sometimes without enough locality at the "collection of
> regions" level to give a reasonably small average access time.

I'm not sure what you mean by "the theory is".  Whose theory?  About what?

> In practice, any *Emacs user can tell you that yes, we do need to be
> able to access the Nth codepoint in a buffer in constant time.  The
> O(N) behavior of current Emacs implementations means that people often
> use a binary coding system on large files.  Yes, some position caching
> is done, but if you have a large file (eg, a mail file) which is
> virtually segmented using pointers to regions, locality gets lost.
> (This is not a design bug, this is a fundamental requirement: consider
> fast switching between threaded view and author-sorted view.)

Sounds like a design bug to me.  Personally, I'd implement "fast switching 
between threaded view and author-sorted view" the same way I'd address any 
other multiple-views-on-the-same-data problem.  I'd retain data structures for 
both, and update them as the underlying model changed.

These representations may need to maintain cursors into the underlying 
character data, if they must retain giant wads of character data as an 
underlying representation (arguably the _main_ design bug in Emacs, that it 
encourages you to do that for everything, rather than imposing a sensible 
structure), but those cursors don't need to be code-point counters; they could 
be byte offsets, or opaque handles whose precise meaning varied with the 
potentially variable underlying storage.

Also, please remember that Emacs couldn't be implemented with giant Python 
strings anyway: crucially, all of this stuff is _mutable_ in Emacs.

> And of course an operation that sorts regions in a buffer using
> character pointers will have the same problem.  Working with memory
> pointers, OTOH, sucks more than that; GNU Emacs recently bit the
> bullet and got rid of their higher-level memory-oriented APIs, all of
> the Lisp structures now work with pointers, and only the very
> low-level structures know about character-to-memory pointer
> translation.
> 
> This performance issue is perceptible even on 3GHz machines with not
> so large (50MB) mbox files.  It's *horrid* if you do something like
> "occur" on a 1GB log file, then try randomly jumping to detected log
> entries.

Case in point: "occur" needs to scan the buffer anyway; you can't do better 
than linear time there.  So you're going to iterate through the buffer, using 
one of the techniques that James proposed, and remember some locations.  Why 
not just have those locations be opaque cursors into your data?

In summary: you're right, in that James missed a spot.  You need bidirectional, 
*copyable* iterators that can traverse the string by byte, codepoint, grapheme, 
or decomposed glyph.



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:

> You end up proliferating types that all do the same kind of thing.  Judicious 
> use of inheritance helps, but getting the fundamental abstraction right is 
> hard.  Or least, Emacs hasn't found it in 20 years of trying.

Emacs hasn't even figured out how to do general purpose iteration in 20 years 
of trying either.  The easiest way I've found to loop across an arbitrary pile 
of 'stuff' is the CL 'loop' macro, which you're not even supposed to use.  Even 
then, you still have to make the arcane and pointless distinction of using 
'across' or 'in' or 'on'.  Python, on the other hand, has iteration pretty well 
tied up nicely in a bow.

I don't know how to respond to the rest of your argument.  Nothing you've said 
has in any way indicated to me why having code-point offsets is a good idea, 
only that people who know C and elisp would rather sling around piles of 
integers than have good abstract types.

For example:

> I think it more likely that markers are very expense to create and use 
> compared to integers.

What?  When you do 'for x in str' in python, you are already creating an 
iterator object, which has to store the exact same amount of state that our 
proposed 'marker' or 'character pointer' would have to store.  The proposed 
UTF-8 marker would have to do a tiny bit more work when iterating because it 
would have to combine multibyte characters, but in exchange for that you get to 
skip a whole ton of copying when encoding and decoding.  How is this expensive 
to create and use?  For every application I have ever designed, encountered, or 
can even conjecture about, this would be cheaper.  (Assuming not just a UTF-8 
string type, but one for UTF-16 as well, where native data is in that format 
already.)

For what it's worth, not wanting to use abstract types in Emacs makes sense to 
me: I've written my share of elisp code, and it is hard to create reasonable 
abstractions in Emacs, because the facilities for defining types and creating 
polymorphic logic are so crude.  It's a lot easier to just assume your 
underlying storage is an array, because at the end of the day you're going to 
need to call some functions on it which care whether it's an array or an alist 
or a list or a vector anyway, so you might as well just say so up front.  But 
in Python we could just call 'mystring.by_character()' or 
'mystring.by_codepoint()' and get an iterator object back and forget about all 
that junk.
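
As an illustration of the 'by_codepoint()' idea (the method name above is
hypothetical), an iterator over UTF-8 bytes can yield each character together
with its byte offset, without decoding the whole buffer up front.  This sketch
assumes well-formed UTF-8 input:

```python
def by_codepoint(data: bytes):
    """Yield (byte_offset, character) pairs from well-formed UTF-8 bytes.

    The lead byte of each character encodes its width, so no full
    decode-to-UTF-32 copy is ever made.
    """
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:        # 0xxxxxxx: ASCII
            width = 1
        elif b < 0xE0:      # 110xxxxx: 2-byte sequence
            width = 2
        elif b < 0xF0:      # 1110xxxx: 3-byte sequence
            width = 3
        else:               # 11110xxx: 4-byte sequence
            width = 4
        yield i, data[i:i + width].decode("utf-8")
        i += width

print(list(by_codepoint("naïve ☃".encode("utf-8"))))
```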



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote:

> Greg Ewing writes:
>> On 24/11/10 22:03, Stephen J. Turnbull wrote:
>>> But
>>> if you actually need to remember positions, or regions, to jump to
>>> later or to communicate to other code that manipulates them, doing
>>> this stuff the straightforward way (just copying the whole iterator
>>> object to hang on to its state) becomes expensive.
>> 
>> If the internal representation of a text pointer (I won't call it
>> an iterator because that means something else in Python) is a byte
>> offset or something similar, it shouldn't take up any more space
>> than a Python int, which is what you'd be using anyway if you
>> represented text positions by grapheme indexes or whatever.
> 
> That's not necessarily true.  Eg, in Emacs ("there you go again"),
> Lisp integers are not only immediate (saving one pointer), but the
> type is encoded in the lower bits, so that there is no need for a type
> pointer -- the representation is smaller than the opaque marker type.
> Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of
> 24 bytes on a 64-bit platform.

Yes, yes, lisp is very clever.  Maybe some other runtime, like PyPy, could make 
this optimization.  But I don't think that anyone is filling up main memory 
with gigantic piles of character indexes and need to squeeze out that extra 
couple of bytes of memory on such a tiny object.  Plus, this would allow such a 
user to stop copying the character data itself just to decode it, and on 
mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the 
bat.

> In Python it's true that markers can use the same data structure as
> integers and simply provide different methods, and it's arguable that
> Python's design is better.  But if you use bytes internally, then you
> have problems.

No, you just have design questions.

> Do you expose that byte value to the user?

Yes, but only if they ask for it.  It's useful for computing things like quota 
and the like.

> Can users (programmers using the language and end users) specify positions in 
> terms of byte values?

Sure, why not?

> If so, what do you do if the user specifies a byte value that points into a 
> multibyte character?

Go to the beginning of the multibyte character.  Report that position; if the 
user then asks the requested marker object for its position, it will report 
that byte offset, not the originally-requested one.  (Obviously, do the same 
thing for surrogate pair code points.)
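A sketch of that snapping rule over a UTF-8 buffer (the helper name is mine, not a proposed API): every UTF-8 continuation byte matches 0b10xxxxxx, so walking backward past continuation bytes always lands on the character's leading byte.

```python
def snap_to_char_start(data: bytes, pos: int) -> int:
    """Snap a byte offset back to the start of the UTF-8 character
    containing it.  Assumes 0 <= pos < len(data).

    Continuation bytes are 0b10xxxxxx (0x80-0xBF); anything else
    begins a character.
    """
    while pos > 0 and data[pos] & 0xC0 == 0x80:
        pos -= 1
    return pos

buf = "naïve".encode("utf-8")       # b'na\xc3\xafve'; 'ï' occupies bytes 2-3
print(snap_to_char_start(buf, 3))   # → 2 (middle of 'ï' snaps to its start)
print(snap_to_char_start(buf, 4))   # → 4 ('v' already starts a character)
```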

> What if the user wants to specify position by number of characters?

Part of the point that we are trying to make here is that nobody really cares 
about that use-case.  In order to know anything useful about a position in a 
text, you have to have traversed to that location in the text. You can remember 
interesting things like the offsets of starts of lines, or the x/y positions of 
characters.

> Can you translate efficiently?

No, because there's no point :).  But you _could_ implement an overlay that 
cached things like the beginning of lines, or the x/y positions of interesting 
characters.
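Such an overlay could be as simple as this sketch (name and API illustrative, not from any real library): one linear scan caches the byte offset of each line start, after which "go to line N" is an O(1) lookup.

```python
class LineIndex:
    """Cache the byte offset of every line start in a UTF-8 buffer."""

    def __init__(self, data: bytes):
        self.starts = [0]
        for i, byte in enumerate(data):
            if byte == 0x0A:            # b'\n'
                self.starts.append(i + 1)

    def line_start(self, lineno: int) -> int:
        """Byte offset where line `lineno` (0-based) begins."""
        return self.starts[lineno]

idx = LineIndex(b"first\nsecond\nthird\n")
print(idx.line_start(1))  # → 6 (byte offset of "second")
```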

> As I say elsewhere, it's possible that there really never is a need to 
> efficiently specify an absolute position in a large text as a character 
> (grapheme, whatever) count.

> But I think it would be hard to implement an efficient text-processing 
> *language*, eg, a Python module
> for *full conformance* in handling Unicode, on top of UTF-8.

Still: why?  I guess if I have some free time I'll try my hand at it, and maybe 
I'll run into a wall and realize you're right :).

> Any time you have an algorithm that requires efficient access to arbitrary 
> text positions, you'll spend all your skull sweat fighting the 
> representation.  At least, that's been my experience with Emacsen.

What sort of algorithm would that be, though?  The main thing that I could 
think of is a text editor trying to efficiently allow the user to scroll to the 
middle of a large file without reading the whole thing into memory.  But, in 
that case, you could use byte positions to estimate, and display a heuristic 
number while calculating the real line numbers.  (This is what 'less' does, and 
it seems to work well.)
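That strategy can be sketched in a few lines (a hypothetical helper in the spirit of what 'less' does, not its actual implementation): land roughly at the requested fraction of the buffer, then snap forward to the next line start so display begins on a line boundary.

```python
def seek_to_fraction(data: bytes, fraction: float) -> int:
    """Return the byte offset of the first line start at or after
    roughly `fraction` of the way through the buffer."""
    pos = int(len(data) * fraction)
    if pos == 0:
        return 0
    nl = data.find(b"\n", pos)
    return len(data) if nl == -1 else nl + 1

text = b"line one\nline two\nline three\n"
print(seek_to_fraction(text, 0.5))  # → 18, the start of "line three"
```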

>> So I don't really see what you're arguing for here. How do
>> *you* think positions in unicode strings should be represented?
> 
> I think what users should see is character positions, and they should
> be able to specify them numerically as well as via an opaque marker
> object.  I don't care whether that position is represented as bytes or
> characters internally, except that the experience of Emacsen is that
> representation as byte positions is both inefficient and fragile.  The
> representation as character positions is more robust but slightly more
> inefficient.

Is it really the representation as byte positions which is fragile (i.e. the 
internal implementation detail), or the exposure of that position to calling 
code, and the idio

Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2011-01-03 Thread Glyph Lefkowitz

On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote:

> On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor  wrote:
>> No, it's singularly impossible to prove that any global load will be any
>> given value at compile time.  Any optimization based on this premise is wrong.
> 
> True.
> 
> My proposed way out of this conundrum has been to change the language
> semantics slightly so that global names which (a) coincide with a
> builtin, and (b) have no explicit assignment to them in the current
> module, would be fair game for such optimizations, with the
> understanding that the presence of e.g. "len = len" anywhere in the
> module (even in dead code!) would be sufficient to disable the
> optimization.
> 
> But barring someone interested in implementing something based on this
> rule, the proposal has languished for many years.

Wouldn't this optimization break things like mocking out 'open' for testing via 
'module.open = fakeopen'?  I confess I haven't ever wanted to change 'len' but 
that one seems pretty useful.

If CPython wants such optimizations, it should do what PyPy and its ilk do, 
which is to notice the assignment, but recompile code in that module to disable 
the fast path at runtime, preserving the existing semantics.
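A self-contained sketch of the mocking pattern in question (module and helper names are invented for illustration): a test shadows the builtin open() with a module-level global, which a naive "builtins are constant" optimization would silently bypass.

```python
import types

# A minimal stand-in for a module whose code calls the global open().
mod = types.ModuleType("somemodule")
src = """
def read_config(path):
    with open(path) as f:
        return f.read()
"""
exec(src, mod.__dict__)

class FakeFile:
    """Just enough of a file object for the code under test."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def read(self):
        return "mocked!"

# Shadowing the builtin at module scope is exactly the pattern the
# proposed optimization would break for names like open() or len().
mod.open = lambda path: FakeFile()
print(mod.read_config("/no/such/file"))  # → mocked!
```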

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Checking input range in time.asctime and time.ctime

2011-01-05 Thread Glyph Lefkowitz

On Jan 5, 2011, at 4:33 PM, Guido van Rossum wrote:

> Shouldn't the logic be to take the current year into account? By the
> time 2070 comes around, I'd expect "70" to refer to 2070, not to 1970.
> In fact, I'd expect it to refer to 2070 long before 2070 comes around.
> 
> All of which makes me think that this is better left to the app, which
> can decide for itself whether it is more important to represent dates
> in the future or dates in the past.

The point of this somewhat silly flag (as I understood its description earlier 
in the thread) is to provide compatibility with POSIX 2-year dates.  As per 
http://pubs.opengroup.org/onlinepubs/007908799/xsh/strptime.html - 

%y
    is the year within century. When a century is not otherwise specified,
    values in the range 69-99 refer to years in the twentieth century (1969
    to 1999 inclusive); values in the range 00-68 refer to years in the
    twenty-first century (2000 to 2068 inclusive). Leading zeros are
    permitted but not required.

So, "70" means "1970", forever, in programs that care about this nonsense.

Personally, by the time 2070 comes around, I hope that "70" will just refer to 
70 A.D., and get you odd looks if you use it in a written date - you might as 
well just write '0' :).



