Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Brendan Cully
On Friday, 10 April 2009 at 15:05, P.J. Eby wrote: > At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: >> This problem (slow application startup times due to too many imports at >> startup, which can in turn be due to top level imports for library >> or framework functionality that a given appli

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread P.J. Eby
At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: This problem (slow application startup times due to too many imports at startup, which can in turn be due to top level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I s

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Toshio Kuratomi
Robert Collins wrote: > Certainly, import time is part of it: > robe...@lifeless-64:~$ python -m timeit -s 'import sys; import > bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors" > 10 loops, best of 3: 18.7 msec per loop > > (errors.py is 3027 lines long with 347 exception
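The command-line measurement quoted above can be approximated from inside Python. The following is only a sketch, not the script used in the thread: it relies on the standard `timeit` module and substitutes the small stdlib module `colorsys` for `bzrlib.errors`, an assumption made so that it runs without Bazaar installed. The helper name `reimport_seconds` is hypothetical.

```python
import sys
import timeit

# Hypothetical stand-in for bzrlib.errors; any importable module name works.
MODULE = "colorsys"


def reimport_seconds(module=MODULE, number=10, repeat=3):
    """Return the best per-import time, in seconds, for re-importing `module`."""
    setup = f"import sys; import {module}"
    # Dropping the sys.modules entry forces the import machinery to run again.
    stmt = f"del sys.modules[{module!r}]; import {module}"
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number


if __name__ == "__main__":
    print(f"{MODULE}: {reimport_seconds() * 1000:.3f} msec per import")
```

Note that the re-import still benefits from any cached bytecode file, so this measures executing the module body rather than compiling it.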

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Peter Otten
John Arbash Meinel wrote: > Not as big of a difference as I thought it would be... But I bet if > there was a way to put the random shuffle in the inner loop, so you > weren't accessing the same identical 25k keys internally, you might get > more interesting results. You can prepare a few random

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Fri, 2009-04-10 at 11:52, Antoine Pitrou wrote: > Robert Collins writes: > > > > (errors.py is 3027 lines long with 347 exception classes). > > 347 exception classes? Perhaps your framework is over-engineered. > > Similarly, when using a heavy Web framework, reloading a

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Antoine Pitrou
Robert Collins writes: > > (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame Python for that.

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote: > Just to add some skepticism, has anyone done any kind of > instrumentation of bzr start-up behavior? We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individually). We have a la

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Nick Coghlan
Guido van Rossum wrote: > Just to add some skepticism, has anyone done any kind of > instrumentation of bzr start-up behavior? IIRC every time I was asked > to reduce the start-up cost of some Python app, the cause was too many > imports, and the solution was either to speed up import itself (.pyc

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... >> Somewhat true, though I know it happens 25k times during startup of >> bzr... And I would be a *lot* happier if startup time was 100ms instead >> of 400ms. > > I don't want to quash your idealism too severely, but it is extremely > unlikely that you are going to get anywhere near that kind

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Guido van Rossum
On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter wrote: > On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel > wrote: > >And I would be a *lot* happier if startup time was 100ms instead > > of 400ms. > > Quite so. We have a number of internal tools, and they find that > frequently just starting up

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel wrote: > Greg Ewing wrote: >> John Arbash Meinel wrote: >>> And the way intern is currently >>> written, there is a third cost when the item doesn't exist yet, which is >>> another lookup to insert the object. >> >> That's even rarer still, since

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jeffrey Yasskin
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel wrote: > Greg Ewing wrote: >> John Arbash Meinel wrote: >>> And the way intern is currently >>> written, there is a third cost when the item doesn't exist yet, which is >>> another lookup to insert the object. >> >> That's even rarer still, since

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Mike Klaas
On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Greg Ewing wrote: > John Arbash Meinel wrote: >> And the way intern is currently >> written, there is a third cost when the item doesn't exist yet, which is >> another lookup to insert the object. > > That's even rarer still, since it only happens the first > time you load a piece of code that use

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Greg Ewing
John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Benjamin Peterson
2009/4/9 Greg Ewing: > John Arbash Meinel wrote: >> >> And when you look at the intern function, it doesn't use >> setdefault logic, it actually does a get() followed by a set(), which >> means the cost of interning is 1-2 lookups depending on likelihood, etc. > > Keep in mind that intern() is cal

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Greg Ewing
John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. Keep in mind that intern() is called fairly rarely, mostly only at mo
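The difference between the two patterns discussed here can be modelled at the Python level. This is a rough sketch and not the interpreter's own code, which is written in C: the names `CountingStr`, `intern_get_then_set`, `intern_setdefault` and `hashes_for` are all hypothetical, and counting hash computations of a `str` subclass is used only as a proxy for the number of table lookups.

```python
class CountingStr(str):
    """A str subclass that counts hash computations as a stand-in for lookups."""

    hashes = 0

    def __hash__(self):
        CountingStr.hashes += 1
        return super().__hash__()


def intern_get_then_set(table, s):
    """Two-step pattern: look up first, then insert separately on a miss."""
    found = table.get(s)
    if found is not None:
        return found
    table[s] = s
    return s


def intern_setdefault(table, s):
    """Single-step pattern: one call both looks up and inserts."""
    return table.setdefault(s, s)


def hashes_for(func, table, s):
    """Return how many times `s` was hashed while running func(table, s)."""
    CountingStr.hashes = 0
    func(table, s)
    return CountingStr.hashes
```

Calling `hashes_for` with an empty table exercises the miss path, which is where the extra lookup in the two-step pattern occurs; on a hit both patterns need a single lookup.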

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jake McGuire
On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote: Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring 1536ki
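The test script referred to here is not reproduced in this listing. As a loose illustration of the kind of measurement described, the sketch below records the entry counts at which a dict's reported size changes. It is an assumption-laden stand-in: `dict_resize_points` is a hypothetical helper, `sys.getsizeof` reports only the dict's own storage, and the thresholds and sizes on a current interpreter will not match the 2009 figures quoted above.

```python
import sys


def dict_resize_points(n=25000):
    """Return (entry_count, size_in_bytes) pairs, one per observed dict resize."""
    table = {}
    points = []
    last = sys.getsizeof(table)
    for i in range(n):
        key = "name_%d" % i
        # Key and value are the same object, as in the interned dict.
        table[key] = key
        size = sys.getsizeof(table)
        if size != last:
            points.append((len(table), size))
            last = size
    return points


if __name__ == "__main__":
    for count, size in dict_resize_points():
        print("%6d entries -> %8.1f KiB" % (count, size / 1024.0))
```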

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
> Also, consider that resizing has to evaluate every object, thus paging > in all X bytes, and assigning to another 2X bytes. Cutting X by > (potentially 3), would probably have a small but measurable effect. I'm *very* skeptical about claims on performance in the absence of actual measurements. T

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Martin v. Löwis wrote: >> I don't have numbers on how much that would improve CPU times, I would >> imagine improving 'intern()' would impact import times more than run >> times, simply because import time is interning a *lot* of strings. >> >> Though honestly, Bazaar would really like this, becaus

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
> I don't have numbers on how much that would improve CPU times, I would > imagine improving 'intern()' would impact import times more than run > times, simply because import time is interning a *lot* of strings. > > Though honestly, Bazaar would really like this, because startup overhead > for us

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... > I like your rationale (save memory) much more, and was asking in the > tracker for specific numbers, which weren't forthcoming. > ... > Now that you brought up specific numbers, I tried to verify them, > and found them correct (although a bit unfortunate), please see my > test script b

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
> So I guess some of it comes down to whether "loweis" would also reject > this change on the basis that mathematically a "set is not a dict". I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be acce

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Alexander Belopolsky wrote: > On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel > wrote: > ... >> a) Don't keep a double reference to both key and value to the same >> object (1 pointer per entry), this could be as simple as using a >> Set() instead of a dict() >> > > There is a reject
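Point (a) in the quoted proposal is about per-entry overhead when a table maps each string to itself. One way to get a feel for it is to compare the reported sizes of a dict and a set holding the same strings. This is only a sketch: `container_sizes` is a hypothetical helper, the sizes come from `sys.getsizeof` and exclude the strings themselves, and which container comes out smaller depends on the interpreter version, so no particular result is claimed.

```python
import sys


def container_sizes(n=25000):
    """Return (dict_bytes, set_bytes) for n distinct strings stored each way."""
    names = ["name_%d" % i for i in range(n)]
    as_dict = {s: s for s in names}  # key and value reference the same object
    as_set = set(names)              # a single reference per entry
    return sys.getsizeof(as_dict), sys.getsizeof(as_set)


if __name__ == "__main__":
    dict_bytes, set_bytes = container_sizes()
    print("dict: %d bytes, set: %d bytes" % (dict_bytes, set_bytes))
```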

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Christian Heimes wrote: > John Arbash Meinel wrote: >> When I looked at the actual references from interned, I saw mostly >> variable names. Considering that every variable goes through the python >> intern dict. And when you look at the intern function, it doesn't use >> setdefault logic, it actua

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Christian Heimes
John Arbash Meinel wrote: > When I looked at the actual references from interned, I saw mostly > variable names. Considering that every variable goes through the python > intern dict. And when you look at the intern function, it doesn't use > setdefault logic, it actually does a get() followed by a

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel wrote: > ... > >>> Anyway, I think the internals of intern() could be done a bit better. Here are >>> some concrete things: >>> >> >> [snip] >> >> Memory usage is definitely something we're interested in improving. >> Since you've already looked at this

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... >> Anyway, I think the internals of intern() could be done a bit better. Here are >> some concrete things: >> > > [snip] > > Memory usage is definitely something we're interested in improving. > Since you've already looked at this in some detail, could you try > implementing one or two of your

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
Hi John, On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel wrote: > I've been doing some memory profiling of my application, and I've found > some interesting results with how intern() works. I was pretty surprised > to see that the "interned"

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Dirkjan Ochtman
On Thu, Apr 9, 2009 at 17:31, Aahz wrote: > Please do subscribe to python-dev ASAP; I also suggest that you subscribe > to python-ideas, because I suspect that this is sufficiently blue-sky to > start there. It might also be interesting to the unladen-swallow guys. Cheers, Dirkjan

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Aahz
On Thu, Apr 09, 2009, John Arbash Meinel wrote: > > PS> I'm not yet subscribed to python-dev, so if you could make sure to > CC me in replies, I would appreciate it. Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficient

[Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
I've been doing some memory profiling of my application, and I've found some interesting results with how intern() works. I was pretty surprised to see that the "interned" dict was actually consuming a significant amount of total memory. To give the sp