[Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%

2005-02-16 Thread Gfeller Martin
Dear all,

I'm running a large Zope application on a Windows XP Professional 
machine with a single 1 GHz CPU and 1 GB of memory, using Zope 2.7.3 
and Python 2.3.4. The application typically builds large lists by 
appending to and extending them. 

We regularly observed that using a given piece of functionality a 
second time in the same process was much slower (by about 50%) than 
when it ran for the first time after startup. 
This behavior improved greatly with Python 2.3 (thanks 
to the improved Python object allocator, I presume). 

Nevertheless, I tried converting the heap used by Python 
to a Windows Low Fragmentation Heap (available on XP 
and Server 2003). This improved the overall run time 
of a typical CPU-intensive report by about 15% 
(overall run time is in the 5-minute range), with the
same memory consumption.

I consider 15% significant enough to let you know about it.

For information about the Low Fragmentation Heap, see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/low_fragmentation_heap.asp

Best regards,
Martin 

PS: Since I don't speak C, I used ctypes to convert all 
heaps in the process to LFH (I don't know how to determine
which one is the C heap).
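
In case it's useful to anyone, here is a minimal sketch of the 
approach (reconstructed from the Win32 documentation rather than my 
exact script; error handling omitted):

    import ctypes

    kernel32 = ctypes.windll.kernel32

    HeapCompatibilityInformation = 0   # information class
    LFH = ctypes.c_ulong(2)            # 2 = low-fragmentation heap

    # Ask how many heaps the process has, then fetch their handles.
    n = kernel32.GetProcessHeaps(0, None)
    heaps = (ctypes.c_void_p * n)()
    kernel32.GetProcessHeaps(n, heaps)

    for h in heaps:
        # HeapSetInformation returns nonzero on success; it fails
        # (harmlessly) for heaps that cannot be converted, e.g.
        # when the process runs under a debugger.
        kernel32.HeapSetInformation(h, HeapCompatibilityInformation,
                                    ctypes.byref(LFH),
                                    ctypes.sizeof(LFH))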





COMIT AG
Risk Management Systems
Pflanzschulstrasse 7 
CH-8004 Zürich 

Telephone +41 (44) 1 298 92 84 

http://www.comit.ch 
http://www.quantax.com - Quantax Trading and Risk System



RE: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%

2005-02-17 Thread Gfeller Martin
Hi,

What immediately comes to mind are Modules/cPickle.c and 
Modules/cStringIO.c, which (I believe) are heavily used by ZODB 
(which in turn is heavily used by the application). 

The lists also get fairly large, although not huge - up to typically 
50,000 (complex) objects in the tests I've measured. As I said, I 
don't speak C, so I can only speculate: do the lists at some point 
grow beyond the upper limit of obmalloc, so that they are handled by 
the LFH instead (which has a higher upper limit, if I understood Tim 
Peters correctly)?

Best regards,
Martin





-----Original Message-----
From: Evan Jones [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 17 Feb 2005 02:26
To: Python Dev
Cc: Gfeller Martin; Martin v. Löwis
Subject: Re: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%


On Feb 16, 2005, at 18:42, Martin v. Löwis wrote:
> I must admit that I'm surprised. I would have expected
> that most allocations in Python go through obmalloc, so
> the heap would only see "large" allocations.
>
> It would be interesting to find out, in your application,
> why it is still an improvement to use the low-fragmentation
> heaps.

Hmm... This is an excellent point. A grep through the Python source 
code shows that the following files call the native system malloc (I've 
excluded a few obviously platform-specific files). A quick visual 
inspection shows that most of these are using it to allocate some sort 
of array or string, so it likely *should* go through the system malloc. 
Gfeller, any idea if you are using any of the modules on this list? If 
so, it would be pretty easy to try converting them to call the obmalloc 
functions instead, and see how that affects the performance.

Evan Jones


Demo/pysvr/pysvr.c
Modules/_bsddb.c
Modules/_curses_panel.c
Modules/_cursesmodule.c
Modules/_hotshot.c
Modules/_sre.c
Modules/audioop.c
Modules/bsddbmodule.c
Modules/cPickle.c
Modules/cStringIO.c
Modules/getaddrinfo.c
Modules/main.c
Modules/pyexpat.c
Modules/readline.c
Modules/regexpr.c
Modules/rgbimgmodule.c
Modules/svmodule.c
Modules/timemodule.c
Modules/zlibmodule.c
PC/getpathp.c
Python/strdup.c
Python/thread.c



RE: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%

2005-02-23 Thread Gfeller Martin
> A well-known trick is applicable in that case, if Martin thinks it's
> worth the bother:
> grow the list to its final size once, at the start (overestimating if
> you don't know for sure).  Then instead of appending, keep an index to
> the next free slot, same as you'd do in C.  Then the list guts never
> move, so if that doesn't yield the same kind of speedup without using
> LFH, list copying wasn't actually the culprit to begin with.

I actually did that in Py 2.1, and removed it when we switched to 
Zope 2.7/Py 2.3, because it was no longer worth it, presumably due to 
obmalloc becoming enabled. Unfortunately, I have lost the speedup 
gained by my Fast-Append-List in Py 2.1, but I recall it saved about 
5% in a test similar to the one I have today. 

> Yes.  For example, a 300-character string could do it (that's not
> small to obmalloc, but is to LFH).  Strings produced by pickling are
> very often that large, and especially in Zope (which uses pickles
> extensively under the covers -- reading and writing persistent objects
> in Zope all involve pickle strings).

With my amateur (in C-stuff) knowledge, this sounds like a very 
likely point. From Evan's 17 Feb mail, it sounds as though cPickle 
does not use obmalloc - if this is the case, wouldn't that be an 
obvious (and relatively easy) speedup?

> If someone were motivated enough, it would probably be easiest to 
> run Martin's app on a Linux box, and use the free Linux tools to analyze it.

As is often the case, the real-life figures come not from a 
benchmark but from an application test with real data, so putting 
the whole thing up on Linux would take quite some time - which I 
don't have :-(

Best regards, Martin




-----Original Message-----
From: Tim Peters [mailto:[EMAIL PROTECTED] 
Sent: Friday, 18 Feb 2005 23:52
To: Evan Jones
Cc: Gfeller Martin; Martin v. Löwis; Python Dev
Subject: Re: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%


[Tim Peters]
...
>> Then you allocate a small object, marked 's':
>>
>> bbbsfff

[Evan Jones]
> Isn't the whole point of obmalloc

No, because it doesn't matter what follows that introduction: 
obmalloc has several points, including exploiting the GIL, heuristics
aiming at reusing memory while it's still high in the memory
hierarchy, almost never touching a piece of memory until it's actually
needed, and so on.

> is that we don't want to allocate "s" on the heap, since it is small?

That's one of obmalloc's goals, yes.  But "small" is a relative
adjective, not absolute.  Because we're primarily talking about LFH
here, the natural meaning for "small" in _this_ thread is < 16KB,
which is much larger than "small" means to obmalloc.  The memory-map
example applies just as well to LFH as to obmalloc, by changing which
meaning for "small" you have in mind.

> I guess "s" could be an object that might potentially grow.

For example, list guts in Python are never handled by obmalloc,
although the small fixed-size list _header_ object is always handled
by obmalloc.

>> One thing to take from that is that LFH can't be helping list-growing
>> in a direct way either, if LFH (as seems likely) also needs to copy
>> objects that grow in order to keep its internal memory segregated by
>> size.  The indirect benefit is still available, though:  LFH may be
>> helping simply by keeping smaller objects out of the general heap's
>> hair.

> So then wouldn't this mean that there would have to be some sort of
> small object being allocated via the system malloc that is causing the
> poor behaviour?

Yes.  For example, a 300-character string could do it (that's not
small to obmalloc, but is to LFH).  Strings produced by pickling are
very often that large, and especially in Zope (which uses pickles
extensively under the covers -- reading and writing persistent objects
in Zope all involve pickle strings).

> As you mention, I wouldn't think it would be list objects, since resizing
> lists using LFH should be *worse*.

Until they get to LFH's boundary for "small", and we have only the
vaguest idea what Martin's app does here -- we know it grows lists
containing 50K elements in the end, and ... well, that's all I really
know about it.

A well-known trick is applicable in that case, if Martin thinks it's
worth the bother:
grow the list to its final size once, at the start (overestimating if
you don't know for sure).  Then instead of appending, keep an index to
the next free slot, same as you'd do in C.  Then the list guts never
move, so if that doesn't yield the same kind of speedup without using
LFH, list copying wasn't actually the culprit to begin with.
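
A minimal sketch of that trick (the function and names are 
illustrative, not from the original application):

    # Grow the list to its final size once, then fill slots by
    # index, so the list guts are allocated once and never move.
    def build_preallocated(n_items, make_item):
        result = [None] * n_items   # one allocation up front
        next_free = 0               # index of the next free slot
        for i in range(n_items):
            result[next_free] = make_item(i)
            next_free += 1
        return result

    # The plain appending version, for comparison; growing the
    # list may repeatedly reallocate and copy the list guts.
    def build_appending(n_items, make_item):
        result = []
        for i in range(n_items):
            result.append(make_item(i))
        return result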