[Python-Dev] Making the GIL faster & lighter on Windows
Hi everyone, I'm new to the list, but I've been embedding Python and working very closely with the core sources for many years now. I discovered Python a long time ago when I needed to embed a scripting language and found the PHP sources... unreadable ;)

Anyway, I'd like to ask something that may have been asked already, so I apologize if this has been covered. Instead of removing the GIL, has anyone thought of making it more lightweight? The current situation on Windows is that the single-thread case is decently fast (via interlocked operations), but it falls back to an event object under contention (see thread_nt.h). Now, I don't have any specific evidence aside from my experience with Windows multithreaded programming, but event objects are often considered the slowest synchronization mechanism available.

So, what are the alternatives? Mutexes or critical sections. Semaphores too, if you want to get fancy, but I digress. Because mutexes offer inter-process locking, which we don't need, critical sections fit the bill as a lightweight locking mechanism. They work much like the Python GIL is handled now: first attempt an interlocked operation, and if another thread owns the lock, wait on a kernel object. They are known to be extremely fast.

There are some catches with using a critical section instead of the current method:

1. It is recursive, while the current GIL setup is not. Would it break Python to support (or deal with) recursive behavior at the GIL level? Note that we can still disallow recursion and fail, because we know whether the current thread is the lock owner, but the return value of the lock function is usually only checked when the wait parameter is zero (meaning "don't block, just try to acquire").
The biggest problem I see here is how mixing the PyGILState_* API with multiple interpreters will behave: when PyGILState_Ensure() is called while the GIL is held for a thread state under an interpreter other than the main interpreter, it tries to re-lock the GIL. This would normally cause a deadlock, but the best we could do with a critical section is have the call fail and/or increment a recursion counter. If maintaining the current behavior is absolutely necessary, I guess it would be pretty easy to force a deadlock. Personally, I would prefer a Py_FatalError or something like it.

2. Backwards incompatibility: TryEnterCriticalSection isn't available pre-NT4, so Windows 95 support is broken. Microsoft no longer supports Windows 95, or even mentions it in the list of supported OSes for its API functions, so... non-issue? Some of the data structure is available to us, so I bet it would be easy to implement the function manually.

3. ?? - I'm sure there are other issues that deserve a look.

I've given this a shot already while doing some concurrency testing with my ISAPI extension (PyISAPIe). First of all, nothing looks broken yet. I'm using my modified python26.dll to run all of my Python code and trying to find anywhere it could possibly break. For multiple concurrent requests against a single multithreaded ISAPI handler process, I see a statistically significant speed increase depending on how much Python code is executed. With more Python code executed (e.g. a Django page), the speedup was about 2x. I haven't tested with varied values of _Py_CheckInterval aside from finding a sweet spot for my specific purposes, but using 100 (the default) would likely make the performance difference more noticeable. A spin mutex also does well, but the results vary a lot more.

Just as a disclaimer, my tests were nowhere near scientific, but if anyone needs convincing I can come up with some actual measurements. I think at this point most of you are wondering more about what it would break.
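For illustration, the interlocked-then-wait pattern described above can be sketched in Python. `TwoPhaseLock` and its `contentions` counter are my own names for this sketch, not anything from thread_nt.h, and the real fast path is an interlocked CPU instruction rather than a non-blocking acquire:

```python
import threading

class TwoPhaseLock:
    """Sketch of the fast-path/slow-path pattern: try a cheap
    non-blocking acquire first, and only fall back to a blocking
    (kernel-level) wait under contention."""

    def __init__(self):
        self._lock = threading.Lock()
        self.contentions = 0  # how often the slow path was taken

    def acquire(self):
        if self._lock.acquire(False):   # fast path: lock was free
            return
        self.contentions += 1           # slow path: block until free
        self._lock.acquire()

    def release(self):
        self._lock.release()

lock = TwoPhaseLock()
lock.acquire()
lock.release()
print(lock.contentions)  # 0: the fast path succeeded
```

Both a critical section and the current event-based GIL follow this shape; the difference under discussion is only in how expensive the slow path is.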
Hopefully I haven't wasted anyone's time - I just wanted to share what I see as a possibly substantial improvement to Python's core. Let me know if you're interested in a patch to use for your own testing.

Cheers,
Phillip

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making the GIL faster & lighter on Windows
> You should definitely open a bug entry in http://bugs.python.org. There, post
> your patch, some explanations and preferably a quick way (e.g. a simple script)
> of reproducing the speedups (without having to install a third-party library or
> extension, that is).

I'll get started on that. I'm assuming I should generate a patch from the trunk (2.7)? The file doesn't look different, but I want to make sure I get it from the right place.

> I wonder if the patch could be structured as a conditional compilation? You
> know how many different spots are touched, and how many lines per spot.
>
> If it could be, then theoretically it could be released and people could do
> lots of comparative stress testing with different workloads.

That would be easy to do, because I am just replacing the *NonRecursiveMutex functions.

> What about fairness? I don't know off-hand whether the GIL is
> fair, or whether critical sections are fair, but it needs to be
> considered.

If you define fairness in this context as not starving other threads while consuming resources, that is built into the interpreter via sys.setcheckinterval() and also anywhere the GIL is released for I/O. What might be interesting is to see whether releasing a critical section and immediately re-acquiring it every _Py_CheckInterval bytecode operations behaves in a similar manner (see ceval.c, line 869). My best guess right now is that it will behave as expected when not using the spin-based critical section.

AFAIK, the kernel processes waiters in a FIFO manner without regard to priority. Because a guarantee of mutual exclusion is absolutely necessary, it's up to applications to provide fairness. Python does a decent job of this.

- Phillip
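For illustration, here is a rough Python sketch of that periodic release/re-acquire (the names `CHECK_INTERVAL` and `worker` are mine; the real thing happens in the ceval.c bytecode loop, not in Python code). Whether the waiting thread actually gets in during a "flash" is exactly the fairness question: the lock itself makes no ordering promise.

```python
import threading

# Rough analogue of what ceval.c does every _Py_CheckInterval bytecodes:
# the running thread drops the lock and immediately re-takes it, giving
# any waiter a chance to run in between.
CHECK_INTERVAL = 100
gil = threading.Lock()
progress = []

def worker(name, steps):
    gil.acquire()
    for i in range(steps):
        progress.append(name)
        if i % CHECK_INTERVAL == CHECK_INTERVAL - 1:
            gil.release()     # "flash" the lock ...
            gil.acquire()     # ... and immediately re-acquire it

    gil.release()

t1 = threading.Thread(target=worker, args=("A", 500))
t2 = threading.Thread(target=worker, args=("B", 500))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(progress))  # 1000: both threads complete all steps
```

Nothing here guarantees that A and B interleave at the flash points; whether they do depends on the primitive and the scheduler, which is why the question matters.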
Re: [Python-Dev] Making the GIL faster & lighter on Windows
> FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't
> guarantee a defined order of wakeup among threads of equal priority.

Indeed, I should have quoted the MSDN docs: "The threads of a single process can use a critical section object for mutual-exclusion synchronization. There is no guarantee about the order in which threads will obtain ownership of the critical section, however, the system will be fair to all threads." http://msdn.microsoft.com/en-us/library/ms683472(VS.85).aspx

I read somewhere else that FIFO ordering is present, but obviously we shouldn't expect that if it's not documented as such.

> According to a past discussion on this list, the current implementation isn't:
> http://mail.python.org/pipermail/python-dev/2008-March/077814.html
> (at least on the poster's system)

I believe he's only talking about Linux. Apples & oranges when it comes to stuff like this, although it still justifies looking into what happens every _Py_CheckInterval on Windows.

- Phillip
Re: [Python-Dev] Making the GIL faster & lighter on Windows
> No: fairness in mutex synchronization means that every waiter for the
> mutex will eventually acquire it; it won't happen that one thread
> starves waiting for the mutex. This is something that the mutex needs to
> provide, not the application.

Right, I guess I was thinking of it in terms of needing to release the mutex at some point in order for it to be later acquired.

> Please trust Antoine that it's relevant: if the current implementation
> isn't fair on Linux, there is no need for the new implementation to be
> fair on Windows.

Fair enough.

--

While setting up my patch, I'm noticing something potentially bad for this idea that I overlooked until just now. I'm going to hold off on submitting a ticket unless others suggest it's a better idea to keep this discussion going there.

The thread module's lock object uses the same code used to lock and unlock the GIL. Replacing the current locking mechanism with a critical section would break the expected functionality of the lock object in two specific cases:

1. Blocking recursion: critical sections don't block on recursive acquisition, and there is no way to enforce that.
2. Releasing: currently any thread can release a lock, but only the owning thread can release a critical section.

Of course, blocking recursion is only meaningful with the current behavior of #2; otherwise it's an unrecoverable deadlock.

There are a few solutions to this. The first would be to implement only the GIL as a critical section. The problem then is the need to change all of the core code that does not use PyEval_Acquire/ReleaseLock (there is some, right?), which is the best place to use something other than the thread module's locking mechanism on the GIL. This is doable with some effort, but clearly not an option if there is any possibility that extensions are using something other than the PyThreadState_*, PyGILState_* and PyEval_* APIs to manipulate the GIL (are there others?).
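Case 2 is easy to demonstrate from Python itself. This sketch uses threading.Lock (which wraps the thread module's lock) to show a release performed by a thread that never acquired the lock, something an owner-checked primitive like a critical section would forbid:

```python
import threading

# Python's thread lock has no notion of an owner: any thread may
# release it. A critical section would reject this at ReleaseCS time.
lock = threading.Lock()
lock.acquire()            # main thread takes the lock

def release_from_elsewhere():
    lock.release()        # legal for Python locks: releaser != acquirer

t = threading.Thread(target=release_from_elsewhere)
t.start()
t.join()
print(lock.locked())  # False: the other thread released it
```

This ownerless behavior is exactly what lets one thread use a lock to signal another, which is why swapping in an owner-checked primitive underneath the thread module would be visible to existing code.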
After any of this, of course, I wonder what kind of crazy things might be expected of the GIL externally that require its behavior to remain as it is.

The second solution would be to use semaphores. I can't say yet whether it would be worth it performance-wise, so I will refrain from conjecture for the moment.

I like the first solution above... I don't know why non-recursion would be necessary for the GIL; clearly it would be a little more involved, but if I can demonstrate the performance gain, maybe it's worth my time.

- Phillip
Re: [Python-Dev] Making the GIL faster & lighter on Windows
Heads up to those who were following: I did my best to clearly outline the situation and direction in the tracker. http://bugs.python.org/issue6132

It includes a patch that will break the expected behavior of the thread lock object but make it possible to test GIL performance.

> Note that for the GIL, if you use a CriticalSection object, you should
> initialize its "spincount" to zero, because the GIL is almost always in
> contention. That is, if you don't get the GIL right away, you won't for a
> while.

If I'm not mistaken, calling InitializeCriticalSection rather than InitializeCriticalSectionAndSpinCount (gotta love those long function names) sets the spin count to zero. Performance-wise, I could tell when the spin count wasn't zero - spinning is too much of a gamble in most contention situations.

> I don't know what kernel primitive the Critical Section uses, but if it uses
> an Event object or something similar, we are in the same soup, so to say,
> because the CriticalSection's spinlocking feature buys us nothing.

Judging from the increase in speed and CPU utilization I've seen, I don't believe this is the case. My guess is that it's something similar to a futex.

- Phillip
Re: [Python-Dev] Making the GIL faster & lighter on Windows
The testing patch I submitted to the tracker includes a semaphore as well, and I did take some time to try it out. It seems that it's no better than the event object, either for a single thread or scaled to many threads... so this does appear to indicate that the WaitForXX functions are costly (which is expected) and scale terribly (which is unfortunate).

I had always believed event objects to be "slower", but I'm not seeing a difference here compared to semaphores. My guess is that these results could be very different if I were to test on, say, Windows 2000 instead of Vista.

- Phillip

2009/5/28 Kristján Valur Jónsson :
> You are right, a small experiment confirmed that it is set to 0 (see
> SetCriticalSectionSpinCount()).
> I had assumed that a small non-zero value might be chosen on multiprocessor
> machines.
>
> Do you think that the problem lies with the use of the "event" object as
> such? Have you tried using a "semaphore" or "mutex" instead? Or do you
> think that all of the synchronization primitives that rely on the
> WaitForMultipleObjects() api are subject to the same issue?
>
> Cheers,
> Kristján
>
> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames@python.org On Behalf Of Phillip Sitbon
> Sent: 27. maí 2009 22:23
> To: python-dev
> Subject: Re: [Python-Dev] Making the GIL faster & lighter on Windows
>
> If I'm not mistaken, calling InitializeCriticalSection rather than
> InitializeCriticalSectionAndSpinCount (gotta love those long function
> names) sets the spin count to zero. I could tell when the spin count
> wasn't zero as far as performance is concerned - spinning is too much
> of a gamble in most contention situations.
>
>> I don't know what kernel primitive the Critical Section uses, but if it
>> uses an Event object or something similar, we are in the same soup, so to
>> say, because the CriticalSection's spinlocking feature buys us nothing.
>
> Judging from the increase in speed and CPU utilization I've seen, I
> don't believe this is the case. My guess is that it's something
> similar to a futex.
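For anyone wanting a quick, portable feel for handoff cost between two threads, here is a crude ping-pong microbenchmark in Python. It is my own sketch, not the C-level test from the tracker, and at the Python level most of the time goes to the interpreter rather than the primitive itself, so it only shows relative differences:

```python
import threading
import time

def pingpong_semaphores(n=1000):
    """Hand control back and forth n times between two threads using a
    pair of semaphores; returns elapsed seconds for the timed side."""
    ping, pong = threading.Semaphore(0), threading.Semaphore(0)

    def partner():
        for _ in range(n):
            ping.acquire()   # wait for the main thread's signal
            pong.release()   # answer back

    t = threading.Thread(target=partner)
    t.start()
    start = time.perf_counter()
    for _ in range(n):
        ping.release()
        pong.acquire()
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed

# Two handoffs per round trip, so divide by 2*n for per-handoff cost.
print("%.1f us/handoff" % (pingpong_semaphores() / 2000 * 1e6))
```

Substituting a pair of threading.Event objects (set/wait/clear) for the semaphores gives a rough Event-vs-Semaphore comparison in the same spirit as the tests above.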
Re: [Python-Dev] FINAL PROPULSION OPEN SOURCE ENGINE VARIANT FOR THE F-35 JOINT STRIKE FIGHTER
>> I question the whole notion of using open source in military weapons.
>> It seems like a rather basic violation of operational security. Perhaps
>> your enemies will exploit your bugs instead of nicely reporting them
>> and submitting patches on SourceForge ;-)
>
> Eric Raymond would argue that it's probably the other way around --
> proprietary software doesn't have enough eyeballs to make it safe. :-)

I guess in some cases it wouldn't matter if it were open source: http://online.wsj.com/article/SB124027491029837401.html

I'm not sure of the seriousness (or mental stability) of Mr. "OMEGA RED", but I definitely got a chuckle from this :)

- Phillip
Re: [Python-Dev] pthread sem PyThread_acquire_lock
I'll do my best to try and explain/contribute, but please feel free to correct anything I get wrong.

I believe the "swallowing" he's referring to is the ignoring of errno EINTR. I don't think that's the correct place to handle signals to begin with; why not just use the signal module to deal with such a scenario? http://docs.python.org/dev/library/signal.html#module-signal

AFAIK, ignoring EINTR doesn't preclude the calling of signal handlers. There are many functions that don't return this value anymore, making them effectively restartable. I remember a number of years ago when returning EINTR (or not) wasn't part of any standard, so the only way to provide consistent behavior was to ignore it and loop. I'm not sure whether that is still the case. A great example is reading from a socket: whether or not it can be interrupted depends on the platform, so catching Ctrl+C often requires a timeout loop.

Also, remember that signals are asynchronous in the sense that they are handled outside the normal execution flow of a program. Checking for EINTR probably isn't the best way to determine whether a signal has been sent to the program.

Cheers,
Phillip

On Sat, Jun 27, 2009 at 3:58 PM, Florian Mayer wrote:
>
> Martin v. Löwis wrote:
>>
>> Can you please explain what you mean by "swallowing"?
>
> The signals don't get through when acquiring the lock.
>>
>> What is the sequence of actions triggering the case you are talking
>> about, what happens, and what do you expect to happen instead?
>
> Easiest way is to create a Queue.Queue and call .get() on it and try to
> Ctrl-C it.
>>
>> Also, how would you fix this (in principle, not what the specific
>> patch would look like)?
>
> Remove the loop that explicitly does this in a new function and use that
> one in threadmodule.c for lock.acquire.
>>
>> Regards,
>> Martin
>
> Best regards,
> Florian
>
> PS: Excuse me, I messed up things once again.
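The swallow-and-retry loop under discussion looks roughly like this in Python (my own sketch; note that since PEP 475, Python 3.5+ performs this retry automatically for most system calls, running any pending Python-level signal handlers before retrying, so the explicit loop is mostly of historical interest):

```python
import os

def read_retrying(fd, size):
    """Keep restarting a system call that fails with EINTR. The signal
    handler has already run by the time InterruptedError is raised, so
    looping here does not suppress it; it only hides the interruption
    from the caller."""
    while True:
        try:
            return os.read(fd, size)
        except InterruptedError:   # errno == EINTR
            continue

r, w = os.pipe()
os.write(w, b"hello")
print(read_retrying(r, 5))  # b'hello'
os.close(r)
os.close(w)
```

The complaint in this thread is precisely that when such a loop lives inside lock acquisition in C, a blocked `lock.acquire()` never returns to the eval loop where Python-level handlers (like the KeyboardInterrupt machinery) get a chance to run.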
Re: [Python-Dev] [Python-ideas] Remove GIL with CAS instructions?
I'd just like to point out some previous discussion about implementing the GIL as a critical section or semaphore on Windows, since it's come up here (although not quite the center of the OP's proposal, AFAICS):

http://bugs.python.org/issue6132
http://mail.python.org/pipermail/python-dev/2009-May/089746.html

Some of this is more low-level. I did see higher performance when using non-Event objects, although I have not had time to follow up and do a deeper analysis. The GIL "flashing" problem with critical sections can very likely be rectified with a call to Sleep(0) or YieldProcessor() for those who are worried about it.

On the subject of fairness, I tested various forms of the GIL on my multi-threaded ISAPI extension, where every millisecond counts under high concurrency, and fairness wasn't an issue for single- or multi-core systems. It may be anecdotal, but it also may be that the issue is somewhat overblown.

It seems like these discussions come up in one form or another a few times a year and don't really get anywhere - probably because many people find it easier to just run one instance of Python on each core/processor. IPC is cheap (cPickle rocks!), and Python's memory footprint is acceptable by today's standards. Still, it is an interesting topic to many, myself included.

Also, many people keep talking about inefficiencies due to threads waking up to a locked GIL. I'd like to see examples of this - most of the time, the OS should know that the thread is contending on the lock object, and it is skipped over. Granted, a thread may wake up just to release the GIL shortly thereafter, but that's why sys.setcheckinterval() is there for us to tinker with.

Anyway, enough of my $0.02.

- Phillip

2009/10/21 John Arbash Meinel :
> Kristján Valur Jónsson wrote:
> ...
>
>> This depends entirely on the platform and primitives used to implement the
>> GIL.
>> I'm interested in Windows.
>> There, I found this article:
>> http://fonp.blogspot.com/2007/10/fairness-in-win32-lock-objects.html
>> So, you may be on to something. Perhaps a simple C test is in order then?
>>
>> I did that. I found, on my dual-core Vista machine, running "release",
>> that both Mutexes and CriticalSections behaved as you describe, with no
>> "fairness". Using a "semaphore" seems to retain fairness, however.
>> "Fairness" was retained in debug builds too, strangely enough.
>>
>> Now, Python uses none of these. On Windows, it uses an "Event" object
>> coupled with an atomically updated counter. This also behaves fairly.
>>
>> The test application is attached.
>>
>> I think that you ought to substantiate your claims better, maybe with a
>> specific platform and using some test like the above.
>>
>> On the other hand, it shows that we must be careful what we use. There has
>> been some talk of using CriticalSections for the GIL on Windows. This test
>> ought to show the danger of that. The GIL is different from a regular lock.
>> It is a reverse-lock, really, and therefore may need to be implemented in
>> its own special way, if we want very fast mutexes for the rest of the system
>> (cc to python-dev)
>>
>> Cheers,
>> Kristján
>
> I can compile and run this, but I'm not sure I know how to interpret the
> results. If I understand it correctly, then everything but "Critical
> Sections" is fair on my Windows Vista machine.
>
> To run, I changed the line "#define EVENT" to EVENT, MUTEX, SEMAPHORE
> and CRIT. I then built and ran in the "Release" environment (using VS 2008
> Express).
>
> For all but CRIT, I saw things like:
> thread 5532 reclaims GIL
> thread 5532 working 51234 units
> thread 5532 worked 51234 units: 1312435761
> thread 5532 flashing GIL
> thread 5876 reclaims GIL
> thread 5876 working 51234 units
> thread 5876 worked 51234 units: 1312435761
> thread 5876 flashing GIL
>
> where there would be 4 lines for one thread, then 4 lines for the other
> thread.
> For CRIT, I saw something more like 50 lines for one thread, and then 50
> lines for the other thread.
>
> This is Vista Home Basic and VS 2008 Express Edition, on a 2-core
> machine.
>
> John
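The Sleep(0)/YieldProcessor() suggestion from earlier in this message can be approximated in Python with time.sleep(0), which gives up the remainder of the timeslice after releasing the lock so a waiter can actually grab it before we re-acquire. This is my own sketch of the idea, not the C patch:

```python
import threading
import time

gil = threading.Lock()
order = []

def worker(name, steps):
    for _ in range(steps):
        with gil:
            order.append(name)
        time.sleep(0)   # yield after release, before re-acquiring

threads = [threading.Thread(target=worker, args=(n, 50)) for n in "AB"]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(order))          # 100: each thread appended 50 entries
print(len(set(order)))     # 2: both threads got the lock
```

Without the yield, a releasing thread can win the re-acquire race indefinitely on some primitives, which is the "flashing" pathology John's CRIT runs show; the yield trades a little throughput for the chance of a handoff.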
Re: [Python-Dev] GIL behaviour under Windows
I know I already posted some relevant threads to the other discussion, but I wanted to point out a couple of specific comments on GIL fairness from that discussion:

http://mail.python.org/pipermail/python-dev/2009-May/089752.html
http://mail.python.org/pipermail/python-dev/2009-May/089755.html

- Phillip

On Thu, Oct 22, 2009 at 10:16 AM, Antoine Pitrou wrote:
> Tres Seaver palladion.com> writes:
>>
>> I read Sturla as saying there were 99,939 switches out of a possible
>> 100,000, with sys.checkinterval set to 100.
>
> Oops, you're right.
> But depending on the elapsed time (again :-)), it may be too high, because
> too many switches per second will add a lot of overhead and decrease
> performance.
>
> Regards,
> Antoine.