[Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Justin Tulloss
Hi,

I had a whole long email about exactly what I was doing, but I think I'll
get to the point instead. I'm trying to implement a python concurrency API
and would like to use cpython to do it. To do that, I would like to remove
the GIL.

So, since I'm new to interpreter hacking, some help would be appreciated.
I've listed what I think the GIL does; if you guys could add to this list or
refine it, I would very much appreciate it.

Roles of the GIL:
1. Some global interpreter state/modules are protected (where are these
globals at?)
2. When writing C extensions I can change the state of my python object
without worrying about synchronization
3. When writing C extensions I can change my own internal C state without
worrying about synchronization (unless I have other, non-python threads
running)

Does anyone know of a place where the GIL is required when not operating on
a python object? It seems like this would never happen, and would make
replacing the GIL somewhat easier.

I've only started looking at the code recently, so please forgive my
naivety. I'm still learning how the interpreter works on a high level, let
alone all the nitty gritty details!

Thanks,
Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Justin Tulloss
On 9/11/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> > 1. Some global interpreter state/modules are protected (where are these
> > globals at?)
>
> It's the interpreter and thread state itself (pystate.h), for the thread
> state, also _PyThreadState_Current. Then there is the GC state, in
> particular "generations". There are various caches and counters also.


Caches seem like they definitely might be a problem. Would you mind
expanding on this a little? What gets cached and why?

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
> What do you think?
>

I'm going to have to agree with Martin here, although I'm not sure I
understand what you're saying entirely. Perhaps if you explained where the
benefits of this approach come from, it would clear up what you're thinking.


After a few days of thought, I'm starting to realize the difficulty of
maintaining compatibility with existing C extensions after removing the GIL.
The possible C-level side effects are very difficult to work around without
kernel or hardware level transaction support. I see a couple of approaches
that might work (though I probably haven't thought of everything).

1. Use message passing and transactions.
Put every module into its own tasklet that ends up getting owned by one
thread or another. Every call to an object that is owned by that module is
put into a module wide message queue and delivered sequentially to its
objects. All this does is serialize requests to objects implemented in C to
slightly mitigate the need to lock. Then use transactions to protect any
python object. You still have the problem of C side effects going unnoticed
(IE Thread A executes function, object sets c-state in a certain way, Thread
B calls the same function, changes all the C-state, A reacts to return value
that no longer reflects on the actual state). So, this doesn't actually
work, but its close since python objects will remain consistent
w/transactions and conflicting C-code won't execute simultaneously.

2. Do it perl style.
Perl just spawns off multiple interpreters and doesn't share state between
them. This would require cleaning up what state belongs where, and probably
providing some global state lock free. For instance, all the numbers,
letters, and None are read only, so we could probably work out a way to
share them between threads. In fact, any python global could be read only
until it is written to. Then it belongs to the thread that wrote to it and
is updated in the other threads via some sort of cache-coherency protocol. I
haven't really wrapped my head around how C extensions would play with this
yet, but essentially code operating in different threads would be operating
on different copies of the modules. That seems fair to me.

3. Come up with an elegant way of handling multiple python processes. Of
course, this has some downsides. I don't really want to pickle python
objects around if I decide they need to be in another address space, which I
would probably occasionally need to do if I abstracted away the fact that a
bunch of interpreters had been spawned off.

4. Remove the GIL, use transactions for python objects, and adapt all
C-extensions to be thread safe. Woo.

I'll keep kicking around ideas for a while; hopefully they'll become more
refined as I explore the code more.

Justin

PS. A good paper on how hardware transactional memory could help us out:
http://www-faculty.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf
A few of you have probably read this already. Martin is even acknowledged,
but it was news to me!
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Jason Orendorff <[EMAIL PROTECTED]> wrote:
>
> On 9/13/07, Justin Tulloss <[EMAIL PROTECTED]> wrote:
> > 1. Use message passing and transactions.  [...]
> > 2. Do it perl style. [...]
> > 3. Come up with an elegant way of handling multiple python processes.
> [...]
> > 4. Remove the GIL, use transactions for python objects, [...]
>
> The SpiderMonkey JavaScript engine takes a very different approach,
> described here:
> http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety


This is basically the same as what perl does, as far as I understand it.
There are differences, but they're not that substantial. It's basically the
idea of keeping all state separate and treating global access as a special
case. I think this is a pretty solid approach, since globals shouldn't be
accessed that often. What we would want to do differently is make sure that
read-only globals can be cheaply accessed from any thread. Otherwise we lose
the performance benefit of having them in the first place.

Refcounting is another major issue.  SpiderMonkey uses GC instead.
> CPython would need to do atomic increfs/decrefs.  (Deferred
> refcounting could mitigate the cost.)


This is definitely something to think about. I don't really have an answer
straight off, but there are several things we could try.

The main drawback (aside from the amount of work) is the patent.
> SpiderMonkey's license grants a worldwide, royalty-free license, but
> not under the Python license.  I think this could be wrangled, if the
> technical approach looks worthwhile.


I'm not sure this is an issue. It's not like we would be using the code,
just the patented algorithm. Any code we wrote to implement the algorithm
would of course be covered under the python license. I'm not a legal guy
though.

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
>
>
> Basically though, atomic incref/decref won't work.  Once you've got
> two threads modifying the same location the costs skyrocket.  Even
> without being properly atomic you'll get the same slowdown on x86
> (who's cache coherency is fairly strict.)




I'm a bit skeptical of the actual costs of atomic incref. For there to be
contention, you would need to have to be modifying the same memory location
at the exact same time. That seems unlikely to ever happen. We can't bank on
it never happening, but an occasionally expensive operation is ok. After
all, it's occasional.

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> Jason Orendorff wrote:
> > The clever bit is that SpiderMonkey's per-object
> > locking does *not* require a context switch or even an atomic
> > instruction, in the usual case where an object is *not* shared among
> > threads.
>
> How does it tell whether an object is shared between
> threads? That sounds like the really clever bit to me.


If you look at the article, they have a code sample.

Basically a global is "owned" by the first thread that touches it. That
thread can do whatever it wants with that global. If another thread wants to
touch the global, it locks everything to do so.

This is a pretty good idea except that in Python there are so many globals
that all threads benefit from having access to. Luckily, except for their
reference counts, they're mostly read-only. Therefore, if we can work out
this reference count, we can probably use a similar concept.

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/14/07, Adam Olsen <[EMAIL PROTECTED]> wrote:

> > Could be worth a try. A first step might be to just implement
> > the atomic refcounting, and run that single-threaded to see
> > if it has terribly bad effects on performance.
>
> I've done this experiment.  It was about 12% on my box.  Later, once I
> had everything else setup so I could run two threads simultaneously, I
> found much worse costs.  All those literals become shared objects that
> create contention.


It's hard to argue with cold hard facts when all we have is raw speculation.
What do you think of a model where there is a global "thread count" that
keeps track of how many threads reference an object? Then there are
thread-specific reference counters for each object. When a thread's refcount
goes to 0, it decrefs the object's thread count. If you did this right,
hopefully there would only be cache updates when you update the thread
count, which will only be when a thread first references an object and when
it last references an object.

I mentioned this idea earlier and it's growing on me. Since you've actually
messed around with the code, do you think this would alleviate some of the
contention issues?

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Justin Tulloss
Your idea can be combined with the maxint/2 initial refcount for
> non-disposable objects, which should about eliminate thread-count updates
> for them.
> --
>

 I don't really like the maxint/2 idea because it requires us to
differentiate between globals and everything else. Plus, it's a hack. I'd
like a more elegant solution if possible.

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-15 Thread Justin Tulloss
I'm not sure I understand entirely what you're saying, but it sounds like
you want multiple reference counts. A reference count per thread might not
be a bad idea, but I can't think of how it would work without locks. If
every object has an array of reference counts, then the GC would need to
lock that array to check to see if they're all 0. That means the
incref/decref operations would need to acquire this lock or risk messing up
the GC.

Perhaps you could have something where you have a reference count per thread
and then a thread count per object. Then you would only need to lock the
thread count for the first and last reference a thread makes to an object.
Once there are no threads referencing and object, its obviously safe for
cleanup. Of course, I'm not convinced atomic ops are really so expensive you
can't have every thread doing it at once, but Adam says that the caches will
be thrashed if we have a bunch of threads continuously updating the same
memory address. I can see the possibility. Perhaps once we have a version
that actually demonstrates this thrashing, we can alleviate it with some
sort of multiple reference count scheme.

Justin

On 9/13/07, Tennessee Leeuwenburg <[EMAIL PROTECTED]> wrote:
>
> Pardon me for talking with no experience in such matters, but...
>
> Okay, incrementing a reference counter is atomic, therefore the cheapest
> possible operation. Is it possible to keep reference counting atomic in a
> multi-thread model?
>
> Could you do the following... let's consider two threads, "A" and "B".
> Each time an object is created, a reference count is created in both "A" and
> "B". Let's suppose "A" has a real reference and "B" has no reference really.
> Couldn't the GC check two reference registers for a reference count? The
> object would then be cleaned up only if both registers were 0.
>
> To exploit multiple CPUs, you could have two persistent Python processes
> on each CPU with its own mini-GIL. Object creation would then involve a call
> to each process to create the reference and GC would involve checking each
> process to see what their count is. However, it would mean that within each
> process, threads could create additional references or remove references in
> an atomic way.
>
> In a single-CPU system, this would be the same cost as currently, since I
> think that situation would devolve to having just one place to check for
> references. This seems to mean that it is the case that it would be no more
> expensive for a single-CPU system.
>
> In a two-CPU system, I'm no expertise on the actual call overheads of
> object creation and garbage collection, but logically it would double the
> effort of object creation and destruction (all such operations now need to
> occur on both processes) but would keep reference increments and decrements
> atomic.
>
> Once again, I'm really sorry if I'm completely off-base since I have never
> done any actual coding in this area, but I thought I'd make the suggestion
> just in case it happened to have relevance.
>
> Thanks,
> -Tennessee
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/tulloss2%40uiuc.edu
>
>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures

2007-09-18 Thread Justin Tulloss
On 9/18/07, Krishna Sankar <[EMAIL PROTECTED]> wrote:
>
> Folks,
>As a follow-up to the py3k discussions started by Bruce and Guido, I
> pinged Brett and he suggested I submit an exploratory proposal. Would
> appreciate insights, wisdom, the good, the bad and the ugly.


I am currently working on parallelizing python as an undergraduate
independent study. I plan on first removing the GIL with as little overall
effect as possible and then implementing a task-oriented threading API on
top, probably based on Stackless (since they already do a great job with
concurrency in a single thread).

If you're interested in all the details, I'd be happy to share. I haven't
gotten far yet (the semester just started!), but I feel that actually
implementing these things would be the best way to get a PEP through.

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] GC Changes

2007-09-30 Thread Justin Tulloss
Hello,

I've been doing some tests on removing the GIL, and it's becoming clear that
some basic changes to the garbage collector may be needed in order for this
to happen efficiently. Reference counting as it stands today is not very
scalable.

I've been looking into a few options, and I'm leaning towards the
implementing IBMs recycler GC
(http://www.research.ibm.com/people/d/dfb/recycler-publications.html
) since it is very similar to what is in place now from the users'
perspective. However, I haven't been around the list long enough to really
understand the feeling in the community on GC in the future of the
interpreter. It seems that a full GC might have a lot of benefits in terms
of performance and scalability, and I think that the current gc module is of
the mark-and-sweep variety. Is the trend going to be to move away from
reference counting and towards the mark-and-sweep implementation that
currently exists, or is reference counting a firmly ingrained tradition?

On a more immediately relevant note, I'm not certain I understand the full
extent of the gc module. From what I've read, it sounds like it's fairly
close to a fully functional GC, yet it seems to exist only as a
cycle-detecting backup to the reference counting mechanism. Would somebody
care to give me a brief overview on how the current gc module interacts with
the interpreter, or point me to a place where that is done? Why isn't the
mark-and-sweep mechanism used for all memory management?

Thanks a lot!

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] GC Changes

2007-10-01 Thread Justin Tulloss
> The cyclic GC kicks in when memory is running low.


When what memory is running low? Its default pool? System memory?

Justin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com