[Python-Dev] Removing the GIL (Me, not you!)
Hi,

I had a whole long email about exactly what I was doing, but I think I'll get to the point instead. I'm trying to implement a Python concurrency API and would like to use CPython to do it. To do that, I would like to remove the GIL. Since I'm new to interpreter hacking, some help would be appreciated. I've listed what I think the GIL does; if you could add to this list or refine it, I would very much appreciate it.

Roles of the GIL:

1. Some global interpreter state/modules are protected (where are these globals?)
2. When writing C extensions, I can change the state of my Python object without worrying about synchronization.
3. When writing C extensions, I can change my own internal C state without worrying about synchronization (unless I have other, non-Python threads running).

Does anyone know of a place where the GIL is required when not operating on a Python object? It seems like that would never happen, which would make replacing the GIL somewhat easier.

I've only started looking at the code recently, so please forgive my naivety. I'm still learning how the interpreter works at a high level, let alone all the nitty-gritty details!

Thanks,
Justin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
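[Editor's illustration, not part of the original mail.] Role 2 above is visible even from pure Python: under the GIL, a single operation such as list.append is effectively atomic, so concurrent appends from several threads lose nothing. A minimal sketch (this does not extend to compound operations like `x += 1`, which span multiple bytecodes):

```python
import threading

shared = []

def worker():
    # list.append executes under the GIL as one atomic operation,
    # so no appends are lost even with several threads running.
    for i in range(10_000):
        shared.append(i)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000 -- every append survived
```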
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/11/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > 1. Some global interpreter state/modules are protected (where are these
> > globals at?)
>
> It's the interpreter and thread state itself (pystate.h); for the thread
> state, also _PyThreadState_Current. Then there is the GC state, in
> particular "generations". There are various caches and counters as well.

Caches seem like they could definitely be a problem. Would you mind expanding on this a little? What gets cached, and why?

Justin
Re: [Python-Dev] Removing the GIL (Me, not you!)
> What do you think?

I'm going to have to agree with Martin here, although I'm not sure I understand what you're saying entirely. Perhaps if you explained where the benefits of this approach come from, it would clear up what you're thinking.

After a few days of thought, I'm starting to realize the difficulty of maintaining compatibility with existing C extensions after removing the GIL. The possible C-level side effects are very difficult to work around without kernel- or hardware-level transaction support. I see a couple of approaches that might work (though I probably haven't thought of everything):

1. Use message passing and transactions. Put every module into its own tasklet that ends up getting owned by one thread or another. Every call to an object owned by that module is put into a module-wide message queue and delivered sequentially to its objects. All this does is serialize requests to objects implemented in C, to slightly mitigate the need to lock. Then use transactions to protect every Python object. You still have the problem of C side effects going unnoticed (i.e. thread A executes a function, the object sets its C state a certain way, thread B calls the same function and changes all the C state, and A reacts to a return value that no longer reflects the actual state). So this doesn't actually work, but it's close, since Python objects will remain consistent with transactions and conflicting C code won't execute simultaneously.

2. Do it Perl-style. Perl just spawns off multiple interpreters and doesn't share state between them. This would require cleaning up what state belongs where, and probably providing some global state lock-free. For instance, all the numbers, letters, and None are read-only, so we could probably work out a way to share them between threads. In fact, any Python global could be read-only until it is written to; then it belongs to the thread that wrote to it and is updated in the other threads via some sort of cache-coherency protocol.

I haven't really wrapped my head around how C extensions would play with this yet, but essentially code operating in different threads would be operating on different copies of the modules. That seems fair to me.

3. Come up with an elegant way of handling multiple Python processes. Of course, this has some downsides. I don't really want to pickle Python objects around if I decide they need to be in another address space, which I would probably occasionally need to do if I abstracted away the fact that a bunch of interpreters had been spawned off.

4. Remove the GIL, use transactions for Python objects, and adapt all C extensions to be thread-safe. Woo.

I'll keep kicking around ideas for a while; hopefully they'll become more refined as I explore the code more.

Justin

PS. A good paper on how hardware transactional memory could help us out: http://www-faculty.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf. A few of you have probably read this already. Martin is even acknowledged, but it was news to me!
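[Editor's illustration, not part of the original mail.] The message-passing half of approach 1 can be sketched in a few lines. All names here (`ModuleActor`, `call`) are hypothetical, not an actual proposal: one worker thread owns all of a module's objects, and every other thread funnels calls through a queue, so the module's C state is only ever touched by one thread at a time.

```python
import queue
import threading

class ModuleActor:
    """Hypothetical sketch of approach 1: calls into a module are
    delivered sequentially by a single owning thread, so the module's
    internal state is never touched by two threads at once."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:            # shutdown sentinel
                return
            func, args, reply = item
            reply.put(func(*args))      # executed only in the owner thread

    def call(self, func, *args):
        # Synchronous call from any thread, serialized through the inbox.
        reply = queue.Queue()
        self._inbox.put((func, args, reply))
        return reply.get()

    def close(self):
        self._inbox.put(None)

# Any thread may call in; the work itself always runs in the actor's thread.
actor = ModuleActor()
result = actor.call(sum, [1, 2, 3])
actor.close()
print(result)  # 6
```

This serializes access without per-object locks, but, as the mail notes, it does nothing about a caller acting on stale C-level state between two calls.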
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Jason Orendorff <[EMAIL PROTECTED]> wrote:
> On 9/13/07, Justin Tulloss <[EMAIL PROTECTED]> wrote:
> > 1. Use message passing and transactions. [...]
> > 2. Do it perl style. [...]
> > 3. Come up with an elegant way of handling multiple python processes. [...]
> > 4. Remove the GIL, use transactions for python objects, [...]
>
> The SpiderMonkey JavaScript engine takes a very different approach,
> described here:
> http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety

This is basically the same as what Perl does, as far as I understand it. There are differences, but they're not that substantial. It's basically the idea of keeping all state separate and treating global access as a special case. I think this is a pretty solid approach, since globals shouldn't be accessed that often. What we would want to do differently is make sure that read-only globals can be cheaply accessed from any thread; otherwise we lose the performance benefit of having them in the first place. Refcounting is another major issue. SpiderMonkey uses GC instead.

> CPython would need to do atomic increfs/decrefs. (Deferred
> refcounting could mitigate the cost.)

This is definitely something to think about. I don't really have an answer straight off, but there are several things we could try.

> The main drawback (aside from the amount of work) is the patent.
> SpiderMonkey's license grants a worldwide, royalty-free license, but
> not under the Python license. I think this could be wrangled, if the
> technical approach looks worthwhile.

I'm not sure this is an issue. It's not like we would be using the code, just the patented algorithm. Any code we wrote to implement the algorithm would of course be covered under the Python license. I'm not a legal guy, though.

Justin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> Basically though, atomic incref/decref won't work. Once you've got
> two threads modifying the same location the costs skyrocket. Even
> without being properly atomic you'll get the same slowdown on x86
> (whose cache coherency is fairly strict.)

I'm a bit skeptical of the actual costs of atomic incref. For there to be contention, two threads would have to be modifying the same memory location at the exact same time. That seems unlikely to happen often. We can't bank on it never happening, but an occasionally expensive operation is okay. After all, it's occasional.

Justin
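[Editor's illustration, not part of the original mail.] To make the disagreement concrete, here is what "atomic incref/decref" means semantically. A real CPython change would use a hardware atomic instruction (e.g. LOCK XADD on x86), and the cost Adam describes is the cache line bouncing between cores on every update; the lock below is only a stand-in that models the semantics, not the proposed mechanism.

```python
import threading

class AtomicRefcount:
    """Semantic model of an atomic refcount.  A real implementation
    would be one hardware instruction; the lock here is a stand-in."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 1

    def incref(self):
        with self._lock:
            self.count += 1

    def decref(self):
        with self._lock:
            self.count -= 1
            return self.count == 0   # True: last reference dropped

rc = AtomicRefcount()

def churn():
    # The contended case: many threads hammering one counter, so every
    # update invalidates the other cores' copies of the cache line.
    for _ in range(10_000):
        rc.incref()
        rc.decref()

threads = [threading.Thread(target=churn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rc.count)  # 1 -- balanced increfs/decrefs, no lost updates
```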
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Jason Orendorff wrote:
> > The clever bit is that SpiderMonkey's per-object
> > locking does *not* require a context switch or even an atomic
> > instruction, in the usual case where an object is *not* shared among
> > threads.
>
> How does it tell whether an object is shared between
> threads? That sounds like the really clever bit to me.

If you look at the article, they have a code sample. Basically, a global is "owned" by the first thread that touches it. That thread can do whatever it wants with that global. If another thread wants to touch the global, it locks everything to do so.

This is a pretty good idea, except that in Python there are so many globals that all threads benefit from having access to. Luckily, except for their reference counts, they're mostly read-only. Therefore, if we can work out the reference-count problem, we can probably use a similar scheme.

Justin
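[Editor's illustration, not part of the original mail.] The ownership check can be sketched in a few lines. This is a loose simplification of the SpiderMonkey scheme described in the linked article, with invented names, and it omits both the ownership hand-off SpiderMonkey performs when a second thread appears and the atomic compare-and-swap a real version would need to claim ownership safely.

```python
import threading

class OwnedObject:
    """First-toucher ownership sketch: the owning thread pays no locking
    cost; every other thread takes the slow, locked path."""

    def __init__(self, value):
        self._owner = None
        self._lock = threading.Lock()
        self._value = value

    def get(self):
        me = threading.get_ident()
        if self._owner is None:
            # Claim: first thread to touch the object owns it.  A real
            # implementation must do this with an atomic compare-and-swap.
            self._owner = me
        if self._owner == me:
            return self._value       # fast path: no lock, no atomics
        with self._lock:             # slow path for foreign threads
            return self._value

obj = OwnedObject("spam")
print(obj.get())  # 'spam', via the owner fast path
```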
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/14/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > Could be worth a try. A first step might be to just implement
> > the atomic refcounting, and run that single-threaded to see
> > if it has terribly bad effects on performance.
>
> I've done this experiment. It was about 12% on my box. Later, once I
> had everything else set up so I could run two threads simultaneously, I
> found much worse costs. All those literals become shared objects that
> create contention.

It's hard to argue with cold hard facts when all we have is raw speculation. What do you think of a model where there is a global "thread count" that keeps track of how many threads reference an object, plus a thread-specific reference counter for each object? When a thread's refcount drops to 0, it decrefs the object's thread count. If you did this right, hopefully there would only be cache updates when you update the thread count, which happens only when a thread first references an object and when it last references it.

I mentioned this idea earlier, and it's growing on me. Since you've actually messed around with the code, do you think this would alleviate some of the contention issues?

Justin
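[Editor's illustration, not part of the original mail.] The scheme being proposed can be sketched as follows; the class and its fields are hypothetical. Each thread keeps its own count in thread-local storage, and the shared per-object thread count only moves on a thread's first and last reference, so steady-state increfs and decrefs touch no shared memory at all.

```python
import threading

class ThreadLocalRefcount:
    """Hypothetical per-thread refcounts plus a shared thread count.
    The shared count changes only when a thread gains its first or
    drops its last reference, keeping cross-thread cache traffic rare."""

    def __init__(self):
        self._local = threading.local()  # this thread's private refcount
        self._threads = 0                # how many threads hold references
        self._lock = threading.Lock()    # guards the shared thread count

    def incref(self):
        count = getattr(self._local, "count", 0)
        if count == 0:
            with self._lock:
                self._threads += 1       # first reference from this thread
        self._local.count = count + 1    # private: no shared-memory write

    def decref(self):
        self._local.count -= 1
        if self._local.count == 0:
            with self._lock:
                self._threads -= 1       # last reference from this thread
                return self._threads == 0  # True: object can be freed
        return False

r = ThreadLocalRefcount()
r.incref()
r.incref()
first = r.decref()   # False -- this thread still holds a reference
last = r.decref()    # True  -- last thread is gone; safe to free
print(first, last)
```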
Re: [Python-Dev] Removing the GIL (Me, not you!)
> Your idea can be combined with the maxint/2 initial refcount for
> non-disposable objects, which should about eliminate thread-count updates
> for them.

I don't really like the maxint/2 idea because it requires us to differentiate between globals and everything else. Plus, it's a hack. I'd like a more elegant solution if possible.

Justin
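[Editor's illustration, not part of the original mail.] For reference, the maxint/2 trick being rejected here works like this (a toy sketch with invented names): non-disposable objects are seeded with a refcount so large that no realistic amount of decref traffic can ever drive it to zero, so the deallocation check simply never fires for them.

```python
import sys

IMMORTAL = sys.maxsize // 2   # "maxint/2": unreachable by real traffic

class Ref:
    """Toy sketch of the maxint/2 scheme: long-lived objects start with
    a huge refcount and therefore can never be deallocated."""

    def __init__(self, immortal=False):
        self.count = IMMORTAL if immortal else 1

    def decref(self):
        self.count -= 1
        return self.count == 0    # never True for an immortal object

shared_constant = Ref(immortal=True)
ordinary = Ref()

for _ in range(1_000_000):
    shared_constant.decref()      # still nowhere near zero

died = ordinary.decref()          # ordinary objects still die normally
print(shared_constant.count > 0, died)  # True True
```

The objection in the mail stands: the scheme only works if you can decide up front which objects are "non-disposable."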
Re: [Python-Dev] Removing the GIL (Me, not you!)
I'm not sure I understand entirely what you're saying, but it sounds like you want multiple reference counts. A reference count per thread might not be a bad idea, but I can't think of how it would work without locks. If every object has an array of reference counts, then the GC would need to lock that array to check whether they're all 0. That means the incref/decref operations would need to acquire this lock or risk messing up the GC.

Perhaps you could have something where you have a reference count per thread and then a thread count per object. Then you would only need to lock the thread count for the first and last reference a thread makes to an object. Once there are no threads referencing an object, it's obviously safe to clean up.

Of course, I'm not convinced atomic ops are really so expensive that you can't have every thread doing them at once, but Adam says that the caches will be thrashed if we have a bunch of threads continuously updating the same memory address. I can see the possibility. Perhaps once we have a version that actually demonstrates this thrashing, we can alleviate it with some sort of multiple-reference-count scheme.

Justin

On 9/13/07, Tennessee Leeuwenburg <[EMAIL PROTECTED]> wrote:
> Pardon me for talking with no experience in such matters, but...
>
> Okay, incrementing a reference counter is atomic, therefore the cheapest
> possible operation. Is it possible to keep reference counting atomic in a
> multi-thread model?
>
> Could you do the following... let's consider two threads, "A" and "B".
> Each time an object is created, a reference count is created in both "A" and
> "B". Let's suppose "A" has a real reference and "B" has no reference really.
> Couldn't the GC check two reference registers for a reference count? The
> object would then be cleaned up only if both registers were 0.
>
> To exploit multiple CPUs, you could have two persistent Python processes
> on each CPU with its own mini-GIL. Object creation would then involve a call
> to each process to create the reference, and GC would involve checking each
> process to see what their count is. However, it would mean that within each
> process, threads could create additional references or remove references in
> an atomic way.
>
> In a single-CPU system, this would be the same cost as currently, since I
> think that situation would devolve to having just one place to check for
> references. This seems to mean that it would be no more expensive for a
> single-CPU system.
>
> In a two-CPU system, I'm no expert on the actual call overheads of
> object creation and garbage collection, but logically it would double the
> effort of object creation and destruction (all such operations now need to
> occur on both processes) but would keep reference increments and decrements
> atomic.
>
> Once again, I'm really sorry if I'm completely off-base since I have never
> done any actual coding in this area, but I thought I'd make the suggestion
> just in case it happened to have relevance.
>
> Thanks,
> -Tennessee
Re: [Python-Dev] Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures
On 9/18/07, Krishna Sankar <[EMAIL PROTECTED]> wrote:
> Folks,
> As a follow-up to the py3k discussions started by Bruce and Guido, I
> pinged Brett and he suggested I submit an exploratory proposal. Would
> appreciate insights, wisdom, the good, the bad and the ugly.

I am currently working on parallelizing Python as an undergraduate independent study. I plan on first removing the GIL with as little overall effect as possible, and then implementing a task-oriented threading API on top, probably based on Stackless (since they already do a great job with concurrency in a single thread). If you're interested in all the details, I'd be happy to share. I haven't gotten far yet (the semester just started!), but I feel that actually implementing these things would be the best way to get a PEP through.

Justin
[Python-Dev] GC Changes
Hello,

I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable. I've been looking into a few options, and I'm leaning towards implementing IBM's Recycler GC (http://www.research.ibm.com/people/d/dfb/recycler-publications.html), since it is very similar to what is in place now from the user's perspective.

However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?

On a more immediately relevant note, I'm not certain I understand the full extent of the gc module. From what I've read, it sounds like it's fairly close to a fully functional GC, yet it seems to exist only as a cycle-detecting backup to the reference-counting mechanism. Would somebody care to give me a brief overview of how the current gc module interacts with the interpreter, or point me to a place where that is explained? Why isn't the mark-and-sweep mechanism used for all memory management?

Thanks a lot!
Justin
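[Editor's illustration, not part of the original mail.] On the last question: the gc module is not a general collector; it exists only to find reference cycles that refcounting alone can never reclaim. The core idea, sketched here in simplified form with plain dicts instead of real object headers, is to copy the refcounts, subtract every reference held *inside* the tracked set, and treat anything whose count stays positive as externally reachable; whatever is not reachable from those roots is cyclic garbage.

```python
def find_unreachable(objects, refs, external):
    """Toy version of the cycle-detection pass.
    refs[o] lists the objects o references; external[o] is o's refcount."""
    # Start from a copy of the refcounts...
    gc_refs = dict(external)
    # ...and cancel out every reference that comes from inside the set.
    for o in objects:
        for target in refs.get(o, ()):
            if target in gc_refs:
                gc_refs[target] -= 1
    # Counts still positive mean references from *outside*: those are roots.
    reachable = set()
    stack = [o for o in objects if gc_refs[o] > 0]
    while stack:
        o = stack.pop()
        if o in reachable:
            continue
        reachable.add(o)
        stack.extend(t for t in refs.get(o, ()) if t in gc_refs)
    # Everything else is unreachable cyclic garbage.
    return [o for o in objects if o not in reachable]

# a <-> b form a cycle with no outside references; c is held externally.
garbage = find_unreachable(
    ["a", "b", "c"],
    {"a": ["b"], "b": ["a"]},
    {"a": 1, "b": 1, "c": 1},
)
print(garbage)  # ['a', 'b'] -- the cycle; c survives via its external ref
```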
Re: [Python-Dev] GC Changes
> The cyclic GC kicks in when memory is running low.

When what memory is running low? Its default pool? System memory?

Justin