[Python-Dev] Python multithreading without the GIL
Hi, I've been working on changes to CPython to allow it to run without the global interpreter lock. I'd like to share a working proof-of-concept that can run without the GIL. The proof-of-concept involves substantial changes to CPython internals, but relatively few changes to the C-API. It is compatible with many C extensions: extensions must be rebuilt, but usually require small or no modifications to source code. I've built compatible versions of packages from the scientific Python ecosystem, and they are installable through the bundled "pip". Source code: https://github.com/colesbury/nogil Design overview: https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit My goal with the proof-of-concept is to demonstrate that removing the GIL is feasible and worthwhile, and that the technical ideas of the project could serve as a basis of such an effort. I'd like to start a discussion about these ideas and gauge the community's interest in this approach to removing the GIL. Regards, Sam Gross colesb...@gmail.com / sgr...@fb.com ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Fri, Oct 8, 2021 at 12:55 PM Daniel Pope wrote: > I'm a novice C programmer, but I'm unsure about the safety of your > thread-safe collections description. > The "list" class uses a slightly different strategy than "dict", which I forgot about when writing the design overview. List relies on the property that the backing array of a given list can only grow (never shrink) [1]. This is different from upstream CPython. Dict stores the capacity inside PyDictKeysObject (the backing array). The capacity never changes, so if you have a valid pointer to the PyDictKeysObject you load the correct capacity. I've been meaning to change "list" to use the same strategy as "dict". I think that would simplify the overall design and let "list" shrink the backing array again. [1] https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/Objects/listobject.c#L46-L49 ( ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/46TGB2MXWJ37VUQH3R5LW6BOGLIE3PGG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Fri, Oct 8, 2021 at 12:24 PM Pablo Galindo Salgado wrote: > When you mean "an order of magnitude less overhead than the current > CPython implementation" do you mean compared with the main branch? We > recently implemented already almost everything is listed in this paragraph. > I think I wrote that in August when "current CPython" meant something different from today :) I'll update it. Thanks for the links to the PRs. I'll need to look at them more closely, but one I think one remaining difference is that the "nogil" interpreter stays within the same interpreter loop for many Python function calls, while upstream CPython recursively calls into _PyEval_EvalFrameDefault. I've been using this mini-benchmark to measure the overhead of Python function calls for various numbers of arguments and keywords: https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/benchmarks/call_benchmark.py For zero, two, and four argument functions, I get: nogil (nogil/fb6aabed): 10ns, 14ns, 18ns 3.11 (main/b108db63): 47ns, 54ns, 63ns ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UZKOY4Y3QWT76TCXJ3QXMEGRODN2DOGB/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Fri, Oct 8, 2021 at 12:04 PM Nathaniel Smith wrote: > I notice the fb.com address -- is this a personal project or something > facebook is working on? what's the relationship to Cinder, if any? > It is a Facebook project, at least in the important sense that I work on it as an employee at Facebook. (I'm currently the only person working on it.) I keep in touch with some of the Cinder devs regularly and they've advised on the project, but otherwise the two projects are unrelated. > Regarding the tricky lock-free dict/list reads: I guess the more > straightforward approach would be to use a plain ol' mutex that's > optimized for this kind of fine-grained per-object lock with short > critical sections and minimal contention, like WTF::Lock. Did you try > alternatives like that? If so, I assume they didn't work well -- can > you give more details? > I'm using WTF::Lock style locks for dict/list mutations. I did an experiment early on where I included locking around reads as well. I think it slowed down the pyperformance benchmarks by ~10% on average, but I can't find my notes so I plan to re-run the experiment. Additionally, because dicts are used for things like global variables, I'd expect that locks around reads prevent efficient scaling, but I haven't measured this. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/V76ZRBM6UMGYU7FTNENMOOW7OYEFYQ5Q/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Fri, Oct 8, 2021 at 11:35 AM Chris Jerdonek wrote: > Is it also slower even when running with PYTHONGIL=1? If it could be made > the same speed for single-threaded code when running in GIL-enabled mode, > that might be an easier intermediate target while still adding value. > Running with PYTHONGIL=1 is a bit less than 1% faster (on pyperformance) than with PYTHONGIL=0. It might be possible to improve PYTHONGIL=1 by another 1-2% by adding runtime checks for the GIL before attempting to lock dicts and lists during mutations. I think further optimizations specific to the PYTHONGIL=1 use case would be tricky. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6YLZMVKWI77SSNUV5XOGBSRY44KJ76UQ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
> > I’m unclear what is actually retried. You use this note throughout the > document, so I think it would help to clarify exactly what is retried and > why that solves the particular problem. I’m confused because, is it the > refcount increment that’s retried or the entire sequence of steps (i.e. do > you go back and reload the address of the item)? Is there some kind of > waiting period before the retry? I would infer that if you’re retrying the > refcount incrementing, it’s because you expect subsequent retries to > transition from zero to non-zero, but is that guaranteed? Are there > possibilities of deadlocks or race conditions? The entire operation is retried (not just the refcount). For "dict", this means going back to step 1 and reloading the version tag and PyDictKeysObject. The operation can fail (and need to be retried) only when some other thread is concurrently modifying the dict. The reader needs to perform the checks (and retry) to avoid returning inconsistent data, such as an object that was never in the dict. With the checks and retry, returning inconsistent or garbage data is not possible. The retry is performed after locking the dict, so the operation is retried at most once -- the read operation can't fail when it holds the dict's lock because the lock prevents concurrent modifications. It would have also been possible to retry the operation in a loop without locking the dict, but I was concerned about reader starvation. (In the doc I wrote "livelock", but "reader starvation" is more accurate.) In particular, I was concerned that a thread repeatedly modifying a dict might prevent other threads reading the dict from making progress. I hadn't seen this in practice, but I'm aware that reader starvation can be an issue for similar designs like Linux's seqlock. Acquiring the dict's lock when retrying avoids the reader starvation issue. Deadlock isn't possible because the code does not acquire any other locks while holding the dict's lock. For example, the code releases the dict's lock before calling Py_DECREF or PyObject_RichCompareBool. The race condition question is a bit harder to answer precisely. Concurrent reads and modifications of a dict won't cause the program to segfault, return garbage data, or items that were never in the dict. Regards, Sam ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/D37MQCDRXRVLDVZ65G5BJPJ6QEPSVLI4/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Mon, Oct 11, 2021 at 12:58 PM Thomas Grainger wrote: > Is D1.update(D2) still atomic with this implementation? > https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe > No. For example, another thread reading from the dict concurrently may observe a partial update. As Ronald Oussoren points out, dict.update isn't atomic in the general case. CPython even includes some checks for concurrent modifications: https://github.com/python/cpython/blob/2f92e2a590f0e5d2d3093549f5af9a4a1889eb5a/Objects/dictobject.c#L2582-L2586 ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IAOCDDCJ653NBED3G2J2YBWD7HHPFHT6/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
On Mon, Oct 11, 2021 at 7:04 AM Antoine Pitrou wrote: > It's crude, but you can take a look at `ccbench` in the Tools directory. > Thanks, I wasn't familiar with this. The ccbench results look pretty good: about 18.1x speed-up on "pi calculation" and 19.8x speed-up on "regular expression" with 20 threads (turbo off). The latency and throughput results look good too. With the GIL enabled (3.11), the compute intensive background task increases latency and dramatically decreases throughput. With the GIL disabled, latency remains low and throughput high. Here are the full results for 20 threads without the GIL: https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil, 20 threads) Here are the results for 4 threads (the default) for comparison with upstream: https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil, 4 threads) https://gist.github.com/colesbury/c0b89f82e51779670265fb7c7cd37114 (3.11/b108db63e0, 4 threads) ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WCMIVNQ6DNOTZUUX4EX43LF2VJPF4ALW/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
I've updated the linked gists with the results from interpreters compiled with PGO, so the numbers have slightly changed. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ESCEXN7HKL3GICHOHZMQTTHUDQN5WUYX/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python multithreading without the GIL
Hi Skip, I think the performance difference is because of different versions of NumPy. Python 3.9 installs NumPy 1.21.3 by default for "pip install numpy". I've only built and packaged NumPy 1.19.4 for "nogil" Python. There are substantial performance differences between the two NumPy builds for this matmul script. With NumPy 1.19.4, I get practically the same results for both Python 3.9.2 and "nogil" Python for "time python3 matmul.py 0 10". I'll update the version of NumPy for "nogil" Python if I have some time this week. Best, Sam On Sun, Oct 31, 2021 at 5:46 PM Skip Montanaro wrote: > > Remember that py stone is a terrible benchmark. > > I understand that. I was only using it as a spot check. I was surprised at > how much slower my (threaded or unthreaded) matrix multiply was on nogil vs > 3.9+. I went into it thinking I would see an improvement. The Performance > section of Sam's design document starts: > > As mentioned above, the no-GIL proof-of-concept interpreter is about 10% > faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite. > > > so it didn't occur to me that I'd be looking at a slowdown, much less by > as much as I'm seeing. > > Maybe I've somehow stumbled on some instruction mix for which the nogil VM > is much worse than the stock VM. For now, I prefer to think I'm just doing > something stupid. It certainly wouldn't be the first time. > > Skip > > P.S. I suppose I should have cc'd Sam when I first replied to this > thread, but I'm doing so now. I figured my mistake would reveal itself > early on. Sam, here's my first post about my little "project." > https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/ > > > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/W23EPICXG3RVOMMCVSM3FVOEN2U3LNM3/ Code of Conduct: http://python.org/psf/codeofconduct/