[Python-Dev] Python multithreading without the GIL

2021-10-07 Thread Sam Gross
Hi,

I've been working on changes to CPython to allow it to run without the
global interpreter lock. I'd like to share a working proof-of-concept that
can run without the GIL. The proof-of-concept involves substantial changes
to CPython internals, but relatively few changes to the C-API. It is
compatible with many C extensions: extensions must be rebuilt, but usually
require small or no modifications to source code. I've built compatible
versions of packages from the scientific Python ecosystem, and they are
installable through the bundled "pip".

Source code:
https://github.com/colesbury/nogil

Design overview:
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

My goal with the proof-of-concept is to demonstrate that removing the GIL
is feasible and worthwhile, and that the technical ideas of the project
could serve as a basis of such an effort.

I'd like to start a discussion about these ideas and gauge the community's
interest in this approach to removing the GIL.

Regards,
Sam Gross
colesb...@gmail.com / sgr...@fb.com
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Sam Gross
On Fri, Oct 8, 2021 at 12:55 PM Daniel Pope  wrote:

> I'm a novice C programmer, but I'm unsure about the safety of your
> thread-safe collections description.
>

The "list" class uses a slightly different strategy than "dict", which I
forgot about
when writing the design overview. List relies on the property that the
backing array
of a given list can only grow (never shrink) [1]. This is different from
upstream CPython.

Dict stores the capacity inside PyDictKeysObject (the backing array). The
capacity
never changes, so if you have a valid pointer to the PyDictKeysObject you
load the
correct capacity.

I've been meaning to change "list" to use the same strategy as "dict". I
think that would
simplify the overall design and let "list" shrink the backing array again.

[1]
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/Objects/listobject.c#L46-L49
(
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/46TGB2MXWJ37VUQH3R5LW6BOGLIE3PGG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Sam Gross
On Fri, Oct 8, 2021 at 12:24 PM Pablo Galindo Salgado 
wrote:

> When you mean "an order of magnitude less overhead than the current
> CPython implementation" do you mean compared with the main branch? We
> recently implemented already almost everything is listed in this paragraph.
>

I think I wrote that in August when "current CPython" meant something
different from today :) I'll update it.

Thanks for the links to the PRs. I'll need to look at them more closely,
but one I think one remaining difference is that
the "nogil" interpreter stays within the same interpreter loop for many
Python function calls, while upstream CPython
recursively calls into _PyEval_EvalFrameDefault.

I've been using this mini-benchmark to measure the overhead of Python
function calls for various numbers of
arguments and keywords:
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/benchmarks/call_benchmark.py

For zero, two, and four argument functions, I get:
nogil (nogil/fb6aabed): 10ns, 14ns, 18ns
3.11 (main/b108db63): 47ns, 54ns, 63ns
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UZKOY4Y3QWT76TCXJ3QXMEGRODN2DOGB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
On Fri, Oct 8, 2021 at 12:04 PM Nathaniel Smith  wrote:

> I notice the fb.com address -- is this a personal project or something
> facebook is working on? what's the relationship to Cinder, if any?
>

It is a Facebook project, at least in the important sense that I work on it
as an employee at Facebook. (I'm currently the only person working on it.)
I keep in touch with some of the Cinder devs regularly and they've advised
on the project, but otherwise the two projects are unrelated.


> Regarding the tricky lock-free dict/list reads: I guess the more
> straightforward approach would be to use a plain ol' mutex that's
> optimized for this kind of fine-grained per-object lock with short
> critical sections and minimal contention, like WTF::Lock. Did you try
> alternatives like that? If so, I assume they didn't work well -- can
> you give more details?
>

I'm using WTF::Lock style locks for dict/list mutations. I did an experiment
early on where I included locking around reads as well. I think it slowed
down
the pyperformance benchmarks by ~10% on average, but I can't find my notes
so I plan to re-run the experiment.

Additionally, because dicts are used for things like global variables, I'd
expect
that locks around reads prevent efficient scaling, but I haven't measured
this.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/V76ZRBM6UMGYU7FTNENMOOW7OYEFYQ5Q/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
On Fri, Oct 8, 2021 at 11:35 AM Chris Jerdonek 
wrote:

> Is it also slower even when running with PYTHONGIL=1? If it could be made
> the same speed for single-threaded code when running in GIL-enabled mode,
> that might be an easier intermediate target while still adding value.
>

Running with PYTHONGIL=1 is a bit less than 1% faster (on pyperformance)
than with PYTHONGIL=0. It might be possible to improve PYTHONGIL=1 by
another 1-2% by adding runtime checks for the GIL before attempting to lock
dicts and lists during mutations. I think further optimizations specific to
the PYTHONGIL=1 use case would be tricky.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6YLZMVKWI77SSNUV5XOGBSRY44KJ76UQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
>
> I’m unclear what is actually retried.  You use this note throughout the
> document, so I think it would help to clarify exactly what is retried and
> why that solves the particular problem.  I’m confused because, is it the
> refcount increment that’s retried or the entire sequence of steps (i.e. do
> you go back and reload the address of the item)?  Is there some kind of
> waiting period before the retry?  I would infer that if you’re retrying the
> refcount incrementing, it’s because you expect subsequent retries to
> transition from zero to non-zero, but is that guaranteed?  Are there
> possibilities of deadlocks or race conditions?


The entire operation is retried (not just the refcount). For "dict", this
means going back to step 1 and reloading the version tag and
PyDictKeysObject. The operation can fail (and need to be retried) only when
some other thread is concurrently modifying the dict. The reader needs to
perform the checks (and retry) to avoid returning inconsistent data, such
as an object that was never in the dict. With the checks and retry,
returning inconsistent or garbage data is not possible.

The retry is performed after locking the dict, so the operation is retried
at most once -- the read operation can't fail when it holds the dict's lock
because the lock prevents concurrent modifications. It would have also been
possible to retry the operation in a loop without locking the dict, but I
was concerned about reader starvation. (In the doc I wrote "livelock", but
"reader starvation" is more accurate.) In particular, I was concerned that
a thread repeatedly modifying a dict might prevent other threads reading
the dict from making progress. I hadn't seen this in practice, but I'm
aware that reader starvation can be an issue for similar designs like
Linux's seqlock. Acquiring the dict's lock when retrying avoids the reader
starvation issue.

Deadlock isn't possible because the code does not acquire any other
locks while holding the dict's lock. For example, the code releases the
dict's lock before calling Py_DECREF or PyObject_RichCompareBool.

The race condition question is a bit harder to answer precisely. Concurrent
reads and modifications of a dict won't cause the program to segfault,
return garbage data, or items that were never in the dict.

Regards,
Sam
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D37MQCDRXRVLDVZ65G5BJPJ6QEPSVLI4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
On Mon, Oct 11, 2021 at 12:58 PM Thomas Grainger  wrote:

> Is D1.update(D2) still atomic with this implementation?
> https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
>

No. For example, another thread reading from the dict concurrently may
observe a partial update.

As Ronald Oussoren points out, dict.update isn't atomic in the general
case. CPython even includes some checks for concurrent modifications:
https://github.com/python/cpython/blob/2f92e2a590f0e5d2d3093549f5af9a4a1889eb5a/Objects/dictobject.c#L2582-L2586
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IAOCDDCJ653NBED3G2J2YBWD7HHPFHT6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
On Mon, Oct 11, 2021 at 7:04 AM Antoine Pitrou  wrote:

> It's crude, but you can take a look at `ccbench` in the Tools directory.
>

Thanks, I wasn't familiar with this. The ccbench results look pretty good:
about 18.1x speed-up on "pi calculation" and 19.8x speed-up on "regular
expression" with 20 threads (turbo off). The latency and throughput results
look good too. With the GIL enabled (3.11), the compute intensive
background task increases latency and dramatically decreases throughput.
With the GIL disabled, latency remains low and throughput high.

Here are the full results for 20 threads without the GIL:
https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil,
20 threads)

Here are the results for 4 threads (the default) for comparison with
upstream:
https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil,
4 threads)
https://gist.github.com/colesbury/c0b89f82e51779670265fb7c7cd37114
(3.11/b108db63e0, 4 threads)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WCMIVNQ6DNOTZUUX4EX43LF2VJPF4ALW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross
I've updated the linked gists with the results from interpreters compiled
with PGO, so the numbers have slightly changed.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ESCEXN7HKL3GICHOHZMQTTHUDQN5WUYX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Sam Gross
Hi Skip,

I think the performance difference is because of different versions of
NumPy. Python 3.9 installs NumPy 1.21.3 by default for "pip install numpy".
I've only built and packaged NumPy 1.19.4 for "nogil" Python. There are
substantial performance differences between the two NumPy builds for this
matmul script.

With NumPy 1.19.4, I get practically the same results for both Python 3.9.2
and "nogil" Python for "time python3 matmul.py 0 10".

I'll update the version of NumPy for "nogil" Python if I have some time
this week.

Best,
Sam

On Sun, Oct 31, 2021 at 5:46 PM Skip Montanaro 
wrote:

> > Remember that py stone is a terrible benchmark.
>
> I understand that. I was only using it as a spot check. I was surprised at
> how much slower my (threaded or unthreaded) matrix multiply was on nogil vs
> 3.9+. I went into it thinking I would see an improvement. The Performance
> section of Sam's design document starts:
>
> As mentioned above, the no-GIL proof-of-concept interpreter is about 10%
> faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.
>
>
> so it didn't occur to me that I'd be looking at a slowdown, much less by
> as much as I'm seeing.
>
> Maybe I've somehow stumbled on some instruction mix for which the nogil VM
> is much worse than the stock VM. For now, I prefer to think I'm just doing
> something stupid. It certainly wouldn't be the first time.
>
> Skip
>
> P.S. I suppose I should have cc'd Sam when I first replied to this
> thread, but I'm doing so now. I figured my mistake would reveal itself
> early on. Sam, here's my first post about my little "project."
> https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
>
>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/W23EPICXG3RVOMMCVSM3FVOEN2U3LNM3/
Code of Conduct: http://python.org/psf/codeofconduct/