[Python-Dev] Re: Python multithreading without the GIL
On Fri, Oct 29, 2021 at 6:10 AM Skip Montanaro wrote:

> 1. I use numpy arrays filled with random values, and the output array is
> also a numpy array. The vector multiplication is done in a simple for loop
> in my vecmul() function.

Probably doesn't make a difference for this exercise, but numpy arrays make lousy replacements for a regular list -- i.e. as a container alone. The issue is that floats need to be "boxed" and "unboxed" as you put them in and pull them out of an array, whereas with lists, the float objects themselves are already there.

OK, maybe not as bad as I remember, but not great:

In [61]: def multiply(vect, scalar, out):
    ...:     """
    ...:     multiply all the elements in vect by a scalar in place
    ...:     """
    ...:     for i, val in enumerate(vect):
    ...:         out[i] = val * scalar
    ...:

In [62]: arr = np.random.random((10,))

In [63]: arrout = np.zeros_like(arr)

In [64]: l = list(arr)

In [65]: lout = [None] * len(l)

In [66]: %timeit multiply(arr, 1.1, arrout)
19.3 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [67]: %timeit multiply(l, 1.1, lout)
12.8 ms ± 83.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

> That said, I have now run my example code using both PYTHONGIL=0 and
> PYTHONGIL=1 of Sam's nogil branch as well as the following other Python3
> versions:
>
> * Conda Python3 (3.9.7)
> * /usr/bin/python3 (3.9.1 in my case)
> * 3.9 branch tip (3.9.7+)
>
> The results were confusing, so I dredged up a copy of pystone to make sure
> I wasn't missing anything w.r.t. basic execution performance. I'm still
> confused, so will keep digging.
I'll be interested to see what you find out :-)

It would also be fun to see David Beazley's example from his seminal talk:
> >
> > https://youtu.be/ph374fJqFPE
>
> Thanks, I'll take a look when I get a chance

That may not be the best source of the talk -- just the one I found first :-)

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GY7RWKFOPQFGTGD7IUN5JS6FYNXYM22I/
Code of Conduct: http://python.org/psf/codeofconduct/
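The boxing cost described in this message can be tried without numpy. The sketch below is an illustration only: it swaps in the standard library's array.array as a stand-in for a numpy array (both store raw C doubles and have to build a Python float on each element read), and the size n is an arbitrary choice for the sketch, not a value from the thread. Timings will vary by machine and interpreter, so none are shown.

```python
import timeit
from array import array


def multiply(vect, scalar, out):
    """Multiply every element of vect by scalar, storing the results in out."""
    for i, val in enumerate(vect):
        out[i] = val * scalar


n = 100_000  # assumption: arbitrary size picked for this sketch
values = [float(i) for i in range(n)]

as_list = list(values)          # holds references to existing float objects
as_array = array("d", values)   # holds raw C doubles, boxed on every read

list_out = [None] * n
array_out = array("d", [0.0]) * n  # zero-filled output buffer

# In CPython a list hands back the stored float object, while the array
# has to create a fresh float object for each read.
print("list returns stored object: ", as_list[1] is as_list[1])
print("array returns stored object:", as_array[1] is as_array[1])

for name, src, dst in (("list", as_list, list_out), ("array", as_array, array_out)):
    seconds = timeit.timeit(lambda: multiply(src, 1.1, dst), number=10)
    print(f"{name:>5}: {seconds / 10 * 1e3:.2f} ms per call")
```

Both containers end up with the same numeric results; only the per-element object creation differs.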
[Python-Dev] Re: Python multithreading without the GIL
Skip> 1. I use numpy arrays filled with random values, and the output array is also a numpy array. The vector multiplication is done in a simple for loop in my vecmul() function.

CHB> probably doesn't make a difference for this exercise, but numpy arrays make lousy replacements for a regular list ...

Yeah, I don't think it should matter here. Both versions should be similarly penalized.

Skip> The results were confusing, so I dredged up a copy of pystone to make sure I wasn't missing anything w.r.t. basic execution performance. I'm still confused, so will keep digging.

CHB> I'll be interested to see what you find out :-)

I'm still scratching my head. I was thinking there was something about the messaging between the main and worker threads, so I tweaked matmul.py to accept 0 as the number of threads. That means it would call matmul, which would call vecmul directly. The original queue-using versions were simply renamed to matmul_t and vecmul_t.

I am still confused. Here are the pystone numbers, nogil first, then the 3.9 git tip:

(base) nogil_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 50000 passes = 0.137658
This machine benchmarks at 363218 pystones/second

(base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 50000 passes = 0.207102
This machine benchmarks at 241427 pystones/second

That suggests nogil is indeed a definite improvement over vanilla 3.9. However, here's a quick nogil v 3.9 timing run of my matrix multiplication, again, nogil followed by 3.9 tip:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.314s
user 0m9.302s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.918s
user 0m5.180s
sys 0m0.380s

What's up with that? Suddenly nogil is much slower than 3.9 tip. No threads are in use.
I thought perhaps the nogil run somehow didn't use Sam's VM improvements, so I disassembled the two versions of vecmul. I won't bore you with the entire dis.dis output, but suffice it to say that Sam's instruction set appears to be in play:

(base) nogil_build% PYTHONPATH=$HOME/tmp ./bin/python3/python3
Python 3.9.0a4+ (heads/nogil:b0ee2c4740, Oct 30 2021, 16:23:03)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matmul, dis
>>> dis.dis(matmul.vecmul)
 26           0 FUNC_HEADER             11 (11)

 28           2 LOAD_CONST               2 (0.0)
              4 STORE_FAST               2 (result)

 29           6 LOAD_GLOBAL              3 254 ('len'; 254)
              9 STORE_FAST               8 (.t3)
             11 COPY                     9 0 (.t4 <- a)
             14 CALL_FUNCTION            9 1 (.t4 to .t5)
             18 STORE_FAST               5 (.t0)
...

So I unboxed the two numpy arrays once and used lists of lists for the actual work. The nogil version still performs worse by about a factor of two:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.537s
user 0m9.525s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.836s
user 0m5.109s
sys 0m0.365s

Still scratching my head and am open to suggestions about what to try next. If anyone is playing along from home, I've updated my script:

https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d

I'm sure there are things I could have done more efficiently, but I would think both Python versions would be similarly penalized by dumb s**t I've done.

Skip

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4JSJFOWQPZHUAUGDVRGIU6LTF7QNXTLD/
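For readers following along without the gist, the structure this message describes (a direct matmul/vecmul path when the thread count is 0, and queue-based matmul_t/vecmul_t otherwise) might look roughly like the sketch below. Only the four function names come from the message; the bodies, the task format, the None sentinel, and the multiply() dispatcher are assumptions made for illustration and are not the contents of matmul.py.

```python
import queue
import threading


def vecmul(a, b):
    """Dot product of two equal-length sequences in a plain for loop."""
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result


def matmul(a, b):
    """Unthreaded path: call vecmul directly for every row/column pair."""
    cols = list(zip(*b))  # columns of b as tuples
    return [[vecmul(row, col) for col in cols] for row in a]


def vecmul_t(tasks, results):
    """Worker: process (i, j, row, col) tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        i, j, row, col = task
        results.put((i, j, vecmul(row, col)))


def matmul_t(a, b, nthreads):
    """Threaded path: hand row/column pairs to workers through queues."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=vecmul_t, args=(tasks, results))
               for _ in range(nthreads)]
    for w in workers:
        w.start()
    cols = list(zip(*b))
    for i, row in enumerate(a):
        for j, col in enumerate(cols):
            tasks.put((i, j, row, col))
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    out = [[0.0] * len(cols) for _ in a]
    for _ in range(len(a) * len(cols)):
        i, j, value = results.get()
        out[i][j] = value
    for w in workers:
        w.join()
    return out


def multiply(a, b, nthreads=0):
    """Hypothetical dispatcher: nthreads == 0 bypasses the queues entirely."""
    return matmul(a, b) if nthreads == 0 else matmul_t(a, b, nthreads)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(multiply(a, b, 0))
print(multiply(a, b, 2))
```

With the thread count at 0 no queue or thread is created, which is why that path is a useful check on whether messaging overhead explains a slowdown.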
[Python-Dev] Re: Python multithreading without the GIL
Remember that pystone is a terrible benchmark. It only exercises a few bytecodes, and a modern CPU's caching and branch prediction make mincemeat of those. Sam wrote a whole new register-based VM, so perhaps that exercises different bytecodes.

On Sun, Oct 31, 2021 at 05:19 Skip Montanaro wrote:

> I am still confused. Here are the pystone numbers, nogil first, then the
> 3.9 git tip:
>
> (base) nogil_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 50000 passes = 0.137658
> This machine benchmarks at 363218 pystones/second
>
> (base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 50000 passes = 0.207102
> This machine benchmarks at 241427 pystones/second
>
> That suggests nogil is indeed a definite improvement over vanilla 3.9.

--
--Guido (mobile)

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SSLCURZJD5NLAYN5LFEZ4RJWU5YPQX65/
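One rough way to see which bytecodes a piece of code contains, and so whether a benchmark and a workload overlap, is to tally opcode names with the dis module. This is a sketch, not a profiling method used in the thread: the vecmul body here is a hypothetical stand-in, opcode_mix is a helper name invented for the example, and the counts are static (instructions present in the code object), not how often each one executes. Opcode names differ between interpreter versions, so no output is shown.

```python
import dis
from collections import Counter


def opcode_mix(func):
    """Count how often each opcode name appears in func's bytecode."""
    return Counter(instr.opname for instr in dis.get_instructions(func))


def vecmul(a, b):
    # Hypothetical stand-in for the inner loop being benchmarked.
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result


for name, count in opcode_mix(vecmul).most_common():
    print(f"{name:<24} {count}")
```

Comparing the tally for a benchmark's hot functions against the tally for the real workload gives a first hint of whether the two exercise the same instructions.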
[Python-Dev] Re: Type annotations, PEP 649 and PEP 563
One use case which seems significant to me, but which I don't think has been explicitly mentioned, is annotations using a package with stubs where the stubbed typing API is slightly different from the runtime API. For example, sometimes additional type aliases are defined for convenience in stubs without a corresponding runtime name, or a class is generic in the stubs but has not yet been made to inherit from typing.Generic (or just made a subscriptable class) at runtime. These situations are described in the mypy docs: https://mypy.readthedocs.io/en/latest/runtime_troubles.html#using-classes-that-are-generic-in-stubs-but-not-at-runtime.

These are easy to write without problems using PEP 563, but I am not sure how they would work with PEP 649.

I believe this pattern may be useful in complex existing libraries when typing is added, as it may be difficult to convert an existing class to generic. For example, with numpy, the core ndarray class was made generic in stubs to support indicating the shape and data type. You could only write e.g. ndarray[Any, np.int64] using PEP 563. A while later, a workaround was added by defining __class_getitem__.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IJ37JVIVVGN5BJGMMNKGADXBYUSMVU6F/
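The stub-only generic situation can be sketched in a few lines. Everything named here is hypothetical (StubOnlyGeneric, Subscriptable, describe); the annotation is quoted by hand, which gives the same stored-as-string result that PEP 563 applies to every annotation. The sketch only shows the PEP 563 side and a __class_getitem__ workaround of the kind the message mentions; it does not attempt to show how PEP 649 would behave. types.GenericAlias requires Python 3.9 or later.

```python
import types


class StubOnlyGeneric:
    """Hypothetical class: generic in its .pyi stub, a plain class at runtime."""


# Quoting by hand mimics PEP 563, which stores annotations as strings.
def describe(values: "StubOnlyGeneric[int]") -> None:
    """Defining this works because the annotation is never evaluated."""


print(describe.__annotations__["values"])

# Evaluating the annotation is what fails when the class is not subscriptable.
try:
    eval(describe.__annotations__["values"], {"StubOnlyGeneric": StubOnlyGeneric})
    evaluated = True
except TypeError:
    evaluated = False
print("evaluates at runtime:", evaluated)


class Subscriptable:
    """Hypothetical workaround: let the runtime class accept subscripts."""

    def __class_getitem__(cls, item):
        return types.GenericAlias(cls, item)


print(Subscriptable[int])
```

The workaround changes the runtime class, which is the step the message describes as sometimes being difficult for an existing library.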
[Python-Dev] Re: Python multithreading without the GIL
> Remember that pystone is a terrible benchmark.

I understand that. I was only using it as a spot check. I was surprised at how much slower my (threaded or unthreaded) matrix multiply was on nogil vs 3.9+. I went into it thinking I would see an improvement. The Performance section of Sam's design document starts:

    As mentioned above, the no-GIL proof-of-concept interpreter is about 10% faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.

so it didn't occur to me that I'd be looking at a slowdown, much less by as much as I'm seeing.

Maybe I've somehow stumbled on some instruction mix for which the nogil VM is much worse than the stock VM. For now, I prefer to think I'm just doing something stupid. It certainly wouldn't be the first time.

Skip

P.S. I suppose I should have cc'd Sam when I first replied to this thread, but I'm doing so now. I figured my mistake would reveal itself early on. Sam, here's my first post about my little "project."

https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CGT4EMEA7JEH6CIRTB7Z5UUIKWKREAMF/
[Python-Dev] September Steering Council update
I’ve just published the September steering council update, also included below:

https://github.com/python/steering-council/blob/main/updates/2021-09-steering-council-update.md

Just as a reminder, if you have any questions or concerns, feel free to contact us or open an issue in the SC repo: https://github.com/python/steering-council

*September 6*

- There was no SC meeting on September 6 as it was a USA/Canada holiday (Labor Day).

*September 13*

- The Steering Council met with Łukasz, the Developer-in-Residence. The group discussed dealing with review requests and folks not giving Łukasz enough time to review before things are merged.
- The group briefly discussed Ezio's progress as the PM for the GitHub Issues migration and that the group would meet with Ezio on the 20th of Sept.
- The SC discussed the Exception Groups PEP & Nathaniel's counter-proposal. The group decided that more time was needed so they will discuss this more at their Sept 20th meeting.

*September 20*

- The Steering Council met with Ezio and got an update on his progress with the migration. The group and Ezio agreed that by Oct 1 the plan is to have a test repo with a subset of issues in it for a small group to test and provide feedback on.
- The Steering Council discussed [PEP 654](https://www.python.org/dev/peps/pep-0654/) (Exception Groups and except*) and after some extensive deliberation, the group decided to accept.
- Thomas sent out the notification.
- The Steering Council discussed [PEP 649](https://www.python.org/dev/peps/pep-0649/) (Deferred Evaluation Of Annotations Using Descriptors) by Larry Hastings. The group decided that we have to tie the typing language to the Python language. The group discussed the potential of an informational PEP from the SC.
- Pablo informed the SC that there will be a release party on Twitch for 3.10 co-organized with the people from the Python discord server.
*September 27*

- Steering Council met with the Developer-in-Residence for their every-other-week check-in. The group discussed what Łukasz is working on, the status of typing PEPs and CPython survey questions for the Python Developer Survey.
- The SC discussed [PEP 649](https://www.python.org/dev/peps/pep-0649/) (Deferred Evaluation Of Annotations Using Descriptors) by Larry Hastings and decided a broader discussion needed to happen with python-dev. The group is inclined to accept 649 but there is no clear path on how to handle the transition so community discussion is needed.

Regards from rainy London,
Pablo Galindo Salgado

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3CSYIEDW3Y6U24Z4C4CSTVRUKNYUWMS4/