[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Christopher Barker
On Fri, Oct 29, 2021 at 6:10 AM Skip Montanaro wrote:

> 1. I use numpy arrays filled with random values, and the output array is
> also a numpy array. The vector multiplication is done in a simple for loop
> in my vecmul() function.
>

It probably doesn't make a difference for this exercise, but numpy arrays make
lousy replacements for a regular list -- i.e. as a container alone. The
issue is that floats need to be "boxed" and "unboxed" as you put them in
and pull them out of an array, whereas with lists, the float objects
themselves are already there.

OK, maybe not as bad as I remember, but not great:

In [61]: def multiply(vect, scalar, out):
    ...:     """
    ...:     multiply all the elements in vect by a scalar in place
    ...:     """
    ...:     for i, val in enumerate(vect):
    ...:         out[i] = val * scalar
    ...:

In [62]: arr = np.random.random((10,))

In [63]: arrout = np.zeros_like(arr)

In [64]: l = list(arr)

In [65]: lout = [None] * len(l)

In [66]: %timeit multiply(arr, 1.1, arrout)
19.3 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [67]: %timeit multiply(l, 1.1, lout)
12.8 ms ± 83.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
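
The boxing cost behind that gap can be seen without timing anything. The sketch below is only an illustration and rests on one assumption: it uses the standard library's array module as a stand-in for a float64 numpy array, since both store raw C doubles rather than Python float objects. The names arr and lst mirror the session above; everything else is made up for the example.

```python
from array import array
import random

# array('d') stores raw C doubles, much like a float64 numpy array.
arr = array("d", (random.random() for _ in range(10)))
lst = list(arr)  # a list of ready-made Python float objects

# Each array read boxes the stored double into a brand-new float object.
a, b = arr[0], arr[0]
print(a == b, a is b)  # equal values, distinct objects

# A list read just returns the float object it already holds.
c, d = lst[0], lst[0]
print(c == d, c is d)  # equal values, same object
```

Allocating a fresh object on every read is the extra work a plain for loop pays when it walks an array instead of a list.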

> That said, I have now run my example code using both PYTHONGIL=0 and
> PYTHONGIL=1 of Sam's nogil branch as well as the following other Python3
> versions:
>
> * Conda Python3 (3.9.7)
> * /usr/bin/python3 (3.9.1 in my case)
> * 3.9 branch tip (3.9.7+)
>
> The results were confusing, so I dredged up a copy of pystone to make sure
> I wasn't missing anything w.r.t. basic execution performance. I'm still
> confused, so will keep digging.

I'll be interested to see what you find out :-)

> > It would also be fun to see David Beazley's example from his seminal talk:
> >
> > https://youtu.be/ph374fJqFPE
>
> Thanks, I'll take a look when I get a chance

That may not be the best source of the talk -- just the one I found first
:-)

-CHB

-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GY7RWKFOPQFGTGD7IUN5JS6FYNXYM22I/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Skip Montanaro
Skip> 1. I use numpy arrays filled with random values, and the output array
is also a numpy array. The vector multiplication is done in a simple for
loop in my vecmul() function.

CHB> probably doesn't make a difference for this exercise, but numpy arrays
make lousy replacements for a regular list ...

Yeah, I don't think it should matter here. Both versions should be
similarly penalized.

Skip> The results were confusing, so I dredged up a copy of pystone to make
sure I wasn't missing anything w.r.t. basic execution performance. I'm
still confused, so will keep digging.

CHB> I'll be interested to see what you find out :-)

I'm still scratching my head. I was thinking there was something about the
messaging between the main and worker threads, so I tweaked matmul.py to
accept 0 as a number of threads. That means it would call matmul which
would call vecmul directly. The original queue-using versions were simply
renamed to matmul_t and vecmul_t.
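
For readers without the gist to hand, the split described above might look roughly like the sketch below. This is a guess at the shape, not the actual matmul.py: only the names matmul, vecmul, matmul_t and vecmul_t come from this message, while the multiply() dispatcher, the task tuples and the None sentinel are assumptions made for illustration.

```python
import queue
import threading


def vecmul(a, b):
    """Dot product of two equal-length sequences in a plain for loop."""
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result


def matmul(a, b):
    """Unthreaded path: multiply two lists of lists directly."""
    cols = list(zip(*b))  # columns of b, so each output cell is one vecmul
    return [[vecmul(row, col) for col in cols] for row in a]


def vecmul_t(tasks, results):
    """Worker loop: pull (i, j, row, col) tasks until a None sentinel."""
    while True:
        task = tasks.get()
        if task is None:
            break
        i, j, row, col = task
        results.put((i, j, vecmul(row, col)))


def matmul_t(a, b, nthreads):
    """Threaded path: hand each output cell to workers through a queue."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=vecmul_t, args=(tasks, results))
               for _ in range(nthreads)]
    for w in workers:
        w.start()
    cols = list(zip(*b))
    for i, row in enumerate(a):
        for j, col in enumerate(cols):
            tasks.put((i, j, row, col))
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    out = [[0.0] * len(cols) for _ in a]
    while not results.empty():  # safe: every producer has been joined
        i, j, val = results.get()
        out[i][j] = val
    return out


def multiply(a, b, nthreads):
    """Hypothetical dispatcher: 0 threads bypasses the queues entirely."""
    return matmul(a, b) if nthreads == 0 else matmul_t(a, b, nthreads)
```

With nthreads set to 0 no queue or thread is ever created, so any slowdown measured on that path cannot come from the messaging.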

I am still confused. Here are the pystone numbers, nogil first, then the
3.9 git tip:

(base) nogil_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 5 passes = 0.137658
This machine benchmarks at 363218 pystones/second

(base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 5 passes = 0.207102
This machine benchmarks at 241427 pystones/second

That suggests nogil is indeed a definite improvement over vanilla 3.9.
However, here's a quick nogil v 3.9 timing run of my matrix multiplication,
again, nogil followed by 3.9 tip:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.314s
user 0m9.302s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.918s
user 0m5.180s
sys 0m0.380s

What's up with that? Suddenly nogil is much slower than 3.9 tip. No threads
are in use. I thought perhaps the nogil run somehow didn't use Sam's VM
improvements, so I disassembled the two versions of vecmul. I won't bore
you with the entire dis.dis output, but suffice it to say that Sam's
instruction set appears to be in play:

(base) nogil_build% PYTHONPATH=$HOME/tmp ./bin/python3/python3
Python 3.9.0a4+ (heads/nogil:b0ee2c4740, Oct 30 2021, 16:23:03)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matmul, dis
>>> dis.dis(matmul.vecmul)
 26           0 FUNC_HEADER             11 (11)

 28           2 LOAD_CONST               2 (0.0)
              4 STORE_FAST               2 (result)

 29           6 LOAD_GLOBAL              3 254 ('len'; 254)
              9 STORE_FAST               8 (.t3)
             11 COPY                     9 0 (.t4 <- a)
             14 CALL_FUNCTION            9 1 (.t4 to .t5)
             18 STORE_FAST               5 (.t0)
...

So I unboxed the two numpy arrays once and used lists of lists for the
actual work. The nogil version still performs worse by about a factor of
two:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.537s
user 0m9.525s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.836s
user 0m5.109s
sys 0m0.365s

Still scratching my head and am open to suggestions about what to try next.
If anyone is playing along from home, I've updated my script:

https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d

I'm sure there are things I could have done more efficiently, but I would
think both Python versions would be similarly penalized by dumb s**t I've
done.

Skip


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4JSJFOWQPZHUAUGDVRGIU6LTF7QNXTLD/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Guido van Rossum
Remember that pystone is a terrible benchmark. It only exercises a few
bytecodes, and a modern CPU's caching and branch prediction make mincemeat
of those. Sam wrote a whole new register-based VM, so perhaps that
exercises different bytecodes.

-- 
--Guido (mobile)
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SSLCURZJD5NLAYN5LFEZ4RJWU5YPQX65/

[Python-Dev] Re: Type annotations, PEP 649 and PEP 563

2021-10-31 Thread asafspades
One use case which seems significant to me, but which I don’t think has been
explicitly mentioned, is annotations using a package with stubs where the
stubbed typing API is slightly different from the runtime API.
For example, sometimes additional type aliases are defined for convenience in
stubs without a corresponding runtime name, or a class is generic in the stubs
but has not yet been made to inherit from typing.Generic (or otherwise made
subscriptable) at runtime. These situations are described in the mypy
docs:
https://mypy.readthedocs.io/en/latest/runtime_troubles.html#using-classes-that-are-generic-in-stubs-but-not-at-runtime
These are easy to write without problems using PEP 563, but I am not sure how
they would work with PEP 649.

I believe this pattern may be useful in complex existing libraries when typing
is added, as it may be difficult to convert an existing class to a generic one.
For example, with numpy, the core ndarray class was made generic in stubs to
support indicating the shape and data type. You could only write e.g.
ndarray[Any, np.int64] by using PEP 563. A while later a workaround was added
by defining __class_getitem__.
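
A small sketch of the situation described above, under stated assumptions: Box and Box2 are hypothetical classes standing in for a library type such as ndarray, and the module source is run through exec() only so that the PEP 563 future import can sit at the top of its own "file". It shows the PEP 563 behaviour and the __class_getitem__ workaround; it makes no claim about what PEP 649 would do.

```python
import textwrap
import types

SOURCE = textwrap.dedent('''
    from __future__ import annotations  # PEP 563 semantics

    class Box:  # hypothetical class: generic in its stub, not at runtime
        pass

    def unwrap(b: Box[int]) -> int:  # annotation is not evaluated here
        return 0
''')

ns = {}
exec(SOURCE, ns)
print(ns["unwrap"].__annotations__)  # the annotations are kept as strings

# Evaluating the same expression eagerly fails, because Box has no
# __class_getitem__ at runtime.
try:
    ns["Box"][int]
except TypeError as exc:
    print("eager evaluation fails:", exc)


class Box2:
    # The workaround: make the class subscriptable at runtime.
    __class_getitem__ = classmethod(types.GenericAlias)


print(Box2[int])
```

The function definition succeeds only because the annotation is never evaluated; anything that later resolves the string (typing.get_type_hints, for instance) would hit the same TypeError unless the class defines __class_getitem__.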
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IJ37JVIVVGN5BJGMMNKGADXBYUSMVU6F/


[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Skip Montanaro
> Remember that pystone is a terrible benchmark.

I understand that. I was only using it as a spot check. I was surprised at
how much slower my (threaded or unthreaded) matrix multiply was on nogil vs
3.9+. I went into it thinking I would see an improvement. The Performance
section of Sam's design document starts:

As mentioned above, the no-GIL proof-of-concept interpreter is about 10%
faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.


so it didn't occur to me that I'd be looking at a slowdown, much less by as
much as I'm seeing.

Maybe I've somehow stumbled on some instruction mix for which the nogil VM
is much worse than the stock VM. For now, I prefer to think I'm just doing
something stupid. It certainly wouldn't be the first time.
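
One cheap way to look at the instruction mix of a hot function is to tally its opcodes with the standard dis module. The sketch below is illustrative only: opcode_mix is a made-up helper, the vecmul body is a guess based on the disassembly fragment earlier in the thread, and it assumes dis.get_instructions behaves on the nogil build as it does on stock CPython. The counts are static (per compiled instruction), not the number of times each one executes.

```python
import dis
from collections import Counter


def opcode_mix(func):
    """Count how often each opcode name appears in func's bytecode."""
    return Counter(ins.opname for ins in dis.get_instructions(func))


def vecmul(a, b):
    # Assumed shape of the inner loop; the real one lives in the gist.
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result


mix = opcode_mix(vecmul)
for name, count in mix.most_common(5):
    print(f"{name:20s} {count}")
```

Running the same tally under both interpreters would at least show which opcodes the loop leans on in each VM.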

Skip

P.S. I suppose I should have cc'd Sam when I first replied to this
thread, but I'm doing so now. I figured my mistake would reveal itself
early on. Sam, here's my first post about my little "project."
https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CGT4EMEA7JEH6CIRTB7Z5UUIKWKREAMF/


[Python-Dev] September Steering Council update

2021-10-31 Thread Pablo Galindo Salgado
I’ve just published the September steering council update, also included
below:

https://github.com/python/steering-council/blob/main/updates/2021-09-steering-council-update.md

Just as a reminder, if you have any questions or concerns, feel free to
contact us or open an issue in the SC repo:
https://github.com/python/steering-council

*September 6*

- There was no SC meeting on September 6 as it was a USA/Canada holiday
  (Labor Day).

*September 13*

- The Steering Council met with Łukasz, the Developer-in-Residence. The group
  discussed dealing with review requests and folks not giving Łukasz enough
  time to review before things are merged.
- The group briefly discussed Ezio's progress as the PM for the GitHub Issues
  migration and that the group would meet with Ezio on the 20th of Sept.
- The SC discussed the Exception Groups PEP & Nathaniel's counter-proposal. The
  group decided that more time was needed so they will discuss this more at
  their Sept 20th meeting.

*September 20*

- The Steering Council met with Ezio and got an update on his progress with the
  migration. The group and Ezio agreed that by Oct 1 the plan is to have a test
  repo with a subset of issues in it for a small group to test and provide
  feedback on.
- The Steering Council discussed [PEP 654](https://www.python.org/dev/peps/pep-0654/)
  (Exception Groups and except*) and after some extensive deliberation, the
  group decided to accept.
- Thomas sent out the notification.
- The Steering Council discussed [PEP 649](https://www.python.org/dev/peps/pep-0649/)
  (Deferred Evaluation Of Annotations Using Descriptors) by Larry Hastings. The
  group decided that we have to tie the typing language to the Python language.
  The group discussed the potential of an informational PEP from the SC.
- Pablo informed the SC that there will be a release party on Twitch for 3.10
  co-organized with the people from the Python discord server.

*September 27*

- Steering Council met with the Developer-in-Residence for their
  every-other-week check-in. The group discussed what Łukasz is working on,
  the status of typing PEPs and CPython survey questions for the Python
  Developer Survey.
- The SC discussed [PEP 649](https://www.python.org/dev/peps/pep-0649/)
  (Deferred Evaluation Of Annotations Using Descriptors) by Larry Hastings and
  decided a broader discussion needed to happen with python-dev. The group is
  inclined to accept 649 but there is no clear path on how to handle the
  transition so community discussion is needed.

Regards from rainy London,
Pablo Galindo Salgado
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3CSYIEDW3Y6U24Z4C4CSTVRUKNYUWMS4/