[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
To eliminate the possibility of being affected by different versions of numpy, I have just upgraded numpy in the Python 3.8 environment to the latest version, so both 3.8 and 3.10 are now using numpy 1.21.4, and the timing is still exactly the same as before.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/THPN4OWM3A335LDO7HVIQSIDFFVO5URZ/
Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
Alas, it is exactly the same as previously reported, so the problem persists. If the timing had been the same between the Python versions, I would celebrate and shout for joy, seeing the problem narrowed down to numpy. I can carefully upgrade all the other packages in 3.8 to match those in 3.10. Since I can downgrade (I will test that first), I should be able to restore my "superfast 3.8 environment", should this upgrade break it. I will report what I discover.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZM7UU6CVMIWEJEXB7V57N4FML2A7RLQ3/
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
In both cases I installed numpy using "sudo -H pip install numpy", and just now I upgraded numpy in 3.8 using "sudo -H pip3.8 install --upgrade numpy". I will try to simplify the program by removing all the higher-level complexity and see what I find.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SPI6K4LNO5BFLIUGYBHCMYCXX7FO7YV5/
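A minimal sketch for confirming that two environments really carry the same package versions, assuming it is run once under each interpreter (for example with python3.8 and then python3.10). The function name package_versions and the package list are hypothetical; only the standard library is used.

```python
import sys
from importlib import metadata


def package_versions(names):
    """Return {name: version or None} for the interpreter running this script."""
    report = {"python": sys.version.split()[0]}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    # Hypothetical package list; extend it with whatever the program imports.
    for name, version in package_versions(["numpy"]).items():
        print(f"{name}: {version}")
```

Comparing the two printouts side by side shows whether any package still differs between the environments before the timing is attributed to the interpreter.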
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
I think I have found something very interesting. Namely, I removed all multiprocessing (which is done in the shell script, not in Python) and so reduced the program to just a single thread of execution. And lo and behold, Python 3.10 now consistently beats 3.8 by about 5%.

However, this is not the end! It is very important to find out why 3.8 still outperforms 3.10 when multiple processes run simultaneously. The thing is -- all these different threads write to completely unrelated data files (.npz and .npy). The only thing they all have in common is the initial data, which they all read from the same 'init.npz' and 'init_W.npy' files using:

    with load(args.ifilename + '.npz', allow_pickle=True) as data:

and

    Winit = memmap(iWfilename, dtype='float64', mode='r', shape=(Nt, Nx, Np))

So, could this be the problem?
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SMTEEMBDUJ7ZYM6HYOOZXT6NOHJFJIYY/
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
I have created four different sets of initial data, one for each thread of execution, and no, unfortunately, that does NOT solve the problem. When four threads are executed in parallel, 3.8 still outperforms 3.10 by a factor of 2.4. So there is some other point of contention between the threads, which I need to find...
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/35QRBPQFN4MOCSADYB4HSTJQXZ2QTSKT/
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
So far I have narrowed it down to a block of code in solve.py doing a lot of multi-threaded FFT (i.e. with fft(..., threads=6) of pyFFTW), as well as numpy exp() and other functions, and heavy pure-Python list manipulation (yes, lists, not numpy arrays). All of this together (or some one part of it, yet to be discovered) behaves as if some global lock were taken behind the scenes (i.e. inside the Python interpreter), so that when multiple instances of the script are executed in parallel, they slow each other down in 3.10, but not in 3.8. (I loosely called these instances "threads" in previous posts; I correct myself here, as the word "threads" is used more appropriately in the context of FFT in this message.)

So this is definitely a very interesting 3.10 degradation problem. I will try to investigate some more tomorrow...
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BTXTX7VBXZTJBIJIX2KMAAOOQDE52R5K/
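A minimal sketch of how the "instances slow each other down" effect could be measured from Python rather than from a shell script. It assumes the workload can be started as a plain command line; the names time_instances and slowdown are hypothetical, and the command shown is only a placeholder, not the solve.py invocation from this thread.

```python
import subprocess
import sys
import time


def time_instances(cmd, n):
    """Start n copies of cmd at once; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def slowdown(cmd, n):
    """Ratio of the n-instance wall time to the single-instance wall time."""
    single = time_instances(cmd, 1)
    parallel = time_instances(cmd, n)
    return parallel / single


if __name__ == "__main__":
    # Placeholder workload; substitute the real command line of the script under test.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(f"slowdown with 4 instances: {slowdown(cmd, 4):.2f}x")
```

Running the same harness under each interpreter gives one number per Python version. A ratio near 1 means the instances do not interfere with each other; a ratio well above 1 means they compete for some shared resource.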
[Python-Dev] Re: Python 3.10 vs 3.8 performance degradation
I have got it narrowed down to the "threads=6" argument of the fft() and ifft() functions of pyFFTW! Namely, if I do NOT pass "threads=6" to fft()/ifft(), then the parallel execution of multiple instances of the script takes the same time in Python 3.8 and 3.10. But it is a bit slower than with "threads=6", of course (my "multiprocessing" on the shell script level is tied to the number of physical problems being solved simultaneously, and this number is small -- say 4 -- whereas I have 12 processors (6 physical cores) which could execute code in parallel).

So, this is where we are right now: pyFFTW 0.12.0 on Python 3.8 with threads=6 is 2.4 times faster than the same pyFFTW 0.12.0 on Python 3.10, when four scripts are executed in parallel. But removing "threads=6" makes 3.10 much faster, and 3.8 a bit slower. Though not too slow -- instead of 9 vs 23 seconds I get 11.2 (Python 3.8) vs 10.8 (Python 3.10) seconds, so Python 3.10 is even a little bit faster than 3.8, but still not as fast as 3.8 with threads=6.

However, that pendulum PyQt GUI application does NOT do any Fourier transforms! So the FPS problem in the pendulum plotting is something different.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LRQIELQV5R5LDDCRRL2VDTS7DKY7OLPT/
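One further experiment the numbers above suggest, offered as an assumption and not as an explanation established in this thread: four instances each passing threads=6 request 24 FFT threads on 12 logical CPUs. The sketch below only does the arithmetic for choosing a smaller per-instance thread count to try; the function names are hypothetical, and it does not by itself explain why 3.8 and 3.10 behave differently.

```python
import os


def total_threads(n_instances, threads_each):
    """Total FFT threads requested when every instance uses threads_each."""
    return n_instances * threads_each


def fft_threads_per_instance(n_instances, logical_cpus=None):
    """Split the logical CPUs evenly across instances, at least one thread each."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, logical_cpus // n_instances)


if __name__ == "__main__":
    print(total_threads(4, 6))              # 24 threads requested in total
    print(fft_threads_per_instance(4, 12))  # 3 threads per instance fits 12 CPUs
```

Timing the four-instance run with the smaller value passed as the threads argument, under both interpreters, would show whether the 3.8 vs 3.10 gap depends on the total thread count.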