I have got it narrowed down to the "threads=6" argument of the fft() and ifft()
functions of pyFFTW! Namely, if I do NOT pass "threads=6" to fft()/ifft(), then
the parallel execution of multiple instances of the script takes the same time
in Python 3.8 and 3.10, although it is a bit slower than with "threads=6".
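For reference, here is a minimal sketch of the two call forms, differing only in the threads keyword of pyFFTW's numpy-like interface; the array size and thread count are just illustrative, not taken from the real code:

```python
import numpy as np
import pyfftw
from pyfftw.interfaces import numpy_fft

pyfftw.interfaces.cache.enable()             # keep plans cached between calls

a = np.random.rand(2**20) + 1j * np.random.rand(2**20)

spec_serial  = numpy_fft.fft(a)              # default: single-threaded execution
spec_threads = numpy_fft.fft(a, threads=6)   # same transform, 6 execution threads

back = numpy_fft.ifft(spec_threads, threads=6)
assert np.allclose(back, a)                  # round trip recovers the input
```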
So far I have narrowed it down to a block of code in solve.py that does a lot of
multi-threaded FFTs (i.e. with fft(..., threads=6) of pyFFTW), as well as numpy
exp() and other functions, and heavy pure-Python list manipulation (yes, lists,
not numpy arrays). All of this together (or some single part of it) seems to be
responsible for the difference.
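The real solve.py is too large to post, but a rough stand-in for that block looks something like the following; the function name, array size, loop length, and the exp() phase factor are all assumptions for illustration only:

```python
import numpy as np
import pyfftw
from pyfftw.interfaces import numpy_fft

pyfftw.interfaces.cache.enable()

def step(field, dt=1e-3, steps=200):
    history = []                                         # plain Python list bookkeeping
    for _ in range(steps):
        spec = numpy_fft.fft(field, threads=6)           # multi-threaded forward FFT
        spec *= np.exp(-1j * dt * np.arange(spec.size))  # numpy exp() on the spectrum
        field = numpy_fft.ifft(spec, threads=6)          # multi-threaded inverse FFT
        history.append(float(np.abs(field).max()))       # pure-Python list manipulation
    return field, history

field, hist = step(np.random.rand(2**16) + 0j)
```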
I have created four different sets of initial data, one for each thread of
execution, and no, unfortunately, that does NOT solve the problem. Still, when
the four threads are executed in parallel, 3.8 outperforms 3.10 by a factor of
2.4. So there is some other point of contention between the threads.
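The parallel launching itself is done by a shell script that I have not posted; a hypothetical Python equivalent, just to show how the four instances with their separate input files are run and timed, would be:

```python
import subprocess
import sys
import time

# The interpreter name, input file names and solve.py's command line
# are assumptions here, not the actual setup.
interpreter = sys.argv[1] if len(sys.argv) > 1 else "python3.10"
inputs = ["init_0.dat", "init_1.dat", "init_2.dat", "init_3.dat"]

start = time.perf_counter()
procs = [subprocess.Popen([interpreter, "solve.py", path]) for path in inputs]
for p in procs:
    p.wait()
print(f"{interpreter}: 4 parallel instances took {time.perf_counter() - start:.1f} s")
```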
I think I have found something very interesting. Namely, I removed all the
multiprocessing (which is done in the shell script, not in Python) and so
reduced the program to just a single thread of execution. And lo and behold,
Python 3.10 now consistently beats 3.8 by about 5%. However, this is not the
case when multiple instances are run in parallel.
In both cases I installed numpy using "sudo -H pip install numpy". And just now
I upgraded numpy in 3.8 using "sudo -H pip3.8 install --upgrade numpy".
I will try to simplify the program by removing all the higher level complexity
and see what I find.
Alas, it is exactly the same as previously reported, so the problem persists.
If it were exactly the same between Python versions I would celebrate and shout
for joy, since that would narrow the problem down to numpy.
I can carefully upgrade all the other packages in 3.8 to match those in 3.10.
To eliminate the possibility of being affected by the different versions of
numpy, I have just now upgraded numpy in the Python 3.8 environment to the
latest version, so both 3.8 and 3.10 are using numpy 1.21.4, and still the
timing is exactly the same.
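In case it helps anyone reproduce this, one way to check that both interpreters really pick up the same numpy build (run it once with python3.8 and once with python3.10) is:

```python
import sys
import numpy

print(sys.version)
print("numpy", numpy.__version__, "from", numpy.__file__)
numpy.show_config()   # shows which BLAS/LAPACK numpy was built against
```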
___