[Numpy-discussion] Re: Improved 2DFFT Approach

2024-03-12 Thread Alexander Levin via NumPy-Discussion
Thanks for your extensive feedback. If I understood you correctly, we cannot claim outperformance in all cases, because it was measured with an insufficiently precise timing function and over relatively short runs.

I understand your point of view and thank you for the observation. We will start fixing this and re-testing.

I don't fully see your point about -O3 and -ffast-math; if I understand these concepts correctly, such aggressive optimizations trade computational accuracy for performance in the long run. At the moment, our operation has been tested by another project dealing with signal processing; the testing was successful and indeed showed better performance. As for computational accuracy, I don't fully grasp your point here either, since before publishing this we ran a number of tests across different dimensions and sizes, some of which are available as well.

Nevertheless, I would agree that our development is the result of the personal enthusiasm of two individuals, motivated by the fact that the mathematical aspect of the algorithm currently used in the field is far from ideal. We have spent some of our time understanding the problem, analyzing articles on the matter, and implementing functions to achieve a more efficient mathematical algorithm.

We have also spent time bringing the aforementioned package to a state where it can be further worked on and improved, and we deeply respect the willingness of more experienced open-source colleagues to help us in this endeavor. As I mentioned earlier, we are ready to work within our abilities and knowledge, applying all the information available to us. I believe that with our efforts combined with guidance from the NumPy team, our operation can be tested and integrated into the package itself. We are ready to put in all the necessary effort on our side.

Thank you for your comments. We will replace our module with the one you mentioned, re-run the tests, and will be happy to share the results. I appreciate that such a knowledgeable person in the field took the time to look at our work.


[Numpy-discussion] Re: Improved 2DFFT Approach

2024-03-14 Thread Alexander Levin via NumPy-Discussion
Good day, Ralf.

I am sharing the results of the latest updates to our code. We have taken the comments below into account and now measure timing with %timeit -o inside Jupyter, which gives us the best of 7 passes along with the average deviation. I am writing to summarise the intermediate results.
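For reference, the same measurement can be scripted outside Jupyter; the sketch below mimics %timeit's best-of-7 behaviour with timeit.repeat, using an illustrative array size and scipy.fft.fft2 as the baseline:

    import timeit

    import numpy as np
    import scipy.fft

    x = np.random.rand(1024, 1024)  # illustrative size

    # Mimic %timeit -o: repeat the measurement 7 times and keep the best run,
    # which filters out scheduler noise better than a single pass would.
    runs = timeit.repeat(lambda: scipy.fft.fft2(x), repeat=7, number=10)
    best = min(runs) / 10
    print(f"best of 7 runs: {best * 1e3:.3f} ms per fft2 call")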

The testing notebooks:
Memory Usage - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/memory_usage.ipynb
Timing comparisons (updated) - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/comparisons.ipynb
Our version always loses to SciPy when multithreading is enabled. We also wondered about the type conversions: should they stay inside the test metrics or not? They are necessary for converting matrix values from int to complex128 (we will switch to complex64 if necessary) and back on output. For a more convenient user experience we preferred to keep the conversions inside the timed code; we would be interested in your opinion.
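To make the point concrete, here is a minimal sketch of the conversions in question, with np.fft standing in for our operation and the sizes purely illustrative:

    import numpy as np

    img = np.random.randint(0, 256, size=(512, 512))  # integer input

    # Promote to complex128 before the transform (complex64 would halve the
    # memory at some cost in precision), then cast back on output.
    x = img.astype(np.complex128)
    spectrum = np.fft.fft2(x)  # stand-in for our operation
    restored = np.fft.ifft2(spectrum).real.round().astype(img.dtype)

Counting these two casts inside or outside the timed region changes what the benchmark measures, which is why we ask.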

Regarding the results after all the updates: memory usage is stable, and our operation wins by a factor of two. Regarding execution time and efficiency, my read is as follows. On tests with multithreading enabled we consistently lose, while on tests with multithreading disabled we consistently win. The logical conclusion is that our algorithm is mathematically smarter, which lets it win steadily on memory usage and performance when multithreading is switched off. At the same time, the multithreading used by the SciPy authors is better and more efficient than ours, which is why our operation loses the moment it is switched on.
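For what it's worth, SciPy's threading can be toggled per call through the workers argument, which is how the two regimes can be pinned down cleanly (array size illustrative):

    import numpy as np
    import scipy.fft

    x = np.random.rand(2048, 2048).astype(np.complex128)

    # workers=1 pins SciPy's pocketfft backend to a single thread for a fair
    # single-threaded comparison; workers=-1 uses all available cores.
    single = scipy.fft.fft2(x, workers=1)
    multi = scipy.fft.fft2(x, workers=-1)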

From this I conclude that our algorithm is still more performant, but it clearly needs a reworked multithreading system, and here we need your advice. In theory, we could design and write a more efficient and smarter multithreading scheme than our current one. In practice, I am sure the best way forward would be to collaborate with someone responsible for the FFT in SciPy or NumPy so that we can test our algorithm with their multithreading; I am confident this would give the best performance currently possible. I propose this option instead of writing our own multithreading separately, since the goal of our work is to be embedded in NumPy so that as many people as possible can use a more efficient algorithm. And if we wrote our own multithreading first, we would then have to switch to the NumPy version to integrate the package anyway. So I am asking for your feedback on our memory-usage and efficiency results, to decide together on the next steps of what we hope will be collaborative work, if you are interested. Thank you for your time.

Regards, 

Alexander


[Numpy-discussion] Re: Improved 2DFFT Approach

2024-03-14 Thread Alexander Levin via NumPy-Discussion
Hi Stefan, 

Indeed you're right: the underlying formula was initially created by V. Tutatchikov for power-of-two matrices. The butterfly approach requires a recursive breakdown down to 2x2 matrices in order to precalculate the roots of unity (exactly what provides the aforementioned performance advantage).
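For readers following along, here is the textbook 1-D radix-2 recursion with the roots of unity made explicit; this is a sketch of the general technique, not our package's kernel:

    import numpy as np

    def fft_radix2(x):
        # Textbook radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
        x = np.asarray(x, dtype=np.complex128)
        n = len(x)
        if n == 1:
            return x
        even = fft_radix2(x[0::2])
        odd = fft_radix2(x[1::2])
        # Roots of unity ("twiddle factors") for this stage of the butterfly.
        w = np.exp(-2j * np.pi * np.arange(n // 2) / n)
        return np.concatenate([even + w * odd, even - w * odd])

A 2-D transform applies this along rows and then columns, which is where the power-of-two requirement on both sides comes from.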

I did my own research on implementations where the size is not a power of two and the butterfly is still used. Usually they either proceed with zero imputation, padding up to the closest power of two (which is a mess at bigger sizes), or they drop columns and rebuild them afterwards, which is not a brilliant solution either. At some point I found patent US 8,484,274 B2, which discusses possible padding options for such matrices. The methodology is picked depending on the actual size of the signal/data; therefore, if you implement the butterfly with such an approach, you need to write several cases and check the size. Theoretically it's possible, though I can't really say whether this particular approach would give a bigger performance advantage, the same, or worse.
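For illustration, the zero-imputation option looks like the sketch below; note that the padded transform yields an interpolated spectrum rather than the DFT of the original size, and the waste just above a power of two is exactly the mess mentioned:

    import numpy as np

    def pad_to_pow2(x):
        # Grow each axis to the next power of two with zeros, e.g. 1025 -> 2048.
        shape = tuple(1 << (s - 1).bit_length() for s in x.shape)
        out = np.zeros(shape, dtype=np.complex128)
        out[tuple(slice(s) for s in x.shape)] = x
        return out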

Yes, indeed the intent is to generate a random matrix so that our tests are as objective as they can be. I appreciate your review and will also run the memory comparison against rfft (I hope tomorrow).
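For the record, the reason rfft is the right memory baseline for real input (size illustrative):

    import numpy as np

    x = np.random.rand(1024, 1024)

    # rfft2 exploits the Hermitian symmetry of a real input's spectrum and
    # stores only (n, m // 2 + 1) complex values, about half of fft2's output.
    print(np.fft.rfft2(x).shape)  # (1024, 513)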

So for now I am sure this method shows better performance on power-of-two square matrices as well as on rectangular matrices of size 2^n x 2^m (the latter were tested during development). Speaking of other sizes, I mentioned above that I have some thoughts, but it is very case-sensitive with respect to the specific type of signal being processed. I would like to emphasize that, regardless of my genuine desire to create the best possible version for the user, my work is not limited by my desire but constrained by capabilities. As you may have noticed earlier, the multithreading implemented by the SciPy authors surpasses the version created by us. Considering the matrix-dimension sensitivity of the butterfly method, I am ready to share my thoughts regarding specific data sizes and use cases. However, I cannot readily provide an optimal solution for all cases; otherwise, I would have implemented it first and then presented it to you. At the moment, I believe this is the only significant vulnerability in the mathematics we have presented. I can't provide much feedback on Bluestein / Chirp-Z at this very moment, but I will research the matter, and if it solves our issue in some case I will definitely implement it.
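For context, Bluestein's chirp-z trick reduces a DFT of arbitrary length to a power-of-two circular convolution; the sketch below is the textbook form, not tuned code:

    import numpy as np

    def bluestein_fft(x):
        # Length-n DFT via Bluestein's chirp-z algorithm (textbook sketch).
        x = np.asarray(x, dtype=np.complex128)
        n = len(x)
        k = np.arange(n)
        chirp = np.exp(-1j * np.pi * k**2 / n)
        m = 1 << (2 * n - 1).bit_length()  # next power of two >= 2n - 1
        a = np.zeros(m, dtype=np.complex128)
        a[:n] = x * chirp
        b = np.zeros(m, dtype=np.complex128)
        b[:n] = np.conj(chirp)
        b[m - n + 1:] = np.conj(chirp[1:][::-1])  # wrap for circular convolution
        conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
        return conv[:n] * chirp

The price is roughly three power-of-two FFTs of about double the length, so it keeps the butterfly machinery usable at any size rather than making it faster.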

At this point, I believe that if our method, after the corrections discussed, still has some performance advantage (even if only on matrices of simpler sizes like powers of two), it is worth embedding, even behind 'if' cases, in my opinion. Yet I'm only a contributor, and that's why I'm discussing this matter with you.

I want to say that I appreciate your feedback and your time, and I will be waiting for your response.

Regards,

Alexander


[Numpy-discussion] Re: Improved 2DFFT Approach

2024-03-15 Thread Alexander Levin via NumPy-Discussion
Hi Stéfan,

Update: indeed, rfft2 has memory usage equal to our fft2d on real input. Thanks, Stéfan.

At this moment, I believe the results are the following:

> SciPy outperforms us on time for rectangular signals with power-of-two sides.
> Memory usage is equal to rfft2's.

In my eyes, it's worth trying to put our algorithm and SciPy's multithreading together; given the previous results, I believe it would show a major performance improvement. If it does, I still think it's worth putting the Cooley-Tukey operation to work for the cases of the signals mentioned above. So I suggest we try testing our code as part of NumPy/SciPy; to be honest, I have lost track of whether this thread is about embedding into NumPy or SciPy.

I believe that if we placed the butterfly algorithm into SciPy behind a check on the size of the matrix, we would win some performance rather than lose it. I think any performance advantage of the algorithm is important, as long as the balance of memory usage and time is preserved; in terms of algorithm performance, one man's small step is a leap for the many projects downstream.
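To make the proposal concrete, here is a sketch of the size check I have in mind; butterfly_fft2 is a hypothetical placeholder for our kernel (it defaults to scipy.fft.fft2 below only so the snippet runs):

    import scipy.fft

    def _is_pow2(n):
        return n > 0 and n & (n - 1) == 0

    def fft2_dispatch(x, workers=-1, butterfly_fft2=scipy.fft.fft2):
        # Route power-of-two shapes to the butterfly kernel; fall back to
        # scipy.fft.fft2 (and its multithreading) for every other size.
        if all(_is_pow2(s) for s in x.shape):
            return butterfly_fft2(x)
        return scipy.fft.fft2(x, workers=workers)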

Please let me know if you share this opinion and whether we should try testing.

Regards,

Alexander