https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

manolis.tsamis at vrull dot eu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manolis.tsamis at vrull dot eu

--- Comment #11 from manolis.tsamis at vrull dot eu ---
> The full satd_8x4 looks like the following, the 2nd loop isn't to be
> disregarded

Regarding the second loop, this patch
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608827.html should
result in improved vectorization and performance.

Currently, the expression ((a>>15)&0x10001)*0xffff in abs2 is computed with 4
vector operations (a shift and a bitand, plus a shift and a sub that implement
the multiplication by 0xffff), whereas with the proposed patch it becomes a
single vector compare operation.
