https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
manolis.tsamis at vrull dot eu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manolis.tsamis at vrull dot eu

--- Comment #11 from manolis.tsamis at vrull dot eu ---
> The full satd_8x4 looks like the following, the 2nd loop isn't to be
> disregarded

Regarding the second loop, this patch
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608827.html
should result in improved vectorization and performance.

Currently ((a>>15)&0x10001)*0xffff from abs2 is calculated using 4 vector
operations (shift, bitand, shift+sub for the multiplication), whereas with
the proposed patch this will be a single vector compare operation.
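
For reference, a minimal scalar sketch of why the expression reduces to a
compare. It treats the 32-bit value as two packed 16-bit lanes and shows that
((a>>15)&0x10001)*0xffff yields 0xffff in each lane whose sign bit is set and
0 otherwise. The function names below are hypothetical and only for
illustration; they are not taken from x264 or from the patch, and this does
not show what the vectorizer actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* The expression quoted above: bit 15 of each 16-bit half of `a` is
   moved down to bit 0 of that half, then multiplied out to 0xffff. */
static uint32_t sign_mask_mul(uint32_t a)
{
    return ((a >> 15) & 0x10001u) * 0xffffu;
}

/* Same value with the multiply spelled as shift + subtract, since
   x * 0xffff == (x << 16) - x modulo 2^32. Together with the shift
   and bitand this gives the four operations mentioned above. */
static uint32_t sign_mask_shift_sub(uint32_t a)
{
    uint32_t x = (a >> 15) & 0x10001u;
    return (x << 16) - x;
}

/* Hypothetical lane-wise reference: each 16-bit lane becomes all-ones
   if its sign bit is set, else zero. This is the result shape of a
   per-lane "lane < 0" vector compare. */
static uint32_t sign_mask_cmp(uint32_t a)
{
    uint32_t lo = (a & 0x00008000u) ? 0x0000ffffu : 0u;
    uint32_t hi = (a & 0x80000000u) ? 0xffff0000u : 0u;
    return hi | lo;
}

/* Spot-check that the three forms agree on a few lane combinations. */
static void check_sign_masks(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00008000u, 0x80000000u, 0x80008000u,
        0x7fff7fffu, 0xffffffffu, 0x12348765u, 0xfedc0123u
    };
    unsigned i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(sign_mask_mul(samples[i]) == sign_mask_cmp(samples[i]));
        assert(sign_mask_shift_sub(samples[i]) == sign_mask_cmp(samples[i]));
    }
}
```

Calling check_sign_masks() from a test driver should pass silently. For
example, sign_mask_mul(0x80000000) gives 0xffff0000 and
sign_mask_mul(0x7fff7fff) gives 0.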