https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2016-01-04 00:00:00 |2021-8-24
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We now do slightly better, but are still not close to the ideal:
movdqa (%rax), %xmm0
addq $16, %rax
psubusw %xmm1, %xmm0
paddw %xmm1, %xmm0
paddw %xmm2, %xmm0
movaps %xmm0, -16(%rax)
Which is expanded from:
vect__1.6_15 = MAX_EXPR <vect_m_6.5_3, { 64, 64, 64, 64, 64, 64, 64, 64 }>;
vect__2.7_17 = vect__1.6_15 + { 65472, 65472, 65472, 65472, 65472, 65472, 65472, 65472 };
With -mavx2 we get:
vpmaxuw (%rax), %ymm2, %ymm0
addq $32, %rax
vpaddw %ymm1, %ymm0, %ymm0
vmovdqa %ymm0, -32(%rax)
Note that 65472 is -64 as an unsigned 16-bit value (65536 - 64).
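A scalar sketch of why that matters: adding 65472 modulo 2^16 is the same as subtracting 64, so MAX_EXPR followed by the wrapping add has the same effect as a saturating subtract of 64. The function names below are hypothetical and only model a single 16-bit lane; they are not taken from the testcase or from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating subtract of 64 on one unsigned 16-bit lane (hypothetical helper). */
static uint16_t sat_sub_64(uint16_t m)
{
    return m >= 64 ? (uint16_t)(m - 64) : 0;
}

/* Scalar model of the GIMPLE above: MAX_EXPR <m, 64> + 65472, wrapping mod 2^16. */
static uint16_t max_plus_const(uint16_t m)
{
    uint16_t mx = m > 64 ? m : 64;
    return (uint16_t)(mx + 65472u);
}

/* Number of 16-bit inputs on which the two forms disagree. */
static unsigned count_mismatches_64(void)
{
    unsigned bad = 0;
    for (uint32_t m = 0; m <= UINT16_MAX; m++)
        if (sat_sub_64((uint16_t)m) != max_plus_const((uint16_t)m))
            bad++;
    return bad;
}

/* Checks the constant and the equivalence over the whole 16-bit range. */
static void self_check_64(void)
{
    assert((uint16_t)(-64) == 65472);
    assert(count_mismatches_64() == 0);
}
```

Calling self_check_64() from a small driver walks all 65536 inputs, so the check is exhaustive for this one constant.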
It shouldn't be too hard to detect this pattern and add support for it, and
even to lower it back to MAX_EXPR/PLUS_EXPR if us_minus does not exist.
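The identities such a transformation would rely on can be sketched in scalar C, under the assumption that each vector lane behaves like an independent uint16_t. The first identity is the detection direction (MAX_EXPR plus a negated constant equals us_minus) and is also the fallback lowering when read right to left. The second appears to be what the SSE2 sequence above uses to emulate an unsigned max with psubusw followed by paddw. All helper names here are hypothetical, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating subtract on one 16-bit lane, as psubusw computes it. */
static uint16_t us_minus(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Unsigned max on one 16-bit lane, as pmaxuw computes it. */
static uint16_t umax(uint16_t a, uint16_t b)
{
    return a > b ? a : b;
}

/* Pattern to detect: MAX_EXPR <m, c> + (-c), with the add wrapping mod 2^16. */
static uint16_t max_then_add_neg(uint16_t m, uint16_t c)
{
    uint16_t neg_c = (uint16_t)(0u - c);
    return (uint16_t)(umax(m, c) + neg_c);
}

/* Emulating unsigned max without pmaxuw: us_minus, then a wrapping add. */
static uint16_t umax_via_us_minus(uint16_t m, uint16_t c)
{
    return (uint16_t)(us_minus(m, c) + c);
}

/* Number of disagreements over all 16-bit m for one constant c. */
static unsigned count_mismatches(uint16_t c)
{
    unsigned bad = 0;
    for (uint32_t m = 0; m <= UINT16_MAX; m++) {
        uint16_t v = (uint16_t)m;
        if (max_then_add_neg(v, c) != us_minus(v, c))
            bad++;
        if (umax_via_us_minus(v, c) != umax(v, c))
            bad++;
    }
    return bad;
}

/* Spot-check a few constants, including the edge cases 0 and 65535. */
static void self_check(void)
{
    static const uint16_t consts[] = { 0, 1, 64, 32768, 65535 };
    for (unsigned i = 0; i < sizeof consts / sizeof consts[0]; i++)
        assert(count_mismatches(consts[i]) == 0);
}
```

If both identities hold, the psubusw/paddw/paddw sequence in the SSE2 output collapses to the single psubusw, since the two adds of 64 and 65472 cancel modulo 2^16.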