https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2016-01-04 00:00:00         |2021-8-24

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We do slightly better but not close:
        movdqa  (%rax), %xmm0
        addq    $16, %rax
        psubusw %xmm1, %xmm0
        paddw   %xmm1, %xmm0
        paddw   %xmm2, %xmm0
        movaps  %xmm0, -16(%rax)

Which is expanded from:
  vect__1.6_15 = MAX_EXPR <vect_m_6.5_3, { 64, 64, 64, 64, 64, 64, 64, 64 }>;
  vect__2.7_17 = vect__1.6_15 + { 65472, 65472, 65472, 65472, 65472, 65472, 65472, 65472 };

-mavx2 we get:
        vpmaxuw (%rax), %ymm2, %ymm0
        addq    $32, %rax
        vpaddw  %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, -32(%rax)

Just note 65472 is -64.

This shouldn't be too hard to detect and add and even lower back to
MAX_EXPR/PLUS_EXPR if us_minus does not exist.
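For readers following along, here is a minimal scalar sketch of why the MAX_EXPR/PLUS_EXPR pair can be treated as an unsigned saturating subtract (us_minus). It is not GCC code: the helper names sat_sub_u16, max_plus_u16 and check_equivalence are hypothetical and exist only for illustration, and the check covers only the 16-bit unsigned case with the constant 64 from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned saturating subtract, i.e. the scalar
   meaning of us_minus (one 16-bit lane of psubusw). */
static uint16_t sat_sub_u16(uint16_t x, uint16_t c)
{
    return x > c ? (uint16_t)(x - c) : 0;
}

/* Hypothetical helper: the form seen in the vectorizer dump,
   MAX_EXPR <x, c> followed by a wrapping add of -c. */
static uint16_t max_plus_u16(uint16_t x, uint16_t c)
{
    uint16_t m = x > c ? x : c;           /* MAX_EXPR <x, c> */
    uint16_t neg_c = (uint16_t)(0u - c);  /* 65472 when c == 64 */
    return (uint16_t)(m + neg_c);         /* PLUS_EXPR, wraps modulo 2^16 */
}

/* Exhaustively compare both forms over every 16-bit input for c == 64. */
static void check_equivalence(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
        assert(sat_sub_u16((uint16_t)x, 64) == max_plus_u16((uint16_t)x, 64));
}
```

Calling check_equivalence() from a test harness passes for all 65536 inputs, which is the property a pattern matcher would rely on in either direction: folding max(x, c) + (-c) into us_minus where the target has it, or lowering us_minus back to MAX_EXPR/PLUS_EXPR where it does not.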