https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2016-01-04 00:00:00         |2021-8-24

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We do slightly better now, but we are still not close to optimal:
        movdqa  (%rax), %xmm0
        addq    $16, %rax
        psubusw %xmm1, %xmm0
        paddw   %xmm1, %xmm0
        paddw   %xmm2, %xmm0
        movaps  %xmm0, -16(%rax)

This is expanded from the following GIMPLE:
  vect__1.6_15 = MAX_EXPR <vect_m_6.5_3, { 64, 64, 64, 64, 64, 64, 64, 64 }>;
  vect__2.7_17 = vect__1.6_15 + { 65472, 65472, 65472, 65472, 65472, 65472, 65472, 65472 };

With -mavx2 we get:
        vpmaxuw (%rax), %ymm2, %ymm0
        addq    $32, %rax
        vpaddw  %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, -32(%rax)

Note that 65472 is -64 as an unsigned 16-bit value (65536 - 64).
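Spelled out for a 16-bit lane value x with 0 <= x < 2^16: because max(x, 64) >= 64, subtracting 64 from it never wraps, so the two statements compute exactly the unsigned saturating difference of x and 64:

```latex
\begin{aligned}
65472 &= 2^{16} - 64 \equiv -64 \pmod{2^{16}} \\
\max(x, 64) + 65472 &\equiv \max(x, 64) - 64 \pmod{2^{16}} \\
\max(x, 64) - 64 &= \begin{cases} x - 64, & x \ge 64 \\ 0, & x < 64 \end{cases}
\end{aligned}
```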

This pattern should not be too hard to detect and add as an unsigned saturating
subtraction (us_minus), and it could even be lowered back to MAX_EXPR/PLUS_EXPR
on targets where us_minus does not exist.
