https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:32b0abb24b8702ec9954448739682ace6fa5ccf5

commit r11-5398-g32b0abb24b8702ec9954448739682ace6fa5ccf5
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Thu Nov 26 08:44:15 2020 +0100

    i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]

    The following patch renames the VI12_AVX2 iterator to VI12_AVX2_AVX512BW
    for consistency with some other iterators, as I need a VI12_AVX2 without
    the AVX512BW modes for this change.
    The real meat is a define_split that the combiner
    can use to optimize psubusw compared to 0 into pminuw compared to op0
    (and similarly psubusb compared to 0 into pminub compared to op0).
    According to Agner Fog's tables, psubus[bw] and pminu[bw] have the same
    timings, but the advantage of pminu[bw] is that the comparison
    doesn't need a zero operand, so e.g. for -msse4.1 it causes changes like:
    -       psubusw %xmm1, %xmm0
    -       pxor    %xmm1, %xmm1
    +       pminuw  %xmm0, %xmm1
            pcmpeqw %xmm1, %xmm0
    and similarly for avx2:
    -       vpsubusb        %ymm1, %ymm0, %ymm0
    -       vpxor   %xmm1, %xmm1, %xmm1
    -       vpcmpeqb        %ymm1, %ymm0, %ymm0
    +       vpminub %ymm1, %ymm0, %ymm1
    +       vpcmpeqb        %ymm0, %ymm1, %ymm0
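    The split relies on a per-lane identity: for unsigned a and b,
    psubus(a, b) == 0 and pminu(a, b) == a both hold exactly when a <= b.
    The sketch below is a plain C scalar model of that identity, not GCC
    code; the helper names (subus8, minu8, subus16, minu16, check_bytes,
    check_words) are hypothetical. check_bytes() tries every byte pair;
    check_words() tries every 16-bit a against edge-case b values only,
    since the full 2^32 word space is too large to enumerate quickly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one psubusb lane: unsigned saturating a - b. */
static uint8_t subus8(uint8_t a, uint8_t b) { return a > b ? (uint8_t)(a - b) : 0; }
/* Hypothetical scalar model of one pminub lane: unsigned minimum. */
static uint8_t minu8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Same models for the word variants (psubusw / pminuw). */
static uint16_t subus16(uint16_t a, uint16_t b) { return a > b ? (uint16_t)(a - b) : 0; }
static uint16_t minu16(uint16_t a, uint16_t b) { return a < b ? a : b; }

/* Count byte pairs where "subus(a, b) == 0" and "minu(a, b) == a" disagree. */
static unsigned long check_bytes(void)
{
    unsigned long mismatches = 0;
    for (unsigned a = 0; a <= UINT8_MAX; a++)
        for (unsigned b = 0; b <= UINT8_MAX; b++) {
            int old_form = subus8((uint8_t)a, (uint8_t)b) == 0;        /* psubusb + pcmpeqb vs 0  */
            int new_form = minu8((uint8_t)a, (uint8_t)b) == (uint8_t)a; /* pminub + pcmpeqb vs op0 */
            mismatches += old_form != new_form;
        }
    return mismatches;
}

/* Same check for words: every a, but only edge-case values of b. */
static unsigned long check_words(void)
{
    unsigned long mismatches = 0;
    for (uint32_t a = 0; a <= UINT16_MAX; a++) {
        /* a - 1 and a + 1 wrap to 65535 / 0 at the ends after the cast. */
        const uint32_t bs[] = { 0, a - 1, a, a + 1, UINT16_MAX };
        for (size_t i = 0; i < sizeof bs / sizeof bs[0]; i++) {
            uint16_t b = (uint16_t)bs[i];
            int old_form = subus16((uint16_t)a, b) == 0;
            int new_form = minu16((uint16_t)a, b) == (uint16_t)a;
            mismatches += old_form != new_form;
        }
    }
    return mismatches;
}
```

    Both check functions are expected to return 0, which is what makes it
    safe to drop the zero register and compare against op0 instead.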

    I haven't done the AVX512{BW,VL} define_split(s); they will need
    to match the UNSPEC_PCMP patterns that are used for AVX512 comparisons.

    2020-11-26  Jakub Jelinek  <ja...@redhat.com>

            PR target/96906
            * config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
            (VI12_AVX2_AVX512BW): New mode iterator.
            (<sse2_avx2>_<plusminus_insn><mode>3<mask_name>,
            uavg<mode>3_ceil, <sse2_avx2>_uavg<mode>3<mask_name>): Use
            VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
            (*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Likewise.
            (*<sse2_avx2>_uavg<mode>3<mask_name>): Likewise.
            (*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Add a new
            define_split after this insn.

            * gcc.target/i386/pr96906-1.c: New test.
