On 9/3/20 3:39 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 patterns to perform optimizations like the ones below:
>
> +/* Optimize for TARGET_AVX512F
> +  vpsubusw op1, op2, dst1;
> +  vxorps xmm, xmm, dst2; ---->   vpcmpleuw op1, op2, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> and
>
> +/* Optimize for target above TARGET_SSE4_1
> +  vpsubusw op1, op2, dst1;      vpminuw op1, op2, dst1
> +  vpxor xmm, xmm, dst2; ---->   vpcmpeqw op1, dst1, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96906
>         * config/i386/sse.md (VI12_128_256): New mode iterator.
>         (define_peephole2): Optimize comparison between result of
>         us_minus and 0, it could be optimized to "vpcmplequ" for
>         AVX512 or "pminu + cmpeq" for target above TARGET_SSE4_1.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-pr96906-1.c: New test.
>         * gcc.target/i386/avx512f-pr96906-1.c: New test.
>         * gcc.target/i386/sse2-pr96906.c: New test.
>         * gcc.target/i386/sse4_1-pr96906-1.c: New test.
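
Both rewrites quoted above appear to rest on one per-lane identity: an
unsigned saturating subtract of b from a is zero exactly when a <= b,
which in turn holds exactly when min(a, b) == a. The following is a
minimal scalar sketch of that identity, checked exhaustively for 8-bit
lanes. It is not code from the patch; the helper names us_minus_u8,
umin_u8 and count_mismatches are hypothetical, and the sketch does not
model the operand order of the instructions shown in the comments.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of an unsigned saturating
   subtract (RTL us_minus): a - b, clamped to 0 when b >= a.  */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical scalar model of one lane of an unsigned minimum.  */
static uint8_t umin_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Count the (a, b) pairs for which the three formulations disagree:
     us_minus(a, b) == 0      (original sequence: subtract, compare to 0)
     a <= b                   (single unsigned compare)
     umin(a, b) == a          (minimum followed by compare-equal)
   A result of 0 means they agree on every 8-bit input pair.  */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int via_sub = us_minus_u8((uint8_t)a, (uint8_t)b) == 0;
            int via_le  = a <= b;
            int via_min = umin_u8((uint8_t)a, (uint8_t)b) == (uint8_t)a;
            if (via_sub != via_le || via_sub != via_min)
                bad++;
        }
    }
    return bad;
}
```

The same reasoning should carry over to 16-bit lanes, since it does not
depend on the element width; 8 bits is used here only to keep the
exhaustive check small.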

I'd look to see if a combiner pattern could help with these too rather
than using a peep2.

jeff
