On 9/3/20 3:39 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 to perform optimization like below:
>
> +/* Optimize for TARGET_AVX512F
> +   vpsubusw op1, op2, dst1;
> +   vxorps xmm, xmm, dst2;     ----> vpcmpleuw op1, op2, dst3
> +   vpcmpeqw dst1, dst2, dst3  */
>
> and
>
> +/* Optimize for target above TARGET_SSE4_1
> +   vpsubusw op1, op2, dst1;          vpminuw op1, op2, dst1
> +   vpxor xmm, xmm, dst2;       ----> vpcmpeqw op1, dst1, dst3
> +   vpcmpeqw dst1, dst2, dst3  */
>
> Bootstrap is ok, regression test is ok for i386/x86-64 backend.
> Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96906
>         * config/i386/sse.md (VI12_128_256): New mode iterator.
>         (define_peephole2): Optimize comparison between result of
>         us_minus and 0, it could be optimized to "vpcmplequ" for
>         AVX512 or "pminu + cmpeq" for target above TARGET_SSE4_1.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-pr96906-1.c: New test.
>         * gcc.target/i386/avx512f-pr96906-1.c: New test.
>         * gcc.target/i386/sse2-pr96906.c: New test.
>         * gcc.target/i386/sse4_1-pr96906-1.c: New test.
I'd look to see if a combiner pattern could help with these too, rather than using a peep2.

jeff
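
For reference, both rewrites quoted above appear to rest on the same per-lane identity: an unsigned saturating subtract yields zero exactly when the first operand is less than or equal to the second, which is also exactly when the unsigned minimum of the two equals the first operand. The following is only a scalar sketch of that identity, not code from the patch. It assumes 8-bit lanes so the check can be exhaustive, and the helper names (sat_sub_u8, min_u8, count_mismatches_u8) are made up for illustration.

```c
#include <stdint.h>

/* Scalar model of one lane of an unsigned saturating subtract
   (what the ChangeLog calls us_minus).  Hypothetical helper.  */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar model of one lane of an unsigned minimum.  Hypothetical helper.  */
uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Walk every pair of 8-bit values and count the pairs for which the
   three forms of the comparison disagree.  */
unsigned count_mismatches_u8(void)
{
    unsigned bad = 0;

    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            /* Original form: saturating subtract, then compare with 0.  */
            int orig = sat_sub_u8((uint8_t)a, (uint8_t)b) == 0;
            /* Form 1: a single unsigned less-or-equal compare.  */
            int cmple = a <= b;
            /* Form 2: unsigned minimum, then compare for equality.  */
            int minform = min_u8((uint8_t)a, (uint8_t)b) == (uint8_t)a;

            if (orig != cmple || orig != minform)
                bad++;
        }
    return bad;
}
```

If the identity holds, count_mismatches_u8() returns 0. The same argument should carry over to 16-bit lanes, since it depends only on unsigned ordering and not on the lane width.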