liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Hi: > Bootstrapped and regtested on x86_64-linux-gnu{-m32,} > Ok for trunk?
I think if anything the canonicalisation should be the other way: if the shift amount is an in-range constant, we know that it fits within a vector element, and so the vector form should be preferred. IMO it's a wart that we support (say): (ashift:V4SI (reg:V4SI R) (const_int 1)) even though: (plus:V4SI (reg:V4SI R) (const_int 1)) (minus:V4SI (const_int 1) (reg:V4SI R)) are not well-formed (at least AFAIK). Thanks, Richard > > gcc/ChangeLog: > > PR rtl-optimization/101796 > * simplify-rtx.c > (simplify_context::simplify_binary_operation_1): Simplify > vector shift/rotate with const_vec_duplicate to vector > shift/rotate with const_int element. > > gcc/testsuite/ChangeLog: > > PR rtl-optimization/101796 > * gcc.target/i386/pr101796.c: New test. > --- > gcc/simplify-rtx.c | 15 ++++++ > gcc/testsuite/gcc.target/i386/pr101796.c | 65 ++++++++++++++++++++++++ > 2 files changed, 80 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index a719f57870f..75f3e455562 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 > (rtx_code code, > return simplify_gen_binary (code, mode, op0, > gen_int_shift_amount (mode, val)); > } > + > + /* Optimize vector shift/rotate with const_vec_duplicate > + to vector shift/rotate with const_int element. > + /* TODO: vec_duplicate with variable can also be simplified, > + but GCC only require operand 2 of shift/rotate to be a scalar type > + which can have different modes in different backends, it makes > + simplication difficult to decide which mode should be choosed > + for shift/rotate count. */ > + if ((code == ASHIFTRT || code == LSHIFTRT > + || code == ASHIFT || code == ROTATERT > + || code == ROTATE) > + && const_vec_duplicate_p (op1)) > + return simplify_gen_binary (code, mode, op0, > + unwrap_const_vec_duplicate (op1)); > + > break; > > case ASHIFT: > diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c > b/gcc/testsuite/gcc.target/i386/pr101796.c > new file mode 100644 > index 00000000000..c22d6267fe5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr101796.c > @@ -0,0 +1,65 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512bw -O2 " } */ > +/* { dg-final { scan-assembler-not "vpbroadcast" } } */ > +/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } } */ > +/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } } */ > +/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } } */ > +/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } } */ > + > +#include <immintrin.h> > + > +__m512i > +foo (__m512i a) > +{ > + return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo1 (__m512i a) > +{ > + return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo2 (__m512i a, long long b) > +{ > + return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3)); > +} > + > +__m512i > +foo3 (__m512i a) > +{ > + return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo4 (__m512i a) > +{ > + return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo5 (__m512i a, long long b) > +{ > + return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3)); > +} > + > +__m512i > +foo6 (__m512i a) > +{ > + return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo7 (__m512i a) > +{ > + return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo8 (__m512i a, long long b) > +{ > + return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3)); > +}