On Sun, Mar 27, 2022 at 11:35 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sun, Mar 27, 2022 at 8:14 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > Since AVX512VL and AVX512BW are required for AVX512 VPSHUFB, replace the
> > "Yv" register constraint with the "Yw" register constraint.
>
> This is an obvious fix, as said in https://gcc.gnu.org/gitwrite.html :
>
> Obvious fixes can be committed without prior approval. Just check in
> the fix and copy it to gcc-patches. A good test to determine whether a
> fix is obvious: will the person who objects to my work the most be
> able to find a fault with my fix? If the fix is later found to be
> faulty, it can always be rolled back. We don't want to get overly
> restrictive about checkin policies.

I checked this into the master branch.  I am backporting it to
release branches.  I will drop the testcase for release branches
since __builtin_shufflevector is new for GCC 12.
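
To illustrate the version difference mentioned above (this is only a sketch,
not part of the committed patch or the backports): code that has to build
with both GCC 12 and older releases could guard the builtin with
__has_builtin and fall back to the older __builtin_shuffle.  The helper name
reverse_lanes and the v4si typedef are made up for this example, and
__has_builtin itself is assumed to be available (GCC 10 or later, or Clang).

```c
/* Sketch only, not part of the committed patch.  Assumes a compiler that
   provides __has_builtin (GCC 10+ or Clang); otherwise the fallback
   macro below selects the __builtin_shuffle path.  */
typedef int v4si __attribute__((__vector_size__(16)));

#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* reverse_lanes is a hypothetical helper used only for illustration.
   It returns the four lanes of A in reverse order.  */
static v4si
reverse_lanes (v4si a)
{
#if __has_builtin(__builtin_shufflevector)
  /* GCC 12+ and Clang: the lane indices are integer constant arguments.  */
  return __builtin_shufflevector (a, a, 3, 2, 1, 0);
#else
  /* Older GCC: the lane selector is passed as a vector mask operand.  */
  const v4si mask = { 3, 2, 1, 0 };
  return __builtin_shuffle (a, mask);
#endif
}
```

Note that the two builtins are not interchangeable in general:
__builtin_shuffle needs both inputs to have the same vector type, so the
mixed-width shuffle in pr105068.c cannot be rewritten this way.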

> Thanks,
> Uros.
>
> >
> > gcc/
> >
> >         PR target/105068
> >         * config/i386/sse.md (*ssse3_pshufbv8qi3): Replace "Yv" with
> >         "Yw".
> >
> > gcc/testsuite/
> >
> >         PR target/105068
> >         * gcc.target/i386/pr105068.c: New test.
> > ---
> >  gcc/config/i386/sse.md                   |  6 +--
> >  gcc/testsuite/gcc.target/i386/pr105068.c | 47 ++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105068.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 33bd2c4768a..58d2bd972ed 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -20758,9 +20758,9 @@ (define_expand "ssse3_pshufbv8qi3"
> >  })
> >
> >  (define_insn_and_split "*ssse3_pshufbv8qi3"
> > -  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
> > -       (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
> > -                     (match_operand:V8QI 2 "register_mmxmem_operand" "ym,x,Yv")
> > +  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yw")
> > +       (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yw")
> > +                     (match_operand:V8QI 2 "register_mmxmem_operand" "ym,x,Yw")
> >                       (match_operand:V4SI 4 "reg_or_const_vector_operand"
> >                                           "i,3,3")]
> >                      UNSPEC_PSHUFB))
> > diff --git a/gcc/testsuite/gcc.target/i386/pr105068.c b/gcc/testsuite/gcc.target/i386/pr105068.c
> > new file mode 100644
> > index 00000000000..e5fb0338e3b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr105068.c
> > @@ -0,0 +1,47 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Og -march=x86-64 -mavx512vl -fsanitize=thread -fstack-protector-all" } */
> > +
> > +typedef char __attribute__((__vector_size__(8))) C;
> > +typedef int __attribute__((__vector_size__(8))) U;
> > +typedef int __attribute__((__vector_size__(16))) V;
> > +typedef int __attribute__((__vector_size__(32))) W;
> > +typedef long long __attribute__((__vector_size__(64))) L;
> > +typedef _Float64 __attribute__((__vector_size__(16))) F;
> > +typedef _Float64 __attribute__((__vector_size__(64))) G;
> > +C c;
> > +int i;
> > +
> > +U foo0( W v256u32_0,
> > +           W v256s32_0,
> > +           V v128u64_0,
> > +           V v128s64_0,
> > +           W v256u64_0,
> > +           W v256s64_0,
> > +           L v512s64_0,
> > +           W v256u128_0,
> > +           W v256s128_0,
> > +           V v128f32_0,
> > +           W v256f32_0,
> > +           F F_0,
> > +           W v256f64_0,
> > +           G G_0) {
> > +  C U_1 = __builtin_ia32_pshufb(c, c);
> > +  G_0 += __builtin_convertvector(v512s64_0, G);
> > +  F F_1 = __builtin_shufflevector(F_0, G_0, 2, 2);
> > +  W W_r = v256u32_0 + v256s32_0 + v256u64_0 + v256s64_0 + v256u128_0 +
> > +                    v256s128_0 + v256f32_0 + v256f64_0;
> > +  V V_r = ((union {
> > +                      W a;
> > +                      V b;
> > +                    })W_r)
> > +                        .b +
> > +                    i + v128u64_0 + v128s64_0 + v128f32_0 +
> > +                    (V)F_1;
> > +  U U_r = ((union {
> > +                    V a;
> > +                    U b;
> > +                  })V_r)
> > +                      .b +
> > +                  (U)U_1;
> > +  return U_r;
> > +}
> > --
> > 2.35.1
> >



-- 
H.J.