Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

Christophe Lyon via Gcc-patches Thu, 27 Jan 2022 10:11:16 -0800

On Thu, Jan 27, 2022 at 5:29 PM Kyrylo Tkachov via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:


> Hi Christophe,
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+kyrylo.tkachov=arm....@gcc.gnu.org> On Behalf Of Christophe
> > Lyon via Gcc-patches
> > Sent: Thursday, January 13, 2022 2:56 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of
> > booleans
> >
> > This patch implements support for vectors of booleans to support MVE
> > predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
> > uint16_t) to represent predicates in intrinsics prototypes, we
> > introduce a new "predicate" type qualifier so that we can map relevant
> > builtins HImode arguments and return value to the appropriate vector
> > of booleans (VxBI).
> >
> > We have to update test_vector_ops_duplicate, because it iterates using
> > an offset in bytes, where we would need to iterate in bits: we stop
> > iterating when we reach the end of the vector of booleans.
> >
> > In addition, we have to fix the underlying definition of vectors of
> > booleans because ARM/MVE needs a different representation than
> > AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
> > element size, so that a true element of V4BI is represented by
> > '0b1111'.  This patch updates the aarch64 definition of VNx*BI as
> > needed.
> >
> > 2022-01-13  Christophe Lyon  <christophe.l...@foss.st.com>
> >       Richard Sandiford  <richard.sandif...@arm.com>
> >
> >       gcc/
> >       PR target/100757
> >       PR target/101325
> >       * config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
> >       VNx2BI): Update definition.
> >       * config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
> >       simd types.
> >       (arm_init_builtin): Map predicate vectors arguments to HImode.
> >       (arm_expand_builtin_args): Move HImode predicate arguments to
> > VxBI
> >       rtx. Move return value to HImode rtx.
> >       * config/arm/arm-builtins.h (arm_type_qualifiers): Add
> > qualifier_predicate.
> >       * config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New
> > modes.
> >       * config/arm/arm-simd-builtin-types.def (Pred1x16_t,
> >       Pred2x8_t,Pred4x4_t): New.
> >       * emit-rtl.c (init_emit_once): Handle all boolean modes.
> >       * genmodes.c (mode_data): Add boolean field.
> >       (blank_mode): Initialize it.
> >       (make_complex_modes): Fix handling of boolean modes.
> >       (make_vector_modes): Likewise.
> >       (VECTOR_BOOL_MODE): Use new COMPONENT parameter.
> >       (make_vector_bool_mode): Likewise.
> >       (BOOL_MODE): New.
> >       (make_bool_mode): New.
> >       (emit_insn_modes_h): Fix generation of boolean modes.
> >       (emit_class_narrowest_mode): Likewise.
> >       * machmode.def: Use new BOOL_MODE instead of
> > FRACTIONAL_INT_MODE
> >       to define BImode.
> >       * rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
> >       Fix handling of constm1_rtx for VECTOR_BOOL.
> >       * simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
> >       (native_decode_vector_rtx): Likewise.
> >       (test_vector_ops_duplicate): Skip vec_merge test
> >       with vectors of booleans.
> >       * varasm.c (output_constant_pool_2): Likewise.
>
> The arm parts look ok. I guess Richard is best placed to approve the
> midend parts, but I see he's on the ChangeLog so maybe he needs others to
> review them. But then again Richard is maintainer of the gen* machinery
> that's the most complicated part of the patch so he can self-approve 😊
>

Thanks Kyrill,

Regarding the ARM part, Andre had a concern, I don't know if my proposal is
OK for him?

Christophe


> Thanks,
> Kyrill
>
> >
> > diff --git a/gcc/config/aarch64/aarch64-modes.def
> > b/gcc/config/aarch64/aarch64-modes.def
> > index 976bf9b42be..8f399225a80 100644
> > --- a/gcc/config/aarch64/aarch64-modes.def
> > +++ b/gcc/config/aarch64/aarch64-modes.def
> > @@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
> >
> >  /* Vector modes.  */
> >
> > -VECTOR_BOOL_MODE (VNx16BI, 16, 2);
> > -VECTOR_BOOL_MODE (VNx8BI, 8, 2);
> > -VECTOR_BOOL_MODE (VNx4BI, 4, 2);
> > -VECTOR_BOOL_MODE (VNx2BI, 2, 2);
> > +VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
> > +VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
> > +VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
> > +VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
> >
> >  ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
> >  ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
> > diff --git a/gcc/config/arm/arm-builtins.c
> b/gcc/config/arm/arm-builtins.c
> > index 9c645722230..2ccfa37c302 100644
> > --- a/gcc/config/arm/arm-builtins.c
> > +++ b/gcc/config/arm/arm-builtins.c
> > @@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
> >    arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
> >    arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
> >
> > +  if (TARGET_HAVE_MVE)
> > +    {
> > +      arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
> > +      arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
> > +      arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
> > +    }
> > +
> >    for (i = 0; i < nelts; i++)
> >      {
> >        tree eltype = arm_simd_types[i].eltype;
> > @@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode,
> > arm_builtin_datum *d,
> >        if (qualifiers & qualifier_map_mode)
> >       op_mode = d->mode;
> >
> > +      /* MVE Predicates use HImode as mandated by the ABI: pred16_t is
> > unsigned
> > +      short.  */
> > +      if (qualifiers & qualifier_predicate)
> > +     op_mode = HImode;
> > +
> >        /* For pointers, we want a pointer to the basic type
> >        of the vector.  */
> >        if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
> > @@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target,
> > machine_mode map_mode, int fcode,
> >           case ARG_BUILTIN_COPY_TO_REG:
> >             if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
> >               op[argc] = convert_memory_address (Pmode, op[argc]);
> > +
> > +           /* MVE uses mve_pred16_t (aka HImode) for vectors of
> > predicates.  */
> > +           if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
> > +             op[argc] = gen_lowpart (mode[argc], op[argc]);
> > +
> >             /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
> >             if (!(*insn_data[icode].operand[opno].predicate)
> >                 (op[argc], mode[argc]))
> > @@ -3144,6 +3161,13 @@ constant_arg:
> >    else
> >      emit_insn (insn);
> >
> > +  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
> > +    {
> > +      rtx HItarget = gen_reg_rtx (HImode);
> > +      emit_move_insn (HItarget, gen_lowpart (HImode, target));
> > +      return HItarget;
> > +    }
> > +
> >    return target;
> >  }
> >
> > diff --git a/gcc/config/arm/arm-builtins.h
> b/gcc/config/arm/arm-builtins.h
> > index e5130d6d286..a8ef8aef82d 100644
> > --- a/gcc/config/arm/arm-builtins.h
> > +++ b/gcc/config/arm/arm-builtins.h
> > @@ -84,7 +84,9 @@ enum arm_type_qualifiers
> >    qualifier_lane_pair_index = 0x1000,
> >    /* Lane indices selected in quadtuplets - must be within range of
> previous
> >       argument = a vector.  */
> > -  qualifier_lane_quadtup_index = 0x2000
> > +  qualifier_lane_quadtup_index = 0x2000,
> > +  /* MVE vector predicates.  */
> > +  qualifier_predicate = 0x4000
> >  };
> >
> >  struct arm_simd_type_info
> > diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
> > index de689c8b45e..9ed0cd042c5 100644
> > --- a/gcc/config/arm/arm-modes.def
> > +++ b/gcc/config/arm/arm-modes.def
> > @@ -84,6 +84,14 @@ VECTOR_MODE (FLOAT, BF, 2);   /*
>  V2BF.  */
> >  VECTOR_MODE (FLOAT, BF, 4);   /*              V4BF.  */
> >  VECTOR_MODE (FLOAT, BF, 8);   /*              V8BF.  */
> >
> > +/* Predicates for MVE.  */
> > +BOOL_MODE (B2I, 2, 1);
> > +BOOL_MODE (B4I, 4, 1);
> > +
> > +VECTOR_BOOL_MODE (V16BI, 16, BI, 2);
> > +VECTOR_BOOL_MODE (V8BI, 8, B2I, 2);
> > +VECTOR_BOOL_MODE (V4BI, 4, B4I, 2);
> > +
> >  /* Fraction and accumulator vector modes.  */
> >  VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
> >  VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
> > diff --git a/gcc/config/arm/arm-simd-builtin-types.def
> > b/gcc/config/arm/arm-simd-builtin-types.def
> > index 6ba6f211531..920c2a68e4c 100644
> > --- a/gcc/config/arm/arm-simd-builtin-types.def
> > +++ b/gcc/config/arm/arm-simd-builtin-types.def
> > @@ -51,3 +51,7 @@
> >    ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
> >    ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
> >    ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
> > +
> > +  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
> > +  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
> > +  ENTRY (Pred4x4_t, V4BI, unsigned, 4, uint16, 21)
> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index feeee16d320..5f559f8fd93 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -6239,9 +6239,14 @@ init_emit_once (void)
> >
> >    /* For BImode, 1 and -1 are unsigned and signed interpretations
> >       of the same value.  */
> > -  const_tiny_rtx[0][(int) BImode] = const0_rtx;
> > -  const_tiny_rtx[1][(int) BImode] = const_true_rtx;
> > -  const_tiny_rtx[3][(int) BImode] = const_true_rtx;
> > +  for (mode = MIN_MODE_BOOL;
> > +       mode <= MAX_MODE_BOOL;
> > +       mode = (machine_mode)((int)(mode) + 1))
> > +    {
> > +      const_tiny_rtx[0][(int) mode] = const0_rtx;
> > +      const_tiny_rtx[1][(int) mode] = const_true_rtx;
> > +      const_tiny_rtx[3][(int) mode] = const_true_rtx;
> > +    }
> >
> >    for (mode = MIN_MODE_PARTIAL_INT;
> >         mode <= MAX_MODE_PARTIAL_INT;
> > @@ -6260,13 +6265,16 @@ init_emit_once (void)
> >        const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner,
> inner);
> >      }
> >
> > -  /* As for BImode, "all 1" and "all -1" are unsigned and signed
> > -     interpretations of the same value.  */
> >    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
> >      {
> >        const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
> >        const_tiny_rtx[3][(int) mode] = gen_const_vector (mode, 3);
> > -      const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> > +      if (GET_MODE_INNER (mode) == BImode)
> > +     /* As for BImode, "all 1" and "all -1" are unsigned and signed
> > +        interpretations of the same value.  */
> > +     const_tiny_rtx[1][(int) mode] = const_tiny_rtx[3][(int) mode];
> > +      else
> > +     const_tiny_rtx[1][(int) mode] = gen_const_vector (mode, 1);
> >      }
> >
> >    FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> > diff --git a/gcc/genmodes.c b/gcc/genmodes.c
> > index 6001b854547..0bb1a7c0b48 100644
> > --- a/gcc/genmodes.c
> > +++ b/gcc/genmodes.c
> > @@ -78,6 +78,7 @@ struct mode_data
> >    bool need_bytesize_adj;    /* true if this mode needs dynamic size
> >                                  adjustment */
> >    unsigned int int_n;                /* If nonzero, then __int<INT_N>
> will be
> > defined */
> > +  bool boolean;
> >  };
> >
> >  static struct mode_data *modes[MAX_MODE_CLASS];
> > @@ -88,7 +89,8 @@ static const struct mode_data blank_mode = {
> >    0, "<unknown>", MAX_MODE_CLASS,
> >    0, -1U, -1U, -1U, -1U,
> >    0, 0, 0, 0, 0, 0,
> > -  "<unknown>", 0, 0, 0, 0, false, false, 0
> > +  "<unknown>", 0, 0, 0, 0, false, false, 0,
> > +  false
> >  };
> >
> >  static htab_t modes_by_name;
> > @@ -456,7 +458,7 @@ make_complex_modes (enum mode_class cl,
> >        size_t m_len;
> >
> >        /* Skip BImode.  FIXME: BImode probably shouldn't be MODE_INT.  */
> > -      if (m->precision == 1)
> > +      if (m->boolean)
> >       continue;
> >
> >        m_len = strlen (m->name);
> > @@ -528,7 +530,7 @@ make_vector_modes (enum mode_class cl, const
> > char *prefix, unsigned int width,
> >        not be necessary.  */
> >        if (cl == MODE_FLOAT && m->bytesize == 1)
> >       continue;
> > -      if (cl == MODE_INT && m->precision == 1)
> > +      if (m->boolean)
> >       continue;
> >
> >        if ((size_t) snprintf (buf, sizeof buf, "%s%u%s", prefix,
> > @@ -548,17 +550,18 @@ make_vector_modes (enum mode_class cl, const
> > char *prefix, unsigned int width,
> >
> >  /* Create a vector of booleans called NAME with COUNT elements and
> >     BYTESIZE bytes in total.  */
> > -#define VECTOR_BOOL_MODE(NAME, COUNT, BYTESIZE) \
> > -  make_vector_bool_mode (#NAME, COUNT, BYTESIZE, __FILE__, __LINE__)
> > +#define VECTOR_BOOL_MODE(NAME, COUNT, COMPONENT, BYTESIZE)
> >               \
> > +  make_vector_bool_mode (#NAME, COUNT, #COMPONENT, BYTESIZE,
> >               \
> > +                      __FILE__, __LINE__)
> >  static void ATTRIBUTE_UNUSED
> >  make_vector_bool_mode (const char *name, unsigned int count,
> > -                    unsigned int bytesize, const char *file,
> > -                    unsigned int line)
> > +                    const char *component, unsigned int bytesize,
> > +                    const char *file, unsigned int line)
> >  {
> > -  struct mode_data *m = find_mode ("BI");
> > +  struct mode_data *m = find_mode (component);
> >    if (!m)
> >      {
> > -      error ("%s:%d: no mode \"BI\"", file, line);
> > +      error ("%s:%d: no mode \"%s\"", file, line, component);
> >        return;
> >      }
> >
> > @@ -596,6 +599,20 @@ make_int_mode (const char *name,
> >    m->precision = precision;
> >  }
> >
> > +#define BOOL_MODE(N, B, Y) \
> > +  make_bool_mode (#N, B, Y, __FILE__, __LINE__)
> > +
> > +static void
> > +make_bool_mode (const char *name,
> > +             unsigned int precision, unsigned int bytesize,
> > +             const char *file, unsigned int line)
> > +{
> > +  struct mode_data *m = new_mode (MODE_INT, name, file, line);
> > +  m->bytesize = bytesize;
> > +  m->precision = precision;
> > +  m->boolean = true;
> > +}
> > +
> >  #define OPAQUE_MODE(N, B)                    \
> >    make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
> >
> > @@ -1298,9 +1315,21 @@ enum machine_mode\n{");
> >        /* Don't use BImode for MIN_MODE_INT, since otherwise the middle
> >        end will try to use it for bitfields in structures and the
> >        like, which we do not want.  Only the target md file should
> > -      generate BImode widgets.  */
> > -      if (first && first->precision == 1 && c == MODE_INT)
> > -     first = first->next;
> > +      generate BImode widgets.  Since some targets such as ARM/MVE
> > +      define boolean modes with multiple bits, handle those too.  */
> > +      if (first && first->boolean)
> > +     {
> > +       struct mode_data *last_bool = first;
> > +       printf ("  MIN_MODE_BOOL = E_%smode,\n", first->name);
> > +
> > +       while (first && first->boolean)
> > +         {
> > +           last_bool = first;
> > +           first = first->next;
> > +         }
> > +
> > +       printf ("  MAX_MODE_BOOL = E_%smode,\n\n", last_bool->name);
> > +     }
> >
> >        if (first && last)
> >       printf ("  MIN_%s = E_%smode,\n  MAX_%s = E_%smode,\n\n",
> > @@ -1679,15 +1708,25 @@ emit_class_narrowest_mode (void)
> >    print_decl ("unsigned char", "class_narrowest_mode",
> > "MAX_MODE_CLASS");
> >
> >    for (c = 0; c < MAX_MODE_CLASS; c++)
> > -    /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > -    tagged_printf ("MIN_%s", mode_class_names[c],
> > -                modes[c]
> > -                ? ((c != MODE_INT || modes[c]->precision != 1)
> > -                   ? modes[c]->name
> > -                   : (modes[c]->next
> > -                      ? modes[c]->next->name
> > -                      : void_mode->name))
> > -                : void_mode->name);
> > +    {
> > +      /* Bleah, all this to get the comment right for MIN_MODE_INT.  */
> > +      const char *comment_name = void_mode->name;
> > +
> > +      if (modes[c])
> > +     if (c != MODE_INT || !modes[c]->boolean)
> > +       comment_name = modes[c]->name;
> > +     else
> > +       {
> > +         struct mode_data *m = modes[c];
> > +         while (m->boolean)
> > +           m = m->next;
> > +         if (m)
> > +           comment_name = m->name;
> > +         else
> > +           comment_name = void_mode->name;
> > +       }
> > +      tagged_printf ("MIN_%s", mode_class_names[c], comment_name);
> > +    }
> >
> >    print_closer ();
> >  }
> > diff --git a/gcc/machmode.def b/gcc/machmode.def
> > index 866a2082d01..eb7905ea23d 100644
> > --- a/gcc/machmode.def
> > +++ b/gcc/machmode.def
> > @@ -196,7 +196,7 @@ RANDOM_MODE (VOID);
> >  RANDOM_MODE (BLK);
> >
> >  /* Single bit mode used for booleans.  */
> > -FRACTIONAL_INT_MODE (BI, 1, 1);
> > +BOOL_MODE (BI, 1, 1);
> >
> >  /* Basic integer modes.  We go up to TI in generic code (128 bits).
> >     TImode is needed here because the some front ends now genericly
> > diff --git a/gcc/rtx-vector-builder.c b/gcc/rtx-vector-builder.c
> > index e36aba010a0..55ffe0d5a76 100644
> > --- a/gcc/rtx-vector-builder.c
> > +++ b/gcc/rtx-vector-builder.c
> > @@ -90,8 +90,10 @@ rtx_vector_builder::find_cached_value ()
> >
> >    if (GET_MODE_CLASS (m_mode) == MODE_VECTOR_BOOL)
> >      {
> > -      if (elt == const1_rtx || elt == constm1_rtx)
> > +      if (elt == const1_rtx)
> >       return CONST1_RTX (m_mode);
> > +      else if (elt == constm1_rtx)
> > +     return CONSTM1_RTX (m_mode);
> >        else if (elt == const0_rtx)
> >       return CONST0_RTX (m_mode);
> >        else
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index c36c825f958..532537ea48d 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -6876,12 +6876,13 @@ native_encode_rtx (machine_mode mode, rtx x,
> > vec<target_unit> &bytes,
> >         /* This is the only case in which elements can be smaller than
> >            a byte.  */
> >         gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
> > +       auto mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> >         for (unsigned int i = 0; i < num_bytes; ++i)
> >           {
> >             target_unit value = 0;
> >             for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
> >               {
> > -               value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & 1) << j;
> > +               value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) <<
> j;
> >                 elt += 1;
> >               }
> >             bytes.quick_push (value);
> > @@ -7025,9 +7026,8 @@ native_decode_vector_rtx (machine_mode mode,
> > const vec<target_unit> &bytes,
> >         unsigned int bit_index = first_byte * BITS_PER_UNIT + i *
> elt_bits;
> >         unsigned int byte_index = bit_index / BITS_PER_UNIT;
> >         unsigned int lsb = bit_index % BITS_PER_UNIT;
> > -       builder.quick_push (bytes[byte_index] & (1 << lsb)
> > -                           ? CONST1_RTX (BImode)
> > -                           : CONST0_RTX (BImode));
> > +       unsigned int value = bytes[byte_index] >> lsb;
> > +       builder.quick_push (gen_int_mode (value, GET_MODE_INNER
> > (mode)));
> >       }
> >      }
> >    else
> > @@ -7994,17 +7994,23 @@ test_vector_ops_duplicate (machine_mode
> > mode, rtx scalar_reg)
> >                                                   duplicate, last_par));
> >
> >        /* Test a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE.  */
> > -      rtx vector_reg = make_test_reg (mode);
> > -      for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> > +      /* Skip this test for vectors of booleans, because offset is in
> bytes,
> > +      while vec_merge indices are in elements (usually bits).  */
> > +      if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
> >       {
> > -       if (i >= HOST_BITS_PER_WIDE_INT)
> > -         break;
> > -       rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> > -       rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg, mask);
> > -       poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> > -       ASSERT_RTX_EQ (scalar_reg,
> > -                      simplify_gen_subreg (inner_mode, vm,
> > -                                           mode, offset));
> > +       rtx vector_reg = make_test_reg (mode);
> > +       for (unsigned HOST_WIDE_INT i = 0; i < const_nunits; i++)
> > +         {
> > +           if (i >= HOST_BITS_PER_WIDE_INT)
> > +             break;
> > +           rtx mask = GEN_INT ((HOST_WIDE_INT_1U << i) | (i + 1));
> > +           rtx vm = gen_rtx_VEC_MERGE (mode, duplicate, vector_reg,
> > mask);
> > +           poly_uint64 offset = i * GET_MODE_SIZE (inner_mode);
> > +
> > +           ASSERT_RTX_EQ (scalar_reg,
> > +                          simplify_gen_subreg (inner_mode, vm,
> > +                                               mode, offset));
> > +         }
> >       }
> >      }
> >
> > diff --git a/gcc/varasm.c b/gcc/varasm.c
> > index 76574be191f..5f59b6ace15 100644
> > --- a/gcc/varasm.c
> > +++ b/gcc/varasm.c
> > @@ -4085,6 +4085,7 @@ output_constant_pool_2 (fixed_size_mode mode,
> > rtx x, unsigned int align)
> >       unsigned int elt_bits = GET_MODE_BITSIZE (mode) / nelts;
> >       unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> >       scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require
> > ();
> > +     unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> >
> >       /* Build the constant up one integer at a time.  */
> >       unsigned int elts_per_int = int_bits / elt_bits;
> > @@ -4093,8 +4094,10 @@ output_constant_pool_2 (fixed_size_mode
> > mode, rtx x, unsigned int align)
> >           unsigned HOST_WIDE_INT value = 0;
> >           unsigned int limit = MIN (nelts - i, elts_per_int);
> >           for (unsigned int j = 0; j < limit; ++j)
> > -           if (INTVAL (CONST_VECTOR_ELT (x, i + j)) != 0)
> > -             value |= 1 << (j * elt_bits);
> > +         {
> > +           auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
> > +           value |= (elt & mask) << (j * elt_bits);
> > +         }
> >           output_constant_pool_2 (int_mode, gen_int_mode (value,
> > int_mode),
> >                                   i != 0 ? MIN (align, int_bits) :
> align);
> >         }
> > --
> > 2.25.1
>
>

Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

Reply via email to