Hi Juzhe,

>       csrr    a4,vlenb
>       csrr    a5,vlenb

Totally unrelated to this patch but this looks odd.  I don't
remember if we had a patch for this already at some point.

In general the idea for the patch is to use the largest vector
element mode for the indices and compress several of those
in one vector element.  We are limited to XLEN because the
computation is done in scalar registers.  Is that right?

I would find it easier to understand what's happening with the
explanation of the bigger picture upfront.  It's all a bit
implicit right now, so a few more comments would help.

> +void
> +rvv_builder::merge_pattern (rtx src)

It's more like a compress?  Actually both merge and compress are
misleading because we have dedicated instructions with that name.

> +  unsigned int ele_num = GET_MODE_BITSIZE (Pmode) / this->inner_bits_size ();
Rename to elems_per_scalar.

> +  if (!get_vector_mode (Pmode, nunits).exists (&mode))
> +    return;

There is no return value, what if we fail here?

> +             = gen_lowpart (Pmode, CONST_VECTOR_ELT (src, k + j * ele_num));
> +           e = expand_simple_binop (Pmode, AND, e, imm, NULL_RTX, false,
> +                                    OPTAB_DIRECT);
> +           e = expand_simple_binop (
> +             Pmode, ASHIFT, e,
> +             gen_int_mode (this->inner_bits_size () * k, Pmode), NULL_RTX,
> +             false, OPTAB_DIRECT);
> +           val = expand_simple_binop (Pmode, IOR, e, val, NULL_RTX, false,
> +                                      OPTAB_DIRECT);

Did you try the same doing everything in vector registers?
I.e. some reinterpretation with a larger element size.  Maybe
it doesn't make sense, I didn't check but just curious.

> +      /* We don't apply such approach for LMUL = 8 since vrgather.vv doesn't
> +      allow dest overlap with any source register and VLA repeating vector
> +      always by a addition.  So, it such VLA constant vector will consume
> +      32 registers if LMUL = 8 which cause serious high register pressure. */
> +      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && npatterns > 2)
> +     {

Don't we also need to check that MAX_EEW <= Pmode?  Like for
rv64gc_zve32x.  We would probably fail to find a vector mode
in merge_pattern but without a return value.  I would prefer
to check that as well here, maybe we can ensure that merge_pattern
cannot fail even?  Wouldn't hurt to add a test case for zve32x
as well.

Regards
 Robin

Reply via email to