Hi Juzhe, > csrr a4,vlenb > csrr a5,vlenb
Totally unrelated to this patch but this looks odd. I don't remember if we had a patch for this already at some point. In general the idea for the patch is to use the largest vector element mode for the indices and compress several of those in one vector element. We are limited to XLEN because the computation is done in scalar registers. Is that right? I would find it easier to understand what's happening with the explanation of the bigger picture upfront. It's all a bit implicit right now, so a few more comments would help. > +void > +rvv_builder::merge_pattern (rtx src) It's more like a compress? Actually both merge and compress are misleading because we have dedicated instructions with that name. > + unsigned int ele_num = GET_MODE_BITSIZE (Pmode) / this->inner_bits_size (); Rename to elems_per_scalar. > + if (!get_vector_mode (Pmode, nunits).exists (&mode)) > + return; There is no return value, what if we fail here? > + = gen_lowpart (Pmode, CONST_VECTOR_ELT (src, k + j * ele_num)); > + e = expand_simple_binop (Pmode, AND, e, imm, NULL_RTX, false, > + OPTAB_DIRECT); > + e = expand_simple_binop ( > + Pmode, ASHIFT, e, > + gen_int_mode (this->inner_bits_size () * k, Pmode), NULL_RTX, > + false, OPTAB_DIRECT); > + val = expand_simple_binop (Pmode, IOR, e, val, NULL_RTX, false, > + OPTAB_DIRECT); Did you try the same doing everything in vector registers? I.e. some reinterpretation with a larger element size. Maybe it doesn't make sense, I didn't check but just curious. > + /* We don't apply such approach for LMUL = 8 since vrgather.vv doesn't > + allow dest overlap with any source register and VLA repeating vector > + always by a addition. So, it such VLA constant vector will consume > + 32 registers if LMUL = 8 which cause serious high register pressure. */ > + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && npatterns > 2) > + { Don't we also need to check that MAX_EEW <= Pmode? Like for rv64gc_zve32x. We would probably fail to find a vector mode in merge_pattern but without a return value. I would prefer to check that as well here, maybe we can ensure that merge_pattern cannot fail even? Wouldn't hurt to add a test case for zve32x as well. Regards Robin