Hi Juzhe,
> csrr a4,vlenb
> csrr a5,vlenb
Totally unrelated to this patch, but this looks odd. I don't
remember whether we already had a patch for this at some point.
In general the idea for the patch is to use the largest vector
element mode for the indices and compress several of
When evaluating dynamic LMUL, we noticed that we can do better on VLA SLP with
duplicate VLA shuffle indices.
Consider the following case:
void
foo (uint16_t *restrict a, uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 6