>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE
>> (GET_MODE_INNER (vmode)));
No, I think it will make us miss some optimization.
For example, for poly value [16,16] maybe_ge ([16,16], 65536) which makes us
missed merge optimization but
we definitely can do merge optimization.
>> Also add a comment that the non-constant case is handled by
>> shuffle_decompress_patterns in case we have a HImode vector twice the
>> size that can hold our indices.
Ok.
>> Comment should go inside the if branch.
Ok.
>>Also add this as comment:
>>As the mask is a simple {0, 1, ...} pattern and the length is known we can
>>store it in a scalar register and broadcast it to a mask register.
Ok.
>>I would have hoped that a simple
>> v.quick_push (gen_int_mode (0b01010101, QImode));
>>suffices but that will probably clash if there are more than
>>two npatterns.
No, we definitely can not use this. more details you can see the current test
vmerge-*.c .
We have various patterns:
E.g.
0, nunits + 1, nunits+ 2, ... it is 011
nunits, 1, 2 it 100.
....
Many different kinds of patterns can be used vmerge optimization.
[email protected]
From: Robin Dapp
Date: 2023-12-15 19:14
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm
vectorization
Hi Juzhe,
in general looks OK.
> + /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes. */
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len,
> 256)
> + && !vec_len.is_constant ())
> + return false;
> +
We could make this a separate variable like:
bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE
(GET_MODE_INNER (vmode)));
Also add a comment that the non-constant case is handled by
shuffle_decompress_patterns in case we have a HImode vector twice the
size that can hold our indices.
> /* MASK = SELECTOR < NUNTIS ? 1 : 0. */
Comment should go inside the if branch.
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len,
> 256))
> + }
> + else
> + {
> + /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
> + directly to generate the selector mask, instead, we can only use
> + precomputed mask.
I find that comment a bit misleading as it's not the vmsltu itself but
rather that the indices cannot be held.
> +
> + E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
> + don't have a QImode scalar register to hold larger than 255. */
We also cannot hold that in a vector QImode register, and, since there
is no larger HI mode vector we cannot create a larger selector.
Also add this as comment:
As the mask is a simple {0, 1, ...} pattern and the length is known we can
store it in a scalar register and broadcast it to a mask register.
> + gcc_assert (vec_len.is_constant ());
> + int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
> + machine_mode mode = get_vector_mode (QImode, size).require ();
> + rtx tmp = gen_reg_rtx (mode);
> + rvv_builder v (mode, 1, size);
> + for (int i = 0; i < vec_len.to_constant () / 8; i++)
> + {
> + uint8_t value = 0;
> + for (int j = 0; j < 8; j++)
> + {
> + int index = i * 8 + j;
> + if (known_lt (d->perm[index], 256))
> + value |= 1 << j;
> + }
I would have hoped that a simple
v.quick_push (gen_int_mode (0b01010101, QImode));
suffices but that will probably clash if there are more than
two npatterns.
Regards
Robin