Jeff Law <[email protected]> writes:
> On 11/8/23 23:08, [email protected] wrote:
>> From: Pan Li <[email protected]>
>>
>> Update in v2:
>> * Move vector type support to get_stored_val.
>>
>> Original log:
>>
>> This patch would like to allow the vector mode in the
>> get_stored_val in the DSE. It is valid for the read
>> rtx if and only if the read bitsize is less than the
>> stored bitsize.
>>
>> Given below example code with
>> --param=riscv-autovec-preference=fixed-vlmax.
>>
>> vuint8m1_t test () {
>> uint8_t arr[32] = {
>> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
>> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
>> };
>>
>> return __riscv_vle8_v_u8m1(arr, 32);
>> }
>>
>> Before this patch:
>> test:
>> lui a5,%hi(.LANCHOR0)
>> addi sp,sp,-32
>> addi a5,a5,%lo(.LANCHOR0)
>> li a3,32
>> vl2re64.v v2,0(a5)
>> vsetvli zero,a3,e8,m1,ta,ma
>> vs2r.v v2,0(sp) <== Unnecessary store to stack
>> vle8.v v1,0(sp) <== Ditto
>> vs1r.v v1,0(a0)
>> addi sp,sp,32
>> jr ra
>>
>> After this patch:
>> test:
>> lui a5,%hi(.LANCHOR0)
>> addi a5,a5,%lo(.LANCHOR0)
>> li a4,32
>> addi sp,sp,-32
>> vsetvli zero,a4,e8,m1,ta,ma
>> vle8.v v1,0(a5)
>> vs1r.v v1,0(a0)
>> addi sp,sp,32
>> jr ra
>>
>> Below tests are passed within this patch:
>>
>> * The x86 bootstrap and regression test.
>> * The aarch64 regression test.
>> * The risc-v regression test.
>>
>> PR target/111720
>>
>> gcc/ChangeLog:
>>
>> * dse.cc (get_stored_val): Allow vector mode if the read
>> bitsize is less than stored bitsize.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rvv/base/pr111720-0.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-1.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-10.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-2.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-3.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-4.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-5.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-6.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-7.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-8.c: New test.
>> * gcc.target/riscv/rvv/base/pr111720-9.c: New test.
> We're always getting the lowpart here AFAICT and it appears that all the
> right thing should happen if gen_lowpart_common fails (it returns NULL,
> which bubbles up and is the right return value from get_stored_val if it
> can't be optimized).
Yeah, we should always be operating on the lowpart, but it looks
like there's a latent bug. This check:
if (gap.is_constant () && maybe_ne (gap, 0))
{
...
}
else ...
means that we ignore the gap if it's a nonzero runtime value.
I guess it should be:
if (maybe_ne (gap, 0))
{
if (!gap.is_constant ())
return NULL_RTX;
...
}
instead. Alternatively, we could remove the is_constant condition
and fix PR87815 in a different way, e.g. by protecting the
smallest_int_mode_for_size with a tighter condition. That might
allow a similar DSE optimisation to this patch for nonzero offsets,
thanks to:
if (multiple_p (shift, GET_MODE_BITSIZE (new_mode))
&& known_le (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode)))
{
/* Try to implement the shift using a subreg. */
...
> Did you want to use known_le so that you'd pick up the case when the two
> modes are the same size? Or was known_lt the test you really wanted
> (and if so, why).
Agree it should be known_le FWIW.
Thanks,
Richard