https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
--- Comment #27 from Li Pan <pan2.li at intel dot com> ---
Hi Richard and Juzhe.
I investigated this issue recently and noticed that it may be related to the
size of the constant array in memory. Assume we have the 2 functions below.
vuint8m1_t fn_00000 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
                     1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
  return __riscv_vle8_v_u8m1(arr, 32);
}
vuint8m2_t fn_11111 () {
  uint8_t arr[32] = {1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, 1, 2, 7,
                     1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9};
  return __riscv_vle8_v_u8m2(arr, 32);
}
The vuint8m1 function keeps the stack variable but the vuint8m2 function does
not. Thus I guess some limitation is blocking the optimization. I eventually
traced it to extract_low_bits, which is called from get_stored_val in dse. It
looks like it can only handle scalar modes when the nunits are not equal.
rtx extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
{
  ...
  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
      || !int_mode_for_mode (mode).exists (&int_mode))
    return NULL_RTX;
  ...
}
I tried allowing vector modes for the gen_lowpart here if and only if the size
of mode is not greater than the size of src_mode. For the above functions this
eliminates the stack variables as expected, up to a point.
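To make the size condition concrete, here is a toy model of the check in plain
C. It is only a sketch and not GCC code: struct toy_mode and
toy_vector_lowpart_allowed are hypothetical names, sizes are plain integers
rather than GCC's poly_int values, and the existing scalar path through
int_mode_for_mode is not modeled at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode; this is NOT a GCC type.
   Real GCC mode sizes are poly_int values, simplified here to size_t.  */
struct toy_mode
{
  bool is_vector;     /* true for a vector mode, false for a scalar mode */
  size_t size_bytes;  /* size of a value in this mode, in bytes */
};

/* Toy version of the proposed extra condition only: a vector MODE may be
   taken as the low part of SRC_MODE if and only if MODE is not larger
   than SRC_MODE.  Scalar modes return false here because their existing
   handling is outside the scope of this sketch.  */
bool
toy_vector_lowpart_allowed (struct toy_mode mode, struct toy_mode src_mode)
{
  if (!mode.is_vector)
    return false;

  return mode.size_bytes <= src_mode.size_bytes;
}
```

With a 32-byte source, a 16-byte or 32-byte vector mode passes this check and a
64-byte one does not. The byte sizes are illustrative only and are not meant to
describe the actual RVV modes involved.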
I ran the RVV regression tests and they look good for now. But I would like to
double-check with you that this is reasonable before we start more testing. ;)
Thanks.