https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
--- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Richard and Richi.
I found a way to simulate an "undefined" ELSE value in the COND_LEN_xxx
patterns for the case where the ELSE value doesn't matter.
First, return a size-type 0 from the preferred_else_value target hook:
/* Use a size-type 0, which is represented as const0_rtx in RTL, to simulate
   an undefined ELSE value, since GCC has no notion of an undefined value in
   its TREE/GIMPLE representation.
   TODO: We may need to support undefined values in the TREE/GIMPLE middle-end
   IR, but the current approach is good enough for RVV codegen/performance.  */
static tree
riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
			    tree *ops)
{
  if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
    return build_zero_cst (size_type_node);
  return default_preferred_else_value (ifn, vectype, nops, ops);
}
Note that we can't return a VECTOR_CST of all zeros, since an all-zero
VECTOR_CST may be exactly the real value we need, i.e. it may matter.
So, to simulate "undefined", I pass a scalar '0', which is represented as
const0_rtx in RTL.
So the IR will be:
vect__7.12_8 = .COND_LEN_DIV ({ -1, ... }, vect__4.8_22, vect__6.11_9,
			      0 (undefined ELSE value), _37, 0);
Then I relaxed the predicate in the COND_LEN_xxx patterns. It works and
passes all the tests.
Consider the following case:
void
foo (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] / b[i];
}
Before:
foo:
	ble	a2,zero,.L5
	mv	a4,a0
	vsetvli	a5,zero,e32,m8,ta,ma
	vmv.v.i	v4,0
.L3:
	vsetvli	a5,a2,e32,m8,tu,ma
	vmv8r.v	v1,v4
	slli	a3,a5,2
	vle32.v	v3,0(a0)
	vle32.v	v2,0(a1)
	sub	a2,a2,a5
	vdiv.vv	v1,v3,v2
	vse32.v	v1,0(a4)
	add	a0,a0,a3
	add	a1,a1,a3
	add	a4,a4,a3
	bne	a2,zero,.L3
.L5:
	ret
After:
foo:
	ble	a2,zero,.L5
	mv	a4,a0
.L3:
	vsetvli	a5,a2,e32,m8,ta,ma
	slli	a3,a5,2
	vle32.v	v8,0(a0)
	vle32.v	v16,0(a1)
	sub	a2,a2,a5
	vdiv.vv	v8,v8,v16
	vse32.v	v8,0(a4)
	add	a0,a0,a3
	add	a1,a1,a3
	add	a4,a4,a3
	bne	a2,zero,.L3
.L5:
	ret
Not so elegant, but it does fix the performance/codegen issue in RVV.