https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121959
Bug ID: 121959
Summary: riscv: vector sign extend instead of zero extend
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: pan2.li at intel dot com
Target Milestone: ---
Target: riscv
I haven't analyzed this in detail yet, but figured I'd open a PR for tracking
purposes.
The following example, extracted from x264's satd, compiled with
-O3 -march=rv64gcv:
#include <stdint.h>

void
lul (int *restrict res, uint8_t *restrict a, uint8_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      res[i] = (a[i] - b[i]) << 16;
    }
}
results in
.L3:
        vsetvli a5,a3,e8,mf4,ta,ma
        vle8.v  v1,0(a2)
        vle8.v  v3,0(a1)
        slli    a4,a5,2
        sub     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        vwsubu.vv       v2,v3,v1
        vsetvli zero,zero,e32,m1,ta,ma
        vsext.vf2       v1,v2
        vsll.vi v1,v1,16
        vse32.v v1,0(a0)
        add     a0,a0,a4
        bne     a3,zero,.L3
which is reasonable. LLVM, however, produces:
        ...
        vzext.vf2       v8, v10
        vsll.vi v8, v8, 16
which can be combined into vwsll (vector widening shift left, part of Zvbb).
vwsll zero-extends its source, so we cannot combine a sign-extend + left shift
into it. Left-shifting a negative number is undefined in C, but I'm not sure
we can make use of that here.
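One observation that may be relevant, independent of the undefined-behavior question: the shift amount (16) equals the width of the narrow element, so the upper 16 bits, which are the only bits where sign- and zero-extension differ, are shifted out of the 32-bit result. Below is a minimal C sketch that checks this exhaustively. The function names are hypothetical and only for illustration. The shifts are done on uint32_t so the model itself never shifts a negative int, and the int16_t conversion assumes GCC's modulo behavior for out-of-range values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend a 16-bit lane to 32 bits, then shift
   left by 16.  The shift is done on uint32_t so the model never shifts a
   negative int.  The int16_t conversion wraps modulo 2^16 (GCC behavior).  */
static uint32_t
shl16_after_sext (uint16_t x)
{
  return (uint32_t) (int32_t) (int16_t) x << 16;
}

/* Hypothetical helper: zero-extend, then shift left by 16.  */
static uint32_t
shl16_after_zext (uint16_t x)
{
  return (uint32_t) x << 16;
}

/* Count the 16-bit inputs on which the two variants disagree.  */
static unsigned
count_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t v = 0; v <= UINT16_MAX; v++)
    if (shl16_after_sext ((uint16_t) v) != shl16_after_zext ((uint16_t) v))
      mismatches++;
  return mismatches;
}

/* Aborts if any input distinguishes sign- from zero-extension.  */
static void
check_extensions_agree (void)
{
  assert (count_mismatches () == 0);
}
```

count_mismatches () returns 0 for a shift amount of exactly 16. For smaller shift amounts some of the extension bits survive and the two variants differ for inputs with bit 15 set, so this argument does not generalize.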
.optimized:
  vect__3.8_93 = .MASK_LEN_LOAD (vectp_a.6_90, 8B, { -1, ... }, _92(D), _112, 0);
  vect_patt_31.9_94 = (vector([4,4]) unsigned short) vect__3.8_93;
  vect__6.12_99 = .MASK_LEN_LOAD (vectp_b.10_96, 8B, { -1, ... }, _98(D), _112, 0);
  vect_patt_29.13_100 = (vector([4,4]) unsigned short) vect__6.12_99;
  vect_patt_27.14_101 = vect_patt_31.9_94 - vect_patt_29.13_100;
  vect_patt_26.15_102 = VIEW_CONVERT_EXPR<vector([4,4]) signed short>(vect_patt_27.14_101);
  vect_patt_22.16_103 = (vector([4,4]) int) vect_patt_26.15_102;
  vect__11.17_104 = vect_patt_22.16_103 << 16;
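
To make it easier to see where the sign extension comes from, the statements in the dump can be modeled per lane in scalar C. This is only a sketch: the function names are hypothetical, the loads and masking are left out, and the reference does its shift on uint32_t so the model never left-shifts a negative int.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the source expression (a[i] - b[i]) << 16 for one element.  */
static uint32_t
lane_source (uint8_t a, uint8_t b)
{
  return (uint32_t) ((int32_t) a - (int32_t) b) << 16;
}

/* One lane of the .optimized sequence, statement by statement.  */
static uint32_t
lane_gimple (uint8_t a, uint8_t b)
{
  uint16_t wa = a;                      /* vect_patt_31: (unsigned short) */
  uint16_t wb = b;                      /* vect_patt_29: (unsigned short) */
  uint16_t diff = (uint16_t) (wa - wb); /* vect_patt_27: wrapping subtract */
  int16_t sdiff = (int16_t) diff;       /* vect_patt_26: VIEW_CONVERT_EXPR */
  int32_t wide = sdiff;                 /* vect_patt_22: (int), sign-extends */
  return (uint32_t) wide << 16;         /* vect__11: << 16 */
}

/* Count the (a, b) pairs on which the model and the reference disagree.  */
static unsigned
count_lane_mismatches (void)
{
  unsigned mismatches = 0;
  for (unsigned a = 0; a <= UINT8_MAX; a++)
    for (unsigned b = 0; b <= UINT8_MAX; b++)
      if (lane_source ((uint8_t) a, (uint8_t) b)
          != lane_gimple ((uint8_t) a, (uint8_t) b))
        mismatches++;
  return mismatches;
}

/* Aborts if the per-lane model ever differs from the source expression.  */
static void
check_lane_model (void)
{
  assert (count_lane_mismatches () == 0);
}
```

The conversion to int of the signed-short view (vect_patt_22) is what becomes vsext.vf2 in the GCC output above.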