https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
--- Comment #5 from Kito Cheng <kito at gcc dot gnu.org> ---
Assume:
VLEN = 128 and n = 5, *in is {0, 0, 0, 0, 0}
so VLMAX = 4 for e32m1
It can be run with vl = 4 for first iteration, and vl = 1 vl for second
iteration
But it could be something like that: vl = 3 for first iteration and vl = 2 for
second iteration, ok, let run the code with that:
foo(int, int*, int*):
ble a0,zero,.L5
csrr a5,vlenb
srli a5,a5,2
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v4,a5 # v4 = {4, 4, 4, 4}
vid.v v2 # v2 = {0, 1, 2, 3}
.L3:
vsetvli a5,a0,e32,m1,ta,ma # first iteration got vl = 3
slli a4,a5,2
vle32.v v1,0(a1) # v1 = {0, 0, 0}
sub a0,a0,a5
vadd.vv v1,v1,v2 # v1 = {0, 0, 0} + {0, 1, 2}
vse32.v v1,0(a2) # out = {0, 1, 2, 0, 0}
add a1,a1,a4
vsetvli a5,zero,e32,m1,ta,ma
add a2,a2,a4
vadd.vv v2,v2,v4 # v2 = {0, 1, 2, 3} + {4, 4, 4, 4}
# = {4, 5, 6, 7}
bne a0,zero,.L3
.L5:
ret
Ok, let run second iteration:
.L3:
vsetvli a5,a0,e32,m1,ta,ma # first iteration got vl = 2
slli a4,a5,2
vle32.v v1,0(a1) # v1 = {0, 0}
sub a0,a0,a5
vadd.vv v1,v1,v2 # v1 = {0, 0} + {4, 5}
vse32.v v1,0(a2) # out = {0, 1, 2, 4, 5}
add a1,a1,a4
vsetvli a5,zero,e32,m1,ta,ma
add a2,a2,a4
vadd.vv v2,v2,v4 # v2 = {4, 5, 6, 7} + {4, 4, 4, 4}
# = {8, 9, 10, 11}
bne a0,zero,.L3
And the you will got {0, 1, 2, 4, 5} rather than {0, 1, 2, 3, 4}