https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809
--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I noticed this missed peephole optimization a long time ago and have already filed a PR for it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
The issue should go away once Richard Sandiford (Arm) merges the late-combine pass in
GCC 15.
GCC also supports dynamic LMUL selection via -mrvv-max-lmul=dynamic:
https://godbolt.org/z/646nYoKbv
ASM:
count_chars(char const*, unsigned long, char):
    beq a1,zero,.L4
    vsetvli a4,zero,e8,m1,ta,ma
    vmv.v.x v1,a2
    vsetvli zero,zero,e64,m8,ta,ma
    vmv.v.i v8,0
.L3:
    vsetvli a5,a1,e8,m1,ta,ma
    vle8.v v0,0(a0)
    sub a1,a1,a5
    add a0,a0,a5
    vmseq.vv v0,v0,v1
    vsetvli zero,zero,e64,m8,tu,mu
    vadd.vi v8,v8,1,v0.t
    bne a1,zero,.L3
    vsetvli a5,zero,e64,m8,ta,ma
    li a4,0
    vmv.s.x v1,a4
    vredsum.vs v8,v8,v1
    vmv.x.s a0,v8
    ret
.L4:
    li a0,0
    ret
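The assembly above can be read as the following scalar C++ model. The original source is only available through the godbolt link, so `count_chars_ref` is a hypothetical reconstruction inferred from the demangled signature. `count_chars_model` is an illustrative sketch of the strip-mined loop. It assumes VLEN = 128, so VLMAX for e8,m1 is 16 (the hypothetical constant `kVlmax`). Neither function is GCC output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference: count how many of the n bytes at s equal c.
unsigned long count_chars_ref(const char* s, unsigned long n, char c) {
    unsigned long count = 0;
    for (unsigned long i = 0; i < n; ++i)
        if (s[i] == c) ++count;
    return count;
}

// Assumed VLMAX for e8,m1 with VLEN = 128; real hardware reports it via vsetvli.
constexpr std::size_t kVlmax = 16;

// Scalar model of the vectorized loop (illustrative, not GCC output).
unsigned long count_chars_model(const char* s, unsigned long n, char c) {
    // v8..v15: one 64-bit accumulator per lane (e64,m8), zeroed by vmv.v.i.
    std::vector<std::uint64_t> acc(kVlmax, 0);
    while (n != 0) {
        // vsetvli a5,a1,e8,m1: handle min(n, VLMAX) elements this iteration.
        std::size_t vl = std::min<std::size_t>(n, kVlmax);
        for (std::size_t i = 0; i < vl; ++i) {
            bool mask = (s[i] == c);  // vle8.v + vmseq.vv -> mask in v0
            if (mask) acc[i] += 1;    // vadd.vi ...,v0.t with tu,mu: inactive lanes keep old values
        }
        s += vl;  // add a0,a0,a5
        n -= vl;  // sub a1,a1,a5
    }
    // vredsum.vs with a zero seed over all VLMAX lanes, then vmv.x.s.
    std::uint64_t sum = 0;
    for (std::uint64_t lane : acc) sum += lane;
    return static_cast<unsigned long>(sum);
}
```

Because the accumulator lanes are tail-undisturbed (`tu`), lanes beyond `vl` in the final short iteration keep their earlier counts. That is why the final reduction can sum all VLMAX lanes.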
GCC picks LMUL = 8 here because, according to its estimate of the loop's register
pressure, LMUL = 8 does not introduce any additional register spills.
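The byte loop (e8,m1) and the 64-bit accumulator (e64,m8) can share one vl because RVV computes VLMAX = LMUL * VLEN / SEW. Both settings have the same SEW/LMUL ratio of 8. This is also what allows the `vsetvli zero,zero,...` form, which keeps the current vl, when switching between them. The compile-time sketch below checks that arithmetic and counts the vector registers the loop touches. The VLEN values are assumed examples, and the helper names are hypothetical.

```cpp
// VLMAX = LMUL * VLEN / SEW (RVV 1.0; integer LMUL only in this sketch).
constexpr unsigned vlmax(unsigned vlen, unsigned sew, unsigned lmul) {
    return lmul * vlen / sew;
}

// SEW/LMUL ratio: vsetvli zero,zero,... keeps vl, which requires VLMAX
// (and therefore this ratio) to stay unchanged.
constexpr unsigned sew_lmul_ratio(unsigned sew, unsigned lmul) {
    return sew / lmul;
}

// e8,m1 (load/compare) and e64,m8 (accumulator) share the same ratio,
// so they give the same VLMAX for any VLEN. VLEN values are examples.
static_assert(sew_lmul_ratio(8, 1) == sew_lmul_ratio(64, 8), "ratios differ");
static_assert(vlmax(128, 8, 1) == vlmax(128, 64, 8), "VLMAX differs at VLEN=128");
static_assert(vlmax(1024, 8, 1) == vlmax(1024, 64, 8), "VLMAX differs at VLEN=1024");

// Vector registers used by the loop: v0 (data, then mask), v1 (splat of c),
// and the LMUL=8 accumulator group v8-v15, out of 32 architectural registers.
constexpr unsigned kVectorRegs = 32;
constexpr unsigned kRegsUsed = 1 + 1 + 8;
static_assert(kRegsUsed <= kVectorRegs, "exceeds the vector register file");
```

Ten of 32 registers leaves plenty of headroom in this small loop. That is consistent with the statement that LMUL = 8 causes no extra spills here.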