https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Andrew.

I gave it another try:

https://godbolt.org/z/heKxcMWsY

I changed the load into a normal load of arr:
vuint8m1_t varr = *(vuint8m1_t*)arr;

As you said, the issue is gone (the codegen is as good as LLVM's):
fn:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        li      a4,32
        vl1re8.v        v1,0(a5)
        vsetvli zero,a4,e8,m1,ta,ma
        vand.vi v1,v1,1
        vs1r.v  v1,0(a0)
        ret

It seems that GCC can only optimize the normal load?

Do we have a chance to optimize such a case (for an unknown load)?
