https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Andrew.

I gave it another try: https://godbolt.org/z/heKxcMWsY

I changed the load into a normal load of arr:

  vuint8m1_t varr = *(vuint8m1_t*)arr;

Like you said, the issue is gone (the codegen is as good as LLVM's):

fn:
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        li      a4,32
        vl1re8.v        v1,0(a5)
        vsetvli zero,a4,e8,m1,ta,ma
        vand.vi v1,v1,1
        vs1r.v  v1,0(a0)
        ret

It seems that GCC can only optimize the normal load?
Do we have a chance to optimize such a case (for an unknown load)?