https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Yes, the masking should only be performed at the end.

However, the inner loop could be simplified further to

    label:  lwzu r8,4(r10)
            add  r3,r8,r3
            bdnz label

without the need to do anything with r9, so this is probably more than one issue in a single test case.
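A minimal C sketch of why the mask can wait until the end. It assumes, which the comment does not state, that the test case sums 32-bit unsigned values into a 64-bit register, so the upper 32 bits must eventually be cleared. The function names are hypothetical. Masking after every addition and masking once after the loop give the same result, because the low 32 bits of a sum depend only on the low 32 bits of its operands. The second loop mirrors the load-with-update / add / count-down shape shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: clear the upper 32 bits after every addition
   (the per-iteration masking the comment objects to). */
uint32_t sum_mask_each(const uint32_t *p, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (acc + p[i]) & 0xffffffffu;
    return (uint32_t)acc;
}

/* Hypothetical: accumulate in 64 bits and mask once after the loop.
   Even if acc wraps modulo 2^64, its low 32 bits are still the sum
   modulo 2^32, so the result matches sum_mask_each. */
uint32_t sum_mask_end(const uint32_t *p, size_t n)
{
    uint64_t acc = 0;
    while (n--)          /* counter decrement, like bdnz */
        acc += *p++;     /* load with pointer update, then add */
    return (uint32_t)(acc & 0xffffffffu);
}
```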