https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #4)
> Yes, the masking should be only performed at the end.
>
> However, the inner loop could be further simplified to
>
> label:
>         lwzu r8,4(r10)
>         add r3,r8,r3
>         bdnz label
>
> without the need to do anything with r9, so this is probably
> more than one topic in one test case.

Please use -O2 instead, no one will care much about -O1.  You can use
-fno-unroll-loops to make it easier to read.

The core for foo is

.L3:
        lwzu 10,4(9)
        add 3,10,3
        rldicl 3,3,0,32
        bdnz .L3

and for foo2 is

.L10:
        lwzu 10,4(9)
        add 3,3,10
        bdnz .L10

This is this way in Gimple already: the IV is a DImode, while it would be
better as a SImode.  That is the root of the problem here.  Sinking extensions
could well help, but the IV should not be DImode in the first place!

Confirmed.
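
The test case itself is not quoted in this comment, so the following is only a hypothetical C sketch of why the rldicl 3,3,0,32 (clear the upper 32 bits) can be sunk out of the loop. The function names and the 64-bit accumulator are assumptions made for illustration; they are not the foo and foo2 from the bug report, and they model the masking rather than the Gimple IV itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: accumulate in a 64-bit register and clear
   the upper 32 bits after every add, as the rldicl inside .L3 does. */
uint32_t sum_mask_each_iter(const uint32_t *a, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
        acc &= 0xffffffffu;     /* models rldicl 3,3,0,32 */
    }
    return (uint32_t)acc;
}

/* Same sum, but the mask is applied once after the loop.  Addition
   modulo 2^64 followed by truncation to 32 bits gives the same low
   32 bits, so the per-iteration mask is redundant. */
uint32_t sum_mask_at_end(const uint32_t *a, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return (uint32_t)(acc & 0xffffffffu);
}

/* Helper (also hypothetical) asserting that both variants agree. */
void check_equivalent(const uint32_t *a, size_t n)
{
    assert(sum_mask_each_iter(a, n) == sum_mask_at_end(a, n));
}
```

Calling check_equivalent on an array whose partial sums overflow 32 bits exercises the case where the in-loop mask actually changes the intermediate value. This only shows that sinking the extension is valid; per the comment above, the preferred fix is for the IV not to be DImode in the first place.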