https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Segher Boessenkool <segher at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #4)
> Yes, the masking should be only performed at the end.
>
> However, the inner loop could be further simplified to
>
> label:
> lwzu r8,4(r10)
> add r3,r8,r3
> bdnz label
>
> without the need to do anything with r9, so this is probably
> more than one topic in one test case.

Please use -O2 instead; no one will care much about -O1. You can use
-fno-unroll-loops to make the output easier to read.

The core for foo is

.L3:
        lwzu 10,4(9)
        add 3,10,3
        rldicl 3,3,0,32
        bdnz .L3

and for foo2 is

.L10:
        lwzu 10,4(9)
        add 3,3,10
        bdnz .L10

It is already this way in GIMPLE: the IV is DImode, while it would
be better as SImode. That is the root of the problem here. Sinking
extensions could well help, but the IV should not be DImode in the first
place!

Confirmed.
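
To illustrate why the rldicl (which clears the upper 32 bits) does not need to sit inside the loop, here is a minimal C sketch. It is not the test case attached to this PR; the function names sum_mask_each and sum_mask_end are hypothetical, and the 64-bit accumulator is only an assumption standing in for the DImode IV. Since addition modulo 2^32 is unaffected by the upper bits, masking once after the loop should give the same result as masking in every iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "foo" shape: the 64-bit accumulator is
   truncated to 32 bits in every iteration (the rldicl 3,3,0,32 in the loop). */
uint32_t sum_mask_each(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (s + a[i]) & 0xffffffffu;   /* mask inside the loop */
    return (uint32_t)s;
}

/* Same sum, but the mask is sunk out of the loop and applied once.
   The low 32 bits of a sum depend only on the low 32 bits of its
   operands, so the result is identical. */
uint32_t sum_mask_end(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];                      /* upper bits may accumulate here */
    return (uint32_t)(s & 0xffffffffu); /* single mask at the end */
}
```

With a 32-bit accumulator (SImode) neither version would need an explicit mask in the loop at all, which is the point made above about the IV's mode.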