Jean Christophe Beyler writes:
> As we can see, all three are using the symbol_ref data before adding
> their offset. But after cse, we get this:
>
> (insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
> (const:DI (plus:DI (symbol_ref:DI ("data") )
> (const_int 8 [0x8] 71 {movdi_
The subreg pass has this:
(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
        (const:DI (plus:DI (symbol_ref:DI ("data") )
                (const_int 8 [0x8] 71 {movdi_internal} (nil))
(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
        (symbol_ref:DI ("data") )) 71 {movdi_internal} (nil))
...
Jean Christophe Beyler writes:
> uint64_t foo (void)
> {
> return data[0] + data[1] + data[2];
> }
>
> And this generates :
>
> la r9,data
> la r7,data+8
> ldd r6,0(r7)
> ldd r8,0(r9)
> ldd r7,16(r9)
>
> I'm trying to see if there is a problem with my rtx costs function
>
Ah OK, so I can see why it would not be able to perform that
optimization around the loop. However, I changed the code to simply
have this:
uint64_t foo (void)
{
  return data[0] + data[1] + data[2];
}
And this generates:
la r9,data
la r7,data+8
ldd r6,0(r7)
ldd r8,0(r9)
ldd r7,16(r9)
As you can see, the compiler loads the address of data into r9 and uses
it for data[0], but it also loads data+8 into r7 instead of reusing r9
with an offset of 8. If I remove the loop then it does not do this.
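
A self-contained version of the test case might look like the sketch
below, next to the form one would hope the compiler reaches on its own.
The declaration of data (three uint64_t elements and their values) and
the function name foo_one_base are assumptions made for illustration;
the original ex1b.c is not shown in full in this thread.

```c
#include <stdint.h>

/* Assumed declaration: the thread only shows the symbol "data". */
uint64_t data[3] = { 1, 2, 3 };

/* The test case as posted. */
uint64_t foo (void)
{
  return data[0] + data[1] + data[2];
}

/* Hypothetical hand-written equivalent: the address of data is taken
   once and each element is reached as base + constant offset, which
   would correspond to 0(r9), 8(r9) and 16(r9) in the listing above.  */
uint64_t foo_one_base (void)
{
  const uint64_t *base = data;
  return base[0] + base[1] + base[2];
}
```

Both functions compute the same value, so comparing the assembly
generated for each is one way to check whether the extra
"la r7,data+8" depends on how the source is written or only on the
cost model.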
Currently, this optimization is done only by CSE. That's why it cannot
look through loops.
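
To make the loop case concrete, here is a hypothetical variant. The
loop from the original test case is not shown in this thread, so
sum_loop, the array size and the values are all assumptions. In
sum_straight every use of data sits in one straight-line block. In
sum_loop the address is needed once before the loop and again inside
its body, so a pass that only tracks equivalences along straight-line
code does not get to share the two.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 3 };

/* Assumed declaration: the thread only shows the symbol "data". */
uint64_t data[N] = { 1, 2, 3 };

/* All three uses of data are in the same straight-line block. */
uint64_t sum_straight (void)
{
  return data[0] + data[1] + data[2];
}

/* Hypothetical loop variant: one use of data before the loop and
   one inside the loop body, separated by the loop's back edge.  */
uint64_t sum_loop (void)
{
  uint64_t total = data[0];
  for (size_t i = 1; i < N; i++)
    total += data[i];
  return total;
}
```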
Paolo