Hi!

On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
> 
>       ret_var:
>               pld 9,ext_variable@got@pcrel
>               lwa 3,0(9)
>               blr
> 
>       store_var:
>               pld 9,ext_variable@got@pcrel
>               stw 3,0(9)
>               blr
> 
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
> 
> The linker does optimize the case where you are compiling the main program,
> and the variable is also defined in the main program to be:
> 
>       ret_var:
>               pla     9,ext_variable,1
>               lwa     3,0(9)
>               blr
> 
>       store_var:
>               pla     9,ext_variable,1
>               stw     3,0(9)
>               blr

Those "pla" insns are invalid; please correct them?  (You mixed "pla"
and "paddi" syntax I think.)
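
For reference, the two functions under discussion presumably come from C
along these lines. This is a reconstruction from the quoted assembly (a
sign-extending word load and a word store), not the source from the patch
mail; the parameter name and the stand-in definition at the bottom are
assumptions made so the sketch builds on its own.

```c
/* Assumed reconstruction of the source behind ret_var/store_var.
   In the scenario described, ext_variable is defined elsewhere, so the
   compiler has to fetch its address from the GOT. */
extern int ext_variable;

/* Corresponds to the quoted "lwa 3,0(9)": load a signed 32-bit value. */
int ret_var(void)
{
    return ext_variable;
}

/* Corresponds to the quoted "stw 3,0(9)": store a 32-bit value.
   The parameter name is hypothetical. */
void store_var(int value)
{
    ext_variable = value;
}

/* Stand-in definition so this sketch links by itself. In the case the
   mail is about, this would live in another object or shared library. */
int ext_variable;
```

With the definition moved to a separate object, this is the shape of code
where the address load and the memory access are separate instructions.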

> These patches generate:
> 
>       ret_var:
>               pld     9,ext_variable@got@pcrel
>       .Lpcrel1:
>               .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
>               lwa     3,0(9)
>               blr
> 
>       store_var:
>               pld     9,ext_variable@got@pcrel
>       .Lpcrel2:
>               .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
>               stw     3,0(9)
>               blr
> 
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
> 
> If the linker can, it will convert the code into:
> 
>       ret_var:
>               plwa    3,ext_variable,1
>               nop
>               blr
> 
>       store_var:
>               pstw    3,ext_variable,1
>               nop
>               blr

Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).

> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.

Why is that?  That is not what it does anyway?  /confused

> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.

That does not follow afaics.

> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program.  In the past, Segher has said that we should be able to move
> it earlier.

I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and that will not scale (what if a second pass
shows up that also requires this?)

It also makes it impossible to do normal late optimisations on code
produced here (optimisations like peephole, cprop_hardreg, dce).

I also said that you should use the DF framework, not parse all RTL by
hand and get it all wrong, as *everyone* does: this stuff is hard.


Segher
