Hi!

On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
>
>         ret_var:
>                 pld 9,ext_variable@got@pcrel
>                 lwa 3,0(9)
>                 blr
>
>         store_var:
>                 pld 9,ext_variable@got@pcrel
>                 stw 3,0(9)
>                 blr
>
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
>
> The linker does optimize the case where you are compiling the main program, and
> the variable is also defined in the main program to be:
>
>         ret_var:
>                 pla 9,ext_variable,1
>                 lwa 3,0(9)
>                 blr
>
>         store_var:
>                 pla 9,ext_variable,1
>                 stw 3,0(9)
>                 blr

Those "pla" insns are invalid; please correct them?  (You mixed "pla"
and "paddi" syntax I think.)

> These patches generate:
>
>         ret_var:
>                 pld 9,ext_variable@got@pcrel
>         .Lpcrel1:
>                 .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
>                 lwa 3,0(9)
>                 blr
>
>         store_var:
>                 pld 9,ext_variable@got@pcrel
>         .Lpcrel2:
>                 .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
>                 stw 3,0(9)
>                 blr
>
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
>
> If the linker can, it will convert the code into:
>
>         ret_var:
>                 plwa 3,ext_variable,1
>                 nop
>                 blr
>
>         store_var:
>                 pstw 3,ext_variable,1
>                 nop
>                 blr

Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).

> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.

Why is that?  That is not what it does anyway?  /confused

> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.

That does not follow afaics.

> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program.  In the past, Segher has said that we should be able to move
> it earlier.

I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and that will not scale (what if a second pass
shows up that also requires this!)  It also makes it impossible to do
normal late optimisations on code produced here (optimisations like
peephole, cprop_hardreg, dce).

I also said that you should use the DF framework, not parse all RTL by
hand and getting it all wrong, as *everyone* does: this stuff is hard.


Segher