On Wed, May 20, 2015 at 5:10 AM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Tue, 19 May 2015, Richard Henderson wrote:
>
>> It is. The relaxation that HJ is working on requires that the reads
>> from the GOT not be hoisted. I'm not especially convinced that what
>> he's working on is a win.
>>
>> With LTO, the compiler can do the same job that he's attempting in the
>> linker, without an extra nop. Without LTO, leaving it to the linker
>> means that you can't hoist the load and hide the memory latency.
>
> Well, hoisting always needs a register, and if hoisted out of a loop
> (which you all seem to be after) that register is live through the whole
> loop body. You need a register for each different called function in such
> a loop, trading the one GOT pointer for N other registers. For
> register-starved machines this is a real problem; even x86-64 doesn't have
> that many. I.e. I'm not convinced that this hoisting will really be much
> of a win that often, outside toy examples. Sure, the compiler can hoist
> function addresses trivially, but I think it will lead to spilling more
> often than not, or alternatively the hoisting will be undone by the
> register allocator's rematerialization. Of course, this would have to be
> measured for real, not hand-waved, but, well, I'd be surprised if it's not
> so.
We should replace "call/jmp *foo@GOTPCREL(%rip)" with
"call/jmp *foo@GOTRELAX(%rip)". As an option, we could then apply
-fno-plt to both PIC and non-PIC code whenever foo is externally defined.
That saves one indirect branch if GCC is right. If GCC is wrong and foo
turns out to be defined locally, the linker rewrites it into a direct
call/jmp padded with a nop prefix/suffix. We have nothing to lose.

-- 
H.J.
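The "nop prefix/suffix" outcome can be illustrated with a minimal C sketch of the byte rewrite a linker might perform when foo turns out to be locally defined. This is an illustration under stated assumptions, not the binutils implementation: the function names relax_got_branch and put_le32 are hypothetical, relocation bookkeeping is omitted, and only the standard x86-64 encodings are assumed (ff 15 disp32 for call *disp32(%rip), ff 25 disp32 for jmp *disp32(%rip), e8 rel32 for a direct call, e9 rel32 for a direct jmp). The 6-byte indirect form becomes a 5-byte direct branch plus one byte of padding: an addr32 (0x67) prefix before the call, or a trailing nop (0x90) after the jmp.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p in little-endian byte order (x86-64 immediate layout). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical sketch: relax a 6-byte "call/jmp *disp32(%rip)" that loads
   the branch target from the GOT into a direct branch to a local target.
   insn_addr is the address of the first instruction byte; target is the
   resolved address of foo. Returns 1 if rewritten, 0 if left unchanged. */
int relax_got_branch(uint8_t insn[6], uint64_t insn_addr, uint64_t target)
{
    assert(insn != 0);
    if (insn[0] != 0xff || (insn[1] != 0x15 && insn[1] != 0x25))
        return 0;                       /* not call/jmp *disp32(%rip) */

    int is_call = (insn[1] == 0x15);
    /* rel32 is relative to the end of the direct branch itself:
       6 bytes for "67 e8 rel32", 5 bytes for "e9 rel32" (nop follows). */
    uint64_t next = insn_addr + (is_call ? 6 : 5);
    int64_t rel = (int64_t)(target - next);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;                       /* out of range: keep the GOT load */

    if (is_call) {
        insn[0] = 0x67;                 /* addr32 prefix: 1 byte of padding */
        insn[1] = 0xe8;                 /* call rel32 */
        put_le32(&insn[2], (uint32_t)rel);
    } else {
        insn[0] = 0xe9;                 /* jmp rel32 */
        put_le32(&insn[1], (uint32_t)rel);
        insn[5] = 0x90;                 /* trailing nop: 1 byte of padding */
    }
    return 1;
}
```

For example, an indirect call at 0x1000 whose target foo resolves locally to 0x2000 goes from ff 15 xx xx xx xx to 67 e8 fa 0f 00 00. The same jmp becomes e9 fb 0f 00 00 90, because its displacement is measured from the end of the 5-byte jmp rather than the 6-byte slot. Padding keeps the instruction length unchanged, so no other code has to move.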