On Tue, May 19, 2015 at 11:59:00AM -0700, Richard Henderson wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
> > I'm still mildly worried that concerns for supporting
> > relaxation might lead to decisions not to optimize code in ways that
> > would be difficult to relax (e.g. certain types of address load
> > reordering or hoisting) but I don't understand GCC internals
> > sufficiently to know if this concern is warranted or not.
>
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.

Well, as long as -fno-plt actually generates a load from the GOT like
what would be done for data access, and does not go out of its way to
produce something compatible with relaxation, my hope is that it would
not be affected by the pessimization. I'm not sure if that's the case
though.

> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.

Yes, this is my feeling too. Alexander Monakov and I have been
discussing it on #musl a bit, and I think the conclusion we reached is
that relaxation is possibly a significant real-world win for non-PIC
main executables, where it's very likely that addresses will be
resolved at ld-time and that the programmer will not have specifically
annotated this with protected visibility. In such a case, you get
either a direct call or a direct address load and indirect call,
rather than hitting an extra cache line in the PLT thunk to do the
address load and indirect call. Note that, being non-PIC, there is no
GOT register involved here.

> > I would still like to see the @GOTPCREL stuff added and used instead
> > of @GOT, as I mentioned earlier in the thread, but I agree that's
> > independent of relaxation support and shouldn't block it.
>
> I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
> that Darwin uses, so we do have some experience with it.
>
> In order for it to work you've got to have a pointer to a random address in
> the function.  It means that you can only "easily" compute the address once.
> If you need the value again you wind up with the same "extra" addl insn that
> we have with the current GOT pointer.

Why would you recompute it (this requires a fairly expensive call that
reads or pops its own return address) rather than simply spilling the
already-computed value and reloading it from the stack?
The only example I can think of where it might make sense is when you
don't want to load the address unconditionally because there are
shrink-wrappable code paths that don't need it, but multiple code paths
that do, in which case they would each load different values. Is this
the concern you have in mind?

> We've just started to do inter-function register allocation.  The next step
> along those lines is to share the computation of GOT between multiple
> functions.  At which point it really helps to have one global base address to
> talk about.

I see -- that would be another case where it simplifies things.

Rich