On Tue, Oct 16, 2012 at 7:20 PM, Andi Kleen wrote: > Vladimir Makarov <vmaka...@redhat.com> writes: >>> >> As I remember, the performance improvement from this optimization was >> very small. There were problems in reviewing IRA and I decided to >> simplify this task. >> >> May be it is worth to return to this work. > > ... especially if you could make it work with LTO.
I'm going to (or more accurately: have started looking at) porting Vlad's patch to the trunk. I'll probably have this finished next weekend. I've been thinking a lot about how this IPA-RA could be made more effective with LTO, but I don't see any easy ways to do this. Vlad's patch basically looks at what registers are really call-used and call-clobbered in the 'final' pass where assembly is emitted. This is the only way to know which registers will be in that set for each function, you can't already know that when the function is still represented in GIMPLE or non-strict RTL. LTO works on GIMPLE function bodies and summaries are written well before RTL is generated, registers have not been allocated. With "basic" LTO there's no problem. Functions have to be compiled to RTL in topological order to make things work, but as far as I know this already happens (not sure, tho'). With WHOPR (which is IMHO the only useful LTO mode in general) , the set of call used/clobbered regs cannot be known and streamed out as a summary for WHOPR because GCC streams GIMPLE bodies, performs its WHOPR magic on GIMPLE, and generates RTL only after WHOPR is done. I suppose it's theoretically possible to make a good initial guess of what registers might be not-clobbered by a function even if the ABI says so. For instance, perhaps it's possible to assume that a function that doesn't touch any variables in a floating point mode also doesn't use/clobber any floating point registers. This assumption could be propagated via LTO/WHOPR. If the function turns out to clobber registers that were assumed to be untouched, you could just save and restore them in the function ("callee saved" so to speak). But I don't know how useful that would be. Another, IMHO more interesting, thing to investigate would be allocating global variables to registers. This is not part of Vlad's original patch and I have no real ideas right now how to do that, but it would be an interesting optimization for single-thread programs (like GCC itself). Ciao! Steven