> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57018
>
> and because LRA still misses some reload functionality for
> elimination. I am a bit embarrassed: I have this thing to do for 4
> months and I still did not start to work on it yet. There are too
> much things on my plate.
>
> As we are going to use outgoing arg accumulation, this problem is
> becoming higher priority one.

Thank you. We currently always use outgoing arg accumulation on x86_64;
I plan to re-disable arg accumulation on CPUs that handle push/pop well
(i.e. have a stack engine). This brings nice code size savings.

I wonder how much of this actually comes from not omitting the frame
pointer in non-leaf functions with IRA. EBP-based addressing is more
compact than ESP-based addressing, and thus -fomit-frame-pointer is
disabled with -Os. Perhaps frame pointer elimination could actually be
decided on by register allocation?

On a similar note, I just benchmarked -mfpmath=sse for 32bit code. It is
quite a big performance win, but it again causes about a 5% code size
regression. I want to propose defaulting to -mfpmath=sse for 32bit with
-ffast-math and -Ofast. (In a way I would like to see -mfpmath=sse by
default for 32bit on CPUs supporting SSE2, but that was voted down a long
time ago because it loses the 80bit precision for temporaries in
double/float computations.)

I wonder if we can eventually make -mfpmath=sse,387 work well (I did not
benchmark it yet, but statically it still produces more spilling than
-mfpmath=sse) and/or if we can possibly decide on fpmath based on the
hotness of the function (at least with a profile around).

Thanks for all the hard work on IRA!
Honza
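
P.S. To make the point about 80bit temporaries concrete, here is a
minimal sketch of the kind of code whose result can differ between x87
and SSE math. The function name is hypothetical and only for
illustration. It assumes the compiler keeps the intermediate product in
a register; whether it does depends on optimization level and spilling,
so treat the x87 outcome as typical rather than guaranteed.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper, not part of GCC.
   Returns 1 if the intermediate product survived (temporaries carry more
   range than double, as is typical with x87 math), and 0 if it overflowed
   to infinity (arithmetic done strictly in double, as with SSE math).  */
int temporaries_keep_extended_range(void)
{
    /* volatile keeps the compiler from folding the expression at
       compile time, so the arithmetic really happens at run time.  */
    volatile double big = DBL_MAX;
    volatile double two = 2.0;

    /* In double, big * two overflows to +inf and the division cannot
       bring it back.  With 80bit temporaries the product is still
       representable, so dividing by two recovers DBL_MAX.  */
    double r = (big * two) / two;

    return isfinite(r) ? 1 : 0;
}
```

Calling this from a small main() and building it once with
-m32 -mfpmath=387 and once with -m32 -msse2 -mfpmath=sse should show the
difference, assuming a 32bit multilib is installed.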