On Sun, Apr 17, 2011 at 10:35 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> AFAICT revision 172430 fixed the original problem in pr45810:
>>
>> gfc -Ofast -fwhole-program fatigue.f90 : 6.301u 0.003s 0:06.30
>> gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26
>>
>> However if I play with --param max-inline-insns-auto=*, I get
>>
>> gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays fatigue.f90 : 4.870u 0.002s 0:04.87
>> gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays fatigue.f90 : 2.872u 0.002s 0:02.87
>>
>> and
>>
>> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97
>> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73
>>
>> while I get the same threshold=125 with/without -flto at revision 172429.
>> Note that I get the same thresholds without -fstack-arrays, the run times
>> are only larger.
>
> Thanks for noticing. This was not really expected, but seems to give some
> insight. I just tested a new cleanup patch of mine where I fixed a few minor
> bugs in corner cases. One of those bugs I noticed was introduced by this patch
> (an oversight while converting the code to the new accessor).
>
> In case of nested inlining, the stack usage got misaccounted and consequently
> we allowed more inlining than --param large-stack-frame-growth would allow
> normally.
> The vortex and wupwise improvements seem to be gone, so I think they were due
> to this issue.
>
> I never really tuned the stack frame growth heuristics since it did not cause
> any problems in the benchmarks. On Fortran this is quite different because of
> the large I/O blocks hitting it very commonly, so I will look into making it
> more permissive.
> We definitely can just bump up the limits, and/or we can also teach it that
> if the call dominates the return there is not really much stack usage to
> save by preventing inlining, since both stack frames will wind up on the
> stack anyway.

I think Micha has a fix for the I/O block issue.

Richard.

> This means adding a new bit recording whether the call edge dominates the
> exit, and using this info. Also, simple noreturn IPA discovery can be based
> on this, and I recently noticed it might be important for Mozilla. So I will
> give it a try soonish.
>
> I will also look into the estimate_size ICE reported today.
>
> Honza
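
For readers following the "call dominates the return" argument in the quoted message, here is a minimal Python sketch of the idea. It is not GCC code: the dictionary-based CFG encoding, the function names `dominates_exit` and `peak_stack`, and the frame sizes are all hypothetical, and the model assumes an inlined callee's frame is simply added to the caller's frame with no slot sharing. It only illustrates why inlining costs no extra peak stack when every path to the return executes the call.

```python
from collections import deque


def dominates_exit(cfg, entry, exit_block, block):
    """Return True if every path from entry to exit_block goes through block.

    cfg maps a block name to the list of its successor block names
    (a hypothetical encoding chosen for this sketch).
    """
    if block == entry or block == exit_block:
        return True
    # Search from the entry while refusing to step into `block`.
    # If the exit is still reachable, some path avoids `block`.
    seen = {entry}
    work = deque([entry])
    while work:
        node = work.popleft()
        if node == exit_block:
            return False
        for succ in cfg.get(node, ()):
            if succ != block and succ not in seen:
                seen.add(succ)
                work.append(succ)
    return True


def peak_stack(caller_frame, callee_frame, call_executed, inlined):
    """Peak stack bytes on one path under the simplified model above."""
    if inlined:
        # The callee's locals live in the caller's frame on every path.
        return caller_frame + callee_frame
    # Without inlining, the callee frame exists only while the call runs.
    return caller_frame + (callee_frame if call_executed else 0)


# Straight-line function: the call block lies on every path to the exit.
straight = {"entry": ["call"], "call": ["exit"], "exit": []}

# Diamond: one path makes the call, the other skips it.
diamond = {"entry": ["call", "skip"], "call": ["exit"], "skip": ["exit"], "exit": []}

CALLER_FRAME = 100  # hypothetical frame sizes in bytes
CALLEE_FRAME = 400
```

In the straight-line case the call dominates the exit, so the peak is 500 bytes with or without inlining and refusing to inline saves nothing. In the diamond case the path through `skip` peaks at 100 bytes without inlining but 500 bytes with it, which is the situation a frame-growth limit is meant to catch.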