> On Fri, Apr 18, 2014 at 12:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >> What I've observed on power is that LTO alone reduces performance and
> >> LTO+FDO is not significantly different than FDO alone.
> > On SPEC2k6?
> >
> > This is quite surprising, for our (well SUSE's) spec testers (AMD64) LTO
> > seems off-noise win on SPEC2k6
> > http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-2006/recent.html
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
> >
> > I do not see why PPC should be significantly more constrained by register
> > pressure.
> >
> > I do not have head to head comparison of FDO and FDO+LTO for SPEC
> > http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006-patched-FDO/index.html
> > shows noticeable drop in calculix and gamess.
> > Martin profiled calculix and tracked it down to a loop that is not trained
> > but hot in the reference run. That makes it optimized for size.
> >
> > http://dromaeo.com/?id=219677,219672,219965,219877
> > compares Firefox's dromaeo runs with default build, LTO, FDO and LTO+FDO
> > Here the benefits of LTO and FDO seem to add up nicely.
> >>
> >> I agree that an exact estimate of the register pressure would be a
> >> difficult problem. I'm hoping that something that approximates potential
> >> register pressure downstream will be sufficient to help inlining
> >> decisions.
> >
> > Yep, register pressure and I-cache overhead estimates are used for inline
> > decisions by some compilers.
> >
> > I am mostly concerned about the metric suffering from the GIGO principle
> > if we mix together too many estimates that are somewhat wrong by their
> > nature. This is why I mostly tried to focus on size/time estimates and
> > not add too many other metrics. But perhaps it is time to experiment
> > with these, since obviously we pushed the current infrastructure mostly
> > to its limits.
> >
>
> I like the word GIGO here. Getting inline signals right requires deep
> analysis (including interprocedural analysis). Different signals/hints
> may also come with different quality thus different weights.
>
> Another challenge is how to quantify cycle savings/overhead more
> precisely. With that, we can abandon the threshold based scheme -- any
> callsite with a net saving will be considered.
Inline hints are intended to do this. At the moment we bump the limits up
when we estimate big speedups for the inlining, and with today's patch and
FDO we bypass the thresholds when we know from FDO that the call matters.

Concerning your other email, indeed we should consider heavy callees (in
Open64 terminology) that consume a lot of time and not skip their call
sites. An easy way would be to replace the maybe_hot_edge predicate by
maybe_hot_call, which simply multiplies the count and the estimated time.
(We probably ought to get rid of the time capping and use wider arithmetic
too.)

I wonder if that is not too local, and if we should not try to estimate the
cumulative time of the function and get more aggressive on inlining over
the whole path leading to hot code.

Honza
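
To make the maybe_hot_call idea above concrete, here is a rough sketch in Python. It is not GCC code. The CallEdge record, its fields, and both thresholds are hypothetical and chosen only for illustration; the real predicates work on GCC's call graph edges and profile data. It only shows how weighting the call count by the estimated callee time can flag a rarely executed call to a heavy callee that a count-only test would skip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEdge:
    """Hypothetical stand-in for a call graph edge with profile data."""
    count: int        # how often the call site executed in the training run
    callee_time: int  # estimated time of one callee invocation (arbitrary units)


def maybe_hot_edge(edge: CallEdge, hot_count_threshold: int) -> bool:
    """Count-only test: the call site is hot if it executes often enough."""
    return edge.count >= hot_count_threshold


def maybe_hot_call(edge: CallEdge, hot_time_threshold: int) -> bool:
    """Weighted test: multiply the count by the estimated callee time.

    Python integers are arbitrary precision, so the product is neither
    capped nor truncated. A C implementation would need a wide enough
    integer type to get the same effect.
    """
    return edge.count * edge.callee_time >= hot_time_threshold


# A rarely executed call to a heavy callee, and a frequent call to a tiny one.
heavy = CallEdge(count=10, callee_time=50_000)
light = CallEdge(count=1_000, callee_time=2)

HOT_COUNT = 100       # assumed threshold for the count-only test
HOT_TIME = 100_000    # assumed threshold for the weighted test

print(maybe_hot_edge(heavy, HOT_COUNT), maybe_hot_call(heavy, HOT_TIME))
print(maybe_hot_edge(light, HOT_COUNT), maybe_hot_call(light, HOT_TIME))
```

With these made-up numbers the count-only test treats only the frequent call as hot, while the weighted test picks the heavy callee instead. The sketch is still purely local to one edge; it does not attempt the cumulative, whole-path estimate mentioned above.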