http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194
--- Comment #6 from Linus Torvalds <torva...@linux-foundation.org> 2011-05-27 16:38:22 UTC ---
(In reply to comment #3)
> -finline-functions-called-once is throttled down by the large-function-growth
> and large-stack-frame-growth limits. The kernel case could probably be handled
> by the second. Does the kernel bump down those limits?

We used to play with inlining limits (gcc had some really bad decisions), but the meaning of the numbers kept changing from one gcc version to another, and the heuristics gcc used kept changing too. Which made it practically impossible to use sanely - you could tweak it for one particular architecture, and one particular version of gcc, but it would then be worse for others.

Quite frankly, with that kind of history, I'm not very eager to start playing around with random gcc internal variables again. So I'd much rather have gcc have good heuristics by default, possibly helped by the kinds of obvious hints we can give ("unlikely()" in particular is something we can add for things like this).

Obviously, we can (and do) use the "force the decision" with either "noinline" or "__always_inline" (which are just the kernel macros to make the gcc attribute syntax slightly more readable), but since I've been doing those other bug reports about bad gcc code generation, I thought I'd point out this one too.

> It still won't help in case the function doesn't have any on-stack aggregates,
> since we optimistically assume that all gimple registers will disappear. Probably
> even that could be changed, though estimating reload's stack frame usage so early
> would be iffy.

Yes, early stack estimation might not work all that well. In the kernel, we do end up having a few complex functions that we basically expect to inline to almost nothing - simply because we end up depending on compile-time constant issues (sometimes very explicitly, with __builtin_constant_p() followed by a largish "switch ()" statement).
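The kernel idioms mentioned above can be sketched in C roughly as follows. The macro definitions are simplified approximations of the kernel's (the real ones live in compiler headers and vary between kernel versions), and `size_class()`, `size_class_slow()`, `reject_size()` and `checked_size_class()` are hypothetical helpers invented for illustration. The idea: when the argument is a compile-time constant, the `__builtin_constant_p()` branch lets the `switch` fold to a single value after inlining, while a non-constant argument falls through to an out-of-line slow path, and the rare error branch is marked `unlikely()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel's helper macros (assumption: the real
 * definitions differ between kernel versions). glibc headers may already
 * define __always_inline, so drop that definition first. */
#undef __always_inline
#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)
#define noinline        __attribute__((noinline))
#define __always_inline inline __attribute__((always_inline))

/* Hypothetical out-of-line path: smallest class with (8 << class) >= size.
 * Assumes size is small (checked_size_class() rejects sizes above 4096). */
static noinline int size_class_slow(size_t size)
{
    int cls = 0;
    while (((size_t)8 << cls) < size)
        cls++;
    return cls;
}

/* Large-looking function that is expected to inline to almost nothing
 * when called with a constant size. */
static __always_inline int size_class(size_t size)
{
    if (__builtin_constant_p(size)) {
        /* GNU C case ranges, as used throughout the kernel. */
        switch (size) {
        case 0 ... 8:   return 0;
        case 9 ... 16:  return 1;
        case 17 ... 32: return 2;
        case 33 ... 64: return 3;
        default:        break;
        }
    }
    return size_class_slow(size);
}

/* Cold error path: kept out of line on purpose. */
static noinline int reject_size(size_t size)
{
    fprintf(stderr, "size %zu is too large\n", size);
    return -1;
}

/* Hypothetical caller: the error branch is marked unlikely(). */
int checked_size_class(size_t size)
{
    if (unlikely(size > 4096))
        return reject_size(size);
    return size_class(size);
}
```

Both paths compute the same class, so the result does not depend on whether gcc actually proves the argument constant; only the generated code differs.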
That said, this is something where the call-site really can make a big difference. Not just the fact that the call site might be marked "unlikely()" (again, that's just the kernel making __builtin_expect() readable), but things like "none of the arguments are constants" could easily be a good heuristic to use as a basis for whether to inline or not.

IOW, start out with whatever 'large-stack-frame-growth' and 'large-function-growth' values, but if the call-site is in an unlikely region, cut those values in half (or whatever). And if none of the arguments are constants, cut them in half again.

This is an example of why giving these limits as compiler options really doesn't work: the choice should probably be much more dynamic than just a single number.

I dunno. As mentioned, we can fix this problem by just marking things noinline by hand. But I do think that there are fairly obvious cases where inlining really isn't worth it, and gcc might as well just get those cases right.
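A minimal sketch of the call-site-sensitive limits proposed above. The struct and function names are hypothetical and do not correspond to anything inside GCC; the two fields merely mirror the `large-function-growth` and `large-stack-frame-growth` parameters, and the halving factors take the "(or whatever)" suggestion literally.

```c
#include <stdbool.h>

/* Hypothetical description of one call site (not a GCC internal). */
struct call_site {
    bool in_unlikely_region;  /* e.g. the call sits under unlikely() */
    int  n_constant_args;     /* arguments known at compile time */
};

/* Hypothetical limits, in percent, mirroring the two --param knobs. */
struct inline_limits {
    int large_function_growth;
    int large_stack_frame_growth;
};

/* Start from the base limits and shrink them per call site. */
static struct inline_limits
limits_for_call_site(struct inline_limits base, const struct call_site *cs)
{
    struct inline_limits l = base;

    if (cs->in_unlikely_region) {       /* cold call site: halve */
        l.large_function_growth /= 2;
        l.large_stack_frame_growth /= 2;
    }
    if (cs->n_constant_args == 0) {     /* nothing to constant-fold: halve again */
        l.large_function_growth /= 2;
        l.large_stack_frame_growth /= 2;
    }
    return l;
}

/* Inline only if the estimated growth stays within the adjusted limits. */
static bool within_limits(int fn_growth, int stack_growth, struct inline_limits l)
{
    return fn_growth <= l.large_function_growth &&
           stack_growth <= l.large_stack_frame_growth;
}
```

With illustrative base limits of 100% and 1000%, a call site in an unlikely region with no constant arguments ends up with 25% and 250%. Under this literal reading, a call with no arguments at all also counts as having "no constant arguments", which seems reasonable since there is nothing to fold.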