On 2020/1/7 23:40, Jan Hubicka wrote: >> >> >> On 2020/1/7 16:40, Jan Hubicka wrote: >>>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote: >>>>> Inline should return failure either (newsize > param_large_function_insns) >>>>> OR (newsize > limit). Sometimes newsize is larger than >>>>> param_large_function_insns, but smaller than limit, inline doesn't return >>>>> failure even if the new function is a large function. >>>>> Therefore, param_large_function_insns and param_large_function_growth >>>>> should be >>>>> OR instead of AND, otherwise --param large-function-growth won't >>>>> work correctly with --param large-function-insns. >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2020-01-06 Luo Xiong Hu <luo...@linux.ibm.com> >>>>> >>>>> ipa-inline-analysis.c (estimate_growth): Fix typo. >>>>> ipa-inline.c (caller_growth_limits): Use OR instead of AND. >>>> OK >>> >>> This patch also seems wrong. Inliner should return failure when >>> newsize > param_large_function_insns && newsize > limit >>> The reason is same as with large_unit_insns. We used to have only >>> growth limits but it has turned out that this gets too >>> restrictive/random for very small units/growths. >>> >>> So the logic is meant to be that inlining is OK either >>> - if function is reasonably small (large_function_insns limit is not >>> met) >>> - or it does not grow too much (large-function-growth isnot met) >> >> Sorry for checking too fast. Will revert once confirmed. > > Yes, i can confirm that the origina test was as intended and it is OK to > revert the patch (I am currently traveling)
Partially reverted in r279986 and remained the typo fix as it is obvious. > >> large_function_insns is default to 2700, and large-function-growth is >> default to 100, so inline a large function with insn 2700 to another large >> function with insn 2700 is allowed here (limit reach to 5400), I observed >> HUGE performance drop in some cases for this behavior(large function inline >> large function due to register spilling) compared with not inlined, that's >> why this patch comes out. >> >> From performance point of view,it seems benefitial to inline small function >> to >> large function, but negative when inline large function to large >> function(causing >> register spill)? 2700 seems too large for defining a function to be large >> and and >> 5400 also seems too big for growth limit? >> >> This inline happens at "Inlining one function called once", it's out of >> inline_small_functions and since newsize > param_large_function_insns, so I >> suppose it won't change behavior for inlining small functions? > > large-function-growth/insns is not meant to improve performance. The > reason why they exists is the fact that compiler is non-linar > in some cases and thus producing very large functions makes it slow / or > give up on some optimizations. > > The problem that sometimes inlining a larger function into a cold path > of other function leads to slowdown is rather old problem (see PR49194). > In general most of logic in inliner is about determining when inlining > is going to be useful. It is hard to tell when it is *not* going to be > useful, so I am bit unsure what to do about these cases in general. > > Recently I experimented path which disables inlining functions called > once when callee and caller frequency differ (i.e. one is hot other is > cold) etc, but it did not lead to consistent perormance improvements. > > What situation did you run into? Thanks. So caller could be {hot, cold} + {large, small}, same for callee. It may produce up to 4 * 4 = 16 combinations. Agree that hard to define useful, and useful really doesn't reflect performance improvements certainly. :) My case is A1(1) calls A2(2), A2(2) calls A3(3). A1, A2, A3 are specialized cloned nodes with different input, they are all hot, called once and each have about 1000+ insns. By default, large-function-growth/insns are both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower than no-inline. If adjust the large-function-growth/insns to allow A2 inline A3 only, the performance is 20% slower then no-inline. All the inlinings are generated in functions called once. Xionghu > > Honza >