On 2020/1/7 23:40, Jan Hubicka wrote:
>>
>>
>> On 2020/1/7 16:40, Jan Hubicka wrote:
>>>> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote:
>>>>> Inline should return failure either (newsize > param_large_function_insns)
>>>>> OR (newsize > limit).  Sometimes newsize is larger than
>>>>> param_large_function_insns, but smaller than limit, inline doesn't return
>>>>> failure even if the new function is a large function.
>>>>> Therefore, param_large_function_insns and param_large_function_growth 
>>>>> should be
>>>>> OR instead of AND, otherwise --param large-function-growth won't
>>>>> work correctly with --param large-function-insns.
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>   2020-01-06  Luo Xiong Hu  <luo...@linux.ibm.com>
>>>>>
>>>>>   ipa-inline-analysis.c (estimate_growth): Fix typo.
>>>>>   ipa-inline.c (caller_growth_limits): Use OR instead of AND.
>>>> OK
>>>
>>> This patch also seems wrong.  Inliner should return failure when
>>>    newsize > param_large_function_insns && newsize > limit
>>> The reason is same as with large_unit_insns.  We used to have only
>>> growth limits but it has turned out that this gets too
>>> restrictive/random for very small units/growths.
>>>
>>> So the logic is meant to be that inlining is OK either
>>>    - if function is reasonably small (large_function_insns limit is not
>>>      met)
>>>    - or it does not grow too much (large-function-growth isnot met)
>>
>> Sorry for checking too fast.  Will revert once confirmed.
> 
> Yes, i can confirm that the origina test was as intended and it is OK to
> revert the patch (I am currently traveling)

Partially reverted in r279986 and remained the typo fix as it is obvious.

> 
>> large_function_insns is default to 2700, and large-function-growth is
>> default to 100, so inline a large function with insn 2700 to another large
>> function with insn 2700 is allowed here (limit reach to 5400), I observed
>> HUGE performance drop in some cases for this behavior(large function inline
>> large function due to register spilling) compared with not inlined, that's
>> why this patch comes out.
>>
>>  From performance point of view,it seems benefitial to inline small function 
>> to
>> large function, but negative when inline large function to large 
>> function(causing
>> register spill)?  2700 seems too large for defining a function to be large 
>> and and
>> 5400 also seems too big for growth limit?
>>
>> This inline happens at "Inlining one function called once", it's out of
>> inline_small_functions and since newsize > param_large_function_insns, so I
>> suppose it won't change behavior for inlining small functions?
> 
> large-function-growth/insns is not meant to improve performance. The
> reason why they exists is the fact that compiler is non-linar
> in some cases and thus producing very large functions makes it slow / or
> give up on some optimizations.
> 
> The problem that sometimes inlining a larger function into a cold path
> of other function leads to slowdown is rather old problem (see PR49194).
> In general most of logic in inliner is about determining when inlining
> is going to be useful. It is hard to tell when it is *not* going to be
> useful, so I am bit unsure what to do about these cases in general.
> 
> Recently I experimented path which disables inlining functions called
> once when callee and caller frequency differ (i.e. one is hot other is
> cold) etc, but it did not lead to consistent perormance improvements.
> 
> What situation did you run into?

Thanks.  So caller could be {hot, cold} + {large, small}, same for callee.  It 
may
produce up to 4 * 4 = 16 combinations.  Agree that hard to define useful,
and useful really doesn't reflect performance improvements certainly. :)

My case is A1(1) calls A2(2), A2(2) calls A3(3).  A1, A2, A3 are 
specialized cloned nodes with different input, they are all hot, called once
and each have about 1000+ insns.  By default, large-function-growth/insns are
both not reached, so A1 will inline A2, A2 will inline A3, which is 40% slower 
than no-inline.

If adjust the large-function-growth/insns to allow 
A2 inline A3 only, the performance is 20% slower then no-inline.
All the inlinings are generated in functions called once.


Xionghu

> 
> Honza
> 

Reply via email to