On 22/11/2019 10:55, Arnd Bergmann wrote:
> On Fri, Nov 22, 2019 at 2:40 PM Adhemerval Zanella
> <adhemerval.zane...@linaro.org> wrote:
>>
>> Hi Arnd,
>>
>> I took a look at the stack usage issue in the kernel snippet you provided [1],
>> and as you noted, most of the impact indeed comes from the -ftree-ch
>> optimization. It is enabled at all optimization levels except -Os (since,
>> besides possibly increasing stack usage, it might also increase code size).
>>
>> I have not fully grasped yet what the -ftree-ch optimization does, but my
>> understanding so far is that it tries to reorganize the loop for later loop
>> optimization phases. More specifically, what it ends up doing on this snippet
>> is creating extra stack variables for the internal member accesses in the
>> inner loop (which in turn increases stack usage).
> 
> Thanks a lot for taking a detailed look!
> 
>>
>> This is also why adding the compiler barrier inhibits the optimization: it
>> prevents -ftree-ch from reorganizing the inner loop, which is then passed
>> as is to later optimization phases.
>>
>> It is also a generic pass that affects all architectures, although the
>> resulting stack usage depends on later passes. With GCC 9.2.1 I see the
>> following stack usage (in bytes, as reported by -fstack-usage) with -O2:
>>
>> arm                     632
>> aarch64                 448
>> powerpc                 912
>> powerpc64le             560
>> s390                    600
>> s390x                   632
>> i386                    1376
>> x86_64                  784
>>
>> Also, -fconserve-stack does not really help with this pass, since -ftree-ch
>> does not check that flag. Currently -fconserve-stack only seems to affect
>> the inliner, by setting both large-stack-frame and large-stack-frame-growth
>> to some conservative values.
>>
>> The straightforward change I am checking is just to disable the tree-ch
>> optimization if -fconserve-stack is also enabled:
>>
>> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
>> index b894a7e0918..b14dd66257c 100644
>> --- a/gcc/tree-ssa-loop-ch.c
>> +++ b/gcc/tree-ssa-loop-ch.c
>> @@ -291,7 +291,8 @@ public:
>>    {}
>>
>>    /* opt_pass methods: */
>> -  virtual bool gate (function *) { return flag_tree_ch != 0; }
>> +  virtual bool gate (function *) { return flag_tree_ch != 0
>> +                                         && flag_conserve_stack == 0; }
>>
>>    /* Initialize and finalize loop structures, copying headers inbetween.  */
>>    virtual unsigned int execute (function *);
> 
> That assumes that ftree-ch generally results in higher stack usage,
> which is something we would have to confirm first. I've done
> similar checks before on other options, basically building a large
> project like the kernel with -Wframe-larger-than=128 (or similar),
> and then comparing the warning output with/without that flag.
> 
> That would tell us whether this is a systematic problem with
> -ftree-ch (making your patch a good idea) or whether the example
> code just hit a worst case that is otherwise rare, and turning off
> -ftree-ch generally just leads to worse output but no lower stack
> usage.

Yes, it is a big hammer. I am trying to see whether I can get an estimated
stack usage to check against param-large-stack-frame (set by
-fconserve-stack), as gcc-git/gcc/ipa-inline.c does for the inliner.

The idea is to keep -ftree-ch enabled unless the transformation pushes the
stack usage past some threshold.

> 
> One suspicion I have is that this is related to not only having
> a large struct, but also having lots of 64-bit members in that
> struct and working on it on a 32-bit architecture.

That most likely does increase the stack usage. Changing the u64 definition
in the snippet to 'unsigned long', I see:

$ x86_64-glibc-linux-gnu-gcc -v -O2 -Wframe-larger-than=100 \
    -Wa,--fatal-warnings stack_usage.c -fstack-usage -c -m32; cat stack_usage.su
stack_usage.c:158:6:mlx5e_grp_sw_update_stats   472     static

That is down from the previous 1376. However, with -ftree-ch disabled:

$ x86_64-glibc-linux-gnu-gcc -O2 -Wframe-larger-than=100 \
    -Wa,--fatal-warnings stack_usage.c -fstack-usage -c -m32 -fno-tree-ch; \
    cat stack_usage.su
stack_usage.c:158:6:mlx5e_grp_sw_update_stats   16      static
