I should say that you need all three options to prevent the code motion from 
happening:
-fno-tree-loop-im -fno-tree-pre -fno-gcse

-fno-tree-ch prevents the code motion from happening too, but only by accident: 
none of the three code motion passes (two on GIMPLE and one on RTL) will work 
with the loop in that form.  Disabling the copy header optimization for 
flag_conserve_stack is the wrong approach.  Again, you need to look into each 
of the code motion passes to understand the register pressure heuristics and 
why they do the code motion.
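
To make the interaction concrete, here is a minimal hand-written sketch in C of 
what header copying does to a loop and why that then lets a code motion pass 
hoist a load.  It is neither the kernel snippet nor GCC output; struct 
chan_stats and both function names are made up for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-channel statistics record. */
struct chan_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* Loop as written: the exit test is at the top, so the body may run zero
   times.  Moving the load of *scale above the loop would execute it even
   when n == 0, so it cannot simply be hoisted in this form. */
unsigned long sum_as_written(const struct chan_stats *c, size_t n,
                             const unsigned long *scale)
{
    unsigned long total = 0;
    size_t i = 0;

    while (i < n) {
        total += (c[i].packets + c[i].bytes) * *scale;
        i++;
    }
    return total;
}

/* The same loop with the header copied by hand: a guard followed by a
   do/while.  Inside the guard the body is known to run at least once, so
   the invariant load can be moved in front of the loop. */
unsigned long sum_header_copied(const struct chan_stats *c, size_t n,
                                const unsigned long *scale)
{
    unsigned long total = 0;
    size_t i = 0;

    if (i < n) {                      /* copied loop header */
        unsigned long s = *scale;     /* hoisted invariant, live across the loop */
        do {
            total += (c[i].packets + c[i].bytes) * s;
            i++;
        } while (i < n);
    }
    return total;
}
```

Both functions compute the same result.  The difference is that every value 
hoisted like s stays live across the whole loop, and with enough of them the 
register allocator presumably has to spill, which would show up as stack usage.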

Also I have not looked into why the RTL loop invariant code motion pass did 
NOTHING here.

Thanks,
Andrew Pinski

________________________________________
From: linaro-toolchain <linaro-toolchain-boun...@lists.linaro.org> on behalf of 
Andrew Pinski <apin...@marvell.com>
Sent: Friday, November 22, 2019 6:44 AM
To: Adhemerval Zanella; Arnd Bergmann
Cc: Linaro Toolchain Mailman List
Subject: Re: [EXT] High stack usage due ftree-ch

>Thanks for the information, at least for the specific snippet it seems that
> the passes disabled by -fno-tree-loop-im and -fno-tree-pre are the ones 
> generating most of the spilling.
That is because the code motion is happening at the RTL level: -fno-gcse is the 
one you are looking for.

________________________________________
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
Sent: Friday, November 22, 2019 6:41 AM
To: Andrew Pinski; Arnd Bergmann
Cc: Linaro Toolchain Mailman List
Subject: Re: [EXT] High stack usage due ftree-ch



On 22/11/2019 11:38, Andrew Pinski wrote:
>> It is enabled in all optimization levels besides -Os (since besides possibly
>> increasing the stack usage it also might increase code size).
> It is disabled at -Os because it duplicates the loop header, which in turn 
> is considered an increase in code size (though sometimes that can have the 
> side effect of decreasing the code size later on, but that is a different 
> story).
> The increase in stack usage is due to register pressure from the other 
> optimizations that can now work with the copied loop header.  If anything, 
> the register pressure heuristics of the code motion passes need improvement, 
> or register allocation needs the ability to undo that code motion.  THIS 
> IS a HUGE project and should not be taken lightly.  It just happens that this 
> code motion happens here and causes issues.  It is not the normal case really.
>

Thanks for the information, at least for the specific snippet it seems that
the passes disabled by -fno-tree-loop-im and -fno-tree-pre are the ones
generating most of the spilling.

So the question I have is whether it is worth disabling -ftree-ch when
-fconserve-stack is set (since the idea of the flag is to prevent such
pessimizations), or whether the idea is just to disable -ftree-ch explicitly
for such cases.

> Thanks,
> Andrew
>
> ________________________________________
> From: linaro-toolchain <linaro-toolchain-boun...@lists.linaro.org> on behalf 
> of Adhemerval Zanella <adhemerval.zane...@linaro.org>
> Sent: Friday, November 22, 2019 5:40 AM
> To: Arnd Bergmann
> Cc: Linaro Toolchain Mailman List
> Subject: [EXT] High stack usage due ftree-ch
>
> Hi Arnd,
>
> I took a look at the stack usage issue in the kernel snippet you provided [1],
> and as you have noted the biggest impact indeed comes from the -ftree-ch
> optimization.  It is enabled in all optimization levels besides -Os (since
> besides possibly increasing the stack usage it also might increase code size).
>
> I am still trying to fully grasp what the -ftree-ch optimization does, but my
> understanding so far is that it tries to reorganize the loop for later loop
> optimization phases.  More specifically, what it ends up doing on the specific
> snippet is create extra stack variables for the internal member accesses in
> the inner loop (which in turn increases stack usage).
>
> This is also why adding the compiler barrier inhibits the optimization: it
> prevents -ftree-ch from reorganizing the inner loop, which is then passed
> as is to later optimization phases.
>
> It is also a generic pass that affects all architectures, although the
> resulting stack usage will depend on later passes.  With GCC 9.2.1 I see the
> following stack usage (in bytes, as reported by -fstack-usage) with -O2:
>
> arm                     632
> aarch64                 448
> powerpc                 912
> powerpc64le             560
> s390                    600
> s390x                   632
> i386                    1376
> x86_64                  784
>
> Also, -fconserve-stack does not really help with this pass since -ftree-ch
> does not check the flag.  Currently -fconserve-stack only seems to affect the
> inliner, by setting both large-stack-frame and large-stack-frame-growth to
> some conservative values.
>
> The straightforward change I am checking is just to disable the tree-ch
> optimization if -fconserve-stack is also enabled:
>
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index b894a7e0918..b14dd66257c 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -291,7 +291,8 @@ public:
>    {}
>
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_ch != 0; }
> +  virtual bool gate (function *) { return flag_tree_ch != 0
> +                                         && flag_conserve_stack == 0; }
>
>    /* Initialize and finalize loop structures, copying headers inbetween.  */
>    virtual unsigned int execute (function *);
>
> On powerpc64le with gcc master:
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc \
>     -O2 ../stack_usage.c -c -fstack-usage && cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        496     static
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc \
>     -O2 ../stack_usage.c -c -fstack-usage -fconserve-stack && cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        176     static
>
> The reference for minimal stack usage is with -Os:
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc \
>     -Os ../stack_usage.c -c -fstack-usage && cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        32      static
>
> I will also check whether adding the same test for -fgcse and -ftree-ter
> makes sense.
>
> [1] https://godbolt.org/z/WKa-Bd
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>
