Actually, note that it is not -ftree-ch itself which is causing the problem; rather,
-ftree-ch allows other optimizations to do their work.  E.g. I need to turn
off all of loop invariant code motion to get rid of the spilling:
"-fno-tree-loop-im -fno-tree-pre -fno-move-loop-invariants -fno-gcse".

Also, the reason why the memory barrier of the inline-asm works is that it
tells the invariant code motion optimizations that there is a memory barrier
which can have an effect on the code.
Basically, GCC does not have a good way to estimate register pressure for loop
invariant code motion.  It has some heuristics, but those are not always good.
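
As a rough illustration of the barrier mentioned above, here is a minimal
sketch in C.  The struct, its field names, and sum_stats() are hypothetical
stand-ins, not the kernel code from the original report.  The empty asm emits
no instructions, but its "memory" clobber makes GCC assume memory may have been
read or written at that point, so the accumulators cannot be carried in
temporaries across it.  Whether this actually lowers stack usage depends on the
compiler version and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel counters; names are illustrative only. */
struct chan_stats {
    uint64_t packets;
    uint64_t bytes;
    uint64_t drops;
};

/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   makes GCC assume memory may have changed at this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Accumulate every channel into *total. */
void sum_stats(struct chan_stats *total,
               const struct chan_stats *chans, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        total->packets += chans[i].packets;
        total->bytes   += chans[i].bytes;
        total->drops   += chans[i].drops;
        /* Forces the updates to *total to be stored before this point
           and reloaded after it, instead of living in temporaries
           across iterations. */
        compiler_barrier();
    }
}
```

Without the barrier the function computes the same result; the only difference
is how much freedom the optimizers have to keep values live across iterations.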

Thanks,
Andrew Pinski

________________________________________
From: linaro-toolchain <linaro-toolchain-boun...@lists.linaro.org> on behalf of 
Adhemerval Zanella <adhemerval.zane...@linaro.org>
Sent: Friday, November 22, 2019 5:40 AM
To: Arnd Bergmann
Cc: Linaro Toolchain Mailman List
Subject: [EXT] High stack usage due ftree-ch

----------------------------------------------------------------------
Hi Arnd,

I took a look at the stack usage issue in the kernel snippet you provided [1],
and as you have noted, most of the impact indeed comes from the -ftree-ch
optimization.  It is enabled at all optimization levels besides -Os (since,
besides possibly increasing the stack usage, it might also increase code size).

I am still trying to fully grasp what the -ftree-ch optimization does, but my
understanding so far is that it tries to reorganize the loop for later loop
optimization phases.  More specifically, what it ends up doing on this specific
snippet is creating extra stack variables for the internal member accesses in
the inner loop (which in turn increases stack usage).
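
For reference, the GCC manual describes -ftree-ch as loop header copying.  A
minimal hand-written sketch of that reshaping in C follows; the function names
are made up for illustration, and the real pass works on GCC's internal
representation rather than on source code.  The point is that once the exit
test is duplicated as a guard, the loop body is known to run at least once,
which gives later code motion passes more room to hoist things out of the loop.

```c
#include <stddef.h>

/* Loop as written: the exit test sits in the loop header. */
long sum_while(const long *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += v[i];
        i++;
    }
    return s;
}

/* Hand-written equivalent of the shape after header copying: the test is
   duplicated once as a guard, and the loop itself tests at the bottom. */
long sum_header_copied(const long *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    if (i < n) {
        do {
            s += v[i];
            i++;
        } while (i < n);
    }
    return s;
}
```

Both functions return the same value for any input; only the control flow
shape differs.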

This is also why adding the compiler barrier inhibits the optimization, since it
prevents -ftree-ch from reorganizing the inner loop, which is then
passed as is to later optimization phases.

It is also a generic pass that affects all architectures, albeit the resulting
stack usage will depend on later passes. With GCC 9.2.1 I see the following
stack usage, using -fstack-usage along with -O2:

arm                     632
aarch64                 448
powerpc                 912
powerpc64le             560
s390                    600
s390x                   632
i386                    1376
x86_64                  784

Also, -fconserve-stack does not really help with this pass, since -ftree-ch does
not check whether the flag is set.  -fconserve-stack currently only seems to
affect the inliner, by setting both large-stack-frame and
large-stack-frame-growth to some conservative values.

The straightforward change I am checking is just to disable the tree-ch
optimization if -fconserve-stack is also enabled:

diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b894a7e0918..b14dd66257c 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -291,7 +291,8 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_ch != 0; }
+  virtual bool gate (function *) { return flag_tree_ch != 0
+                                         && flag_conserve_stack == 0; }

   /* Initialize and finalize loop structures, copying headers inbetween.  */
   virtual unsigned int execute (function *);

On powerpc64le with gcc master:

$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B 
/home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage && 
cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        496     static

$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B 
/home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage 
-fconserve-stack && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        176     static

The reference for minimal stack usage is with -Os:

$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B 
/home/azanella/gcc/gcc-git-build/gcc -Os ../stack_usage.c -c -fstack-usage  && 
cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats        32      static

I will try to check whether enabling the same test for -fgcse and -ftree-ter
also makes sense.

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__godbolt.org_z_WKa-2DBd&d=DwIGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=L_uAQMgirzaBwiEk05NHY-AMcNfJzugOS_xTjrtS94k&m=ySgQrryO8OlXh50QdjZ81DXxOL3LLUd7ecrtnTWd8zA&s=FWfDuHQlXPrv4N6aGpxHBIR_9-0axgnkvWu5FKlMExU&e=
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.linaro.org_mailman_listinfo_linaro-2Dtoolchain&d=DwIGaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=L_uAQMgirzaBwiEk05NHY-AMcNfJzugOS_xTjrtS94k&m=ySgQrryO8OlXh50QdjZ81DXxOL3LLUd7ecrtnTWd8zA&s=OPNR-wbJdd-RI2tsN_VilGRnASXtEiwkDPbZF_XPYe8&e=
