On 22/11/2019 11:38, Andrew Pinski wrote:
>> It is enabled in all optimization levels besides -Os (since besides possibly
>> increasing the stack usage it also might increase code size).
> It is disabled at -Os because it duplicates the loop header, which in
> turn is considered to increase code size (though sometimes that can have the
> side effect of decreasing the code size later on, but that is a different
> story).
> The increase in stack usage is due to register pressure from other
> optimizations that can now work with the copied loop header. If anything,
> the register pressure heuristics need improvement for code motion passes, or
> we need the ability to undo those code motions while doing register
> allocation. THIS IS a HUGE project and should not be taken lightly. It just
> happens that this code hits the problem and causes issues. It is not the
> normal case really.
>
Thanks for the information. At least for this specific snippet, it seems that
the passes controlled by -fno-tree-loop-im and -fno-tree-pre are the ones
generating most of the spilling.

So my question is whether it is worth disabling -ftree-ch when
-fconserve-stack is set (since the idea of the flag is to prevent such
pessimizations), or whether the idea is just to disable -ftree-ch explicitly
for such cases.

> Thanks,
> Andrew
>
> ________________________________________
> From: linaro-toolchain <linaro-toolchain-boun...@lists.linaro.org> on behalf
> of Adhemerval Zanella <adhemerval.zane...@linaro.org>
> Sent: Friday, November 22, 2019 5:40 AM
> To: Arnd Bergmann
> Cc: Linaro Toolchain Mailman List
> Subject: [EXT] High stack usage due ftree-ch
>
> Hi Arnd,
>
> I took a look at the stack usage issue in the kernel snippet you provided [1],
> and as you have noted most of the impact indeed comes from the -ftree-ch
> optimization. It is enabled in all optimization levels besides -Os (since
> besides possibly increasing the stack usage it also might increase code size).
>
> I am still trying to fully grasp what the -ftree-ch optimization does, but my
> understanding so far is that it tries to reorganize the loop for later loop
> optimization phases. More specifically, what it ends up doing on this
> specific snippet is to create extra stack variables for the internal member
> accesses in the inner loop (which in turn increases stack usage).
>
> This is also why adding the compiler barrier inhibits the optimization, since
> it prevents -ftree-ch from reorganizing the inner loop, which is then passed
> as is to the later optimization phases.
>
> It is also a generic pass that affects all architectures, albeit the resulting
> stack usage will depend on later passes.
> With GCC 9.2.1 I see the following stack usage (in bytes) using
> -fstack-usage along with -O2:
>
>   arm           632
>   aarch64       448
>   powerpc       912
>   powerpc64le   560
>   s390          600
>   s390x         632
>   i386         1376
>   x86_64        784
>
> Also, -fconserve-stack does not really help with this pass, since -ftree-ch
> does not check the flag. Currently -fconserve-stack only seems to affect
> the inliner, by setting both large-stack-frame and large-stack-frame-growth
> to some conservative values.
>
> The straightforward change I am checking is just to disable the tree-ch
> optimization if -fconserve-stack is also enabled:
>
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index b894a7e0918..b14dd66257c 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -291,7 +291,8 @@ public:
>    {}
>
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_ch != 0; }
> +  virtual bool gate (function *) { return flag_tree_ch != 0
> +                                         && flag_conserve_stack == 0; }
>
>    /* Initialize and finalize loop structures, copying headers inbetween.  */
>    virtual unsigned int execute (function *);
>
> On powerpc64le with gcc master:
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B
> /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage &&
> cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 496 static
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B
> /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage
> -fconserve-stack && cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 176 static
>
> The reference for minimal stack usage is with -Os:
>
> $ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B
> /home/azanella/gcc/gcc-git-build/gcc -Os ../stack_usage.c -c -fstack-usage
> && cat stack_usage.su
> ../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 32 static
>
> I will also check whether it makes sense to enable the same test for -fgcse
> and -ftree-ter.
>
> [1] https://godbolt.org/z/WKa-Bd
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain