On Sun, Jun 18, 2023 at 5:55 PM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> we currently produce very bad code on loops using std::vector as a stack, 
> since
> we fail to inline push_back which in turn prevents SRA and we fail to optimize
> out some store-to-load pairs (PR109849).
>
> I looked into why this function is not inlined and it is inlined by clang.  We
> currently estimate it to 66 instructions and inline limits are 15 at -O2 and 
> 30
> at -O3.  Clang has similar estimate, but still decides to inline at -O2.
>
> I looked into reason why the body is so large and one problem I spotted is the
> way std::max is implemented by taking and returning reference to the values.
>
>   const T& max( const T& a, const T& b );
>
> This makes it necessary to store the values to memory and load them later
> (Max is used by code computing new size of vector on resize.)
> Two stores, conditional and load accounts as 8 instructions, while
> MAX_EXPR as 1 and has a lot better chance to fold with the surrounding
> code.
>
> We optimize this to MAX_EXPR, but only during late optimizations.  I think 
> this
> is a common enough coding pattern and we ought to make this transparent to
> early opts and IPA.  The following is easist fix that simply adds phiprop pass
> that turns the PHI of address values into PHI of values so later FRE can
> propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE
> remove the memory stores.
>
> Bootstrapped/regtested x86_64-linux, does this look resonable thing to do?

OK.

Thanks,
Richard.

> Looking into how expensive the pass is, I think it is very cheap, except
> that it computes postdominator and updates ssa even if no patterns
> are matched.  I will send patch to avoid that.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/109811
>         PR tree-optimization/109849
>         * passes.def: Add phiprop to early optimization passes.
>         * tree-ssa-phiprop.cc: Allow clonning.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/109811
>         PR tree-optimization/109849
>         * gcc.dg/tree-ssa/phiprop-1.c: New test.
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c9a8f19747b..faa5208b26b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -88,6 +88,8 @@ along with GCC; see the file COPYING3.  If not see
>           /* pass_build_ealias is a dummy pass that ensures that we
>              execute TODO_rebuild_alias at this point.  */
>           NEXT_PASS (pass_build_ealias);
> +         /* Do phiprop before FRE so we optimize std::min and std::max well. 
>  */
> +         NEXT_PASS (pass_phiprop);
>           NEXT_PASS (pass_fre, true /* may_iterate */);
>           NEXT_PASS (pass_early_vrp);
>           NEXT_PASS (pass_merge_phi);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
> new file mode 100644
> index 00000000000..9f52c2a7298
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } 
> */
> +int max(int a, int b)
> +{
> +        int *ptr;
> +        if (a > b)
> +          ptr = &a;
> +        else
> +          ptr = &b;
> +        return *ptr;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
> "phiprop1"} } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */
> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> index 3cb4900b6be..5dc505df420 100644
> --- a/gcc/tree-ssa-phiprop.cc
> +++ b/gcc/tree-ssa-phiprop.cc
> @@ -476,6 +476,7 @@ public:
>    {}
>
>    /* opt_pass methods: */
> +  opt_pass * clone () final override { return new pass_phiprop (m_ctxt); }
>    bool gate (function *) final override { return flag_tree_phiprop; }
>    unsigned int execute (function *) final override;
>

Reply via email to