Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Richard Biener Thu, 26 Sep 2024 23:16:55 -0700

On Thu, 26 Sep 2024, Marek Polacek wrote:

> On Wed, Sep 25, 2024 at 08:45:50PM -0400, Jason Merrill wrote:
> > On 9/25/24 4:21 PM, Marek Polacek wrote:
> > > On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> > > > On 9/24/24 5:10 PM, Marek Polacek wrote:
> > > > > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > > > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > This PR reports a missed optimization.  When we have:
> > > > > > > 
> > > > > > >      Str str{"Test"};
> > > > > > >      callback(str);
> > > > > > > 
> > > > > > > as in the test, we're able to evaluate the Str::Str() call at 
> > > > > > > compile
> > > > > > > time.  But when we have:
> > > > > > > 
> > > > > > >      callback(Str{"Test"});
> > > > > > > 
> > > > > > > we are not.  With this patch (in fact, it's Patrick's patch with 
> > > > > > > a little
> > > > > > > tweak), we turn
> > > > > > > 
> > > > > > >      callback (TARGET_EXPR <D.2890, <<< Unknown tree: 
> > > > > > > aggr_init_expr
> > > > > > >        5
> > > > > > >        __ct_comp
> > > > > > >        D.2890
> > > > > > >        (struct Str *) <<< Unknown tree: void_cst >>>
> > > > > > >        (const char *) "Test" >>>>)
> > > > > > > 
> > > > > > > into
> > > > > > > 
> > > > > > >      callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", 
> > > > > > > .length=4}>)
> > > > > > > 
> > > > > > > I explored the idea of calling maybe_constant_value for the whole
> > > > > > > TARGET_EXPR in cp_fold.  That has three problems:
> > > > > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure 
> > > > > > > the
> > > > > > >      result is also a TARGET_EXPR;
> > > > > > 
> > > > > > I'd think that the result should always be a TARGET_EXPR for a 
> > > > > > class, and
> > > > > > that's the case we want to fold; a TARGET_EXPR for a scalar is 
> > > > > > always the
> > > > > > initialize-temp-and-use-it pattern you mention below.
> > > > > 
> > > > > Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
> > > > > > > - the resulting TARGET_EXPR must have the same flags, otherwise 
> > > > > > > Bad
> > > > > > >      Things happen;
> > > > > > 
> > > > > > I guess maybe_constant_value should be fixed to preserve flags 
> > > > > > regardless of
> > > > > > this change.
> > > > > 
> > > > > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR 
> > > > > flags,
> > > > > but here we go into the break_out_target_exprs block in 
> > > > > maybe_constant_value
> > > > > and that doesn't necessarily preserve them.
> > > > > 
> > > > > > > - getting a new slot is also problematic.  I've seen a test where 
> > > > > > > we
> > > > > > >      had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the 
> > > > > > > whole TARGET_EXPR
> > > > > > >      would get us "TARGET_EXPR<D.2681, ...>", but since we don't 
> > > > > > > see the outer
> > > > > > >      D.2680, we can't replace it with D.2681, and things break.
> > > > > > 
> > > > > > Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> > > > > 
> > > > > ...unfortunately that doesn't always help.  I've reduced an example 
> > > > > into:
> > > > > 
> > > > >     struct optional {
> > > > >       constexpr optional(int) {}
> > > > >     };
> > > > >     optional foo() { return 2; }
> > > > > 
> > > > > where check_return_expr creates a COMPOUND_EXPR:
> > > > > 
> > > > >           retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> > > > >                            TREE_OPERAND (retval, 0));
> > > > > 
> > > > > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > > > > With this patch, two tree-ssa tests regressed: pr78687.C and 
> > > > > > > pr90883.C.
> > > > > > > 
> > > > > > > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted 
> > > > > > > redundant store: .*.a = {}"
> > > > > > > is easy.  Previously, we would call C::C, so .gimple has:
> > > > > > > 
> > > > > > >      D.2590 = {};
> > > > > > >      C::C (&D.2590);
> > > > > > >      D.2597 = D.2590;
> > > > > > >      return D.2597;
> > > > > > > 
> > > > > > > Then .einline inlines the C::C call:
> > > > > > > 
> > > > > > >      D.2590 = {};
> > > > > > >      D.2590.a = {}; // #1
> > > > > > >      D.2590.b = 0;  // #2
> > > > > > >      D.2597 = D.2590;
> > > > > > >      D.2590 ={v} {CLOBBER(eos)};
> > > > > > >      return D.2597;
> > > > > > > 
> > > > > > > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the 
> > > > > > > test
> > > > > > > passes.  But with the patch, .gimple won't have that C::C call, 
> > > > > > > so the
> > > > > > > IL is of course going to look different.
> > > > > > 
> > > > > > Maybe -fno-inline instead of the --param?
> > > > > 
> > > > > Then that C::C call isn't inlined and the test fails :/.
> > > > > > > Unfortunately, pr78687.C is much more complicated and I can't 
> > > > > > > explain
> > > > > > > precisely what happens there.  But it seems like a good idea to 
> > > > > > > have
> > > > > > > a way to avoid this optimization.  So I've added the "noinline" 
> > > > > > > check.
> > > > > > 
> > > > > > Hmm, I'm surprised make_object_1 would be affected, since the 
> > > > > > ref_proxy
> > > > > > constructors are not constexpr.  And I wouldn't expect the 
> > > > > > optimization to
> > > > > > affect the value-initialization option_2().
> > > > > 
> > > > > In pr78687.C we do this new optimization only once for
> > > > > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept 
> > > > > (std::is_nothrow_constructible<T, U&&>::value)".
> > > > 
> > > > Aha.  Can we remove its constexpr?
> > > > ...no, it's defaulted.  And moving the defaulting after the class also
> > > > breaks the testcase.
> 
> FWIW, removing EGGS_CXX11_CONSTEXPR on line 360 "fixes" the test.
> 
> > > > > > >   PR c++/116416
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > >   * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > > > > >   TARGET_EXPR_INITIAL and replace it with the folded result if
> > > > > > >   it's TREE_CONSTANT.
> > > > > > > 
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > >   * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > > > > >   * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > > > > >   * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > > > > >   * g++.dg/tree-ssa/pr90883.C: Likewise.
> > > > > > >   * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > > > > 
> > > > > > > Co-authored-by: Patrick Palka <[email protected]>
> > > > > > > ---
> > > > > > >     gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> > > > > > >     gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> > > > > > >     .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 
> > > > > > > +++++++++++++++++++
> > > > > > >     gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> > > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> > > > > > >     gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> > > > > > >     6 files changed, 50 insertions(+), 3 deletions(-)
> > > > > > >     create mode 100644 
> > > > > > > gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > index 003e68f1ea7..41d6333f650 100644
> > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int 
> > > > > > > *walk_subtrees, void *data_)
> > > > > > >            that case, strip it in favor of this one.  */
> > > > > > >           if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > > > > >           {
> > > > > > > +   tree fn;
> > > > > > > +   if ((data->flags & ff_genericize)
> > > > > > > +       /* Give the user an option to opt out.  */
> > > > > > > +       && !((fn = current_function_decl)
> > > > > > > +            && lookup_attribute ("noinline",
> > > > > > > +                                 DECL_ATTRIBUTES (fn))))
> > > > > > 
> > > > > > This looks backwards from the usual sense of noinline, which 
> > > > > > suppresses
> > > > > > inlining of the function so marked, rather than inlining of other 
> > > > > > functions
> > > > > > called.  If you want to check noinline, you need to dig out the 
> > > > > > called
> > > > > > function (if any), perhaps with extract_call_expr.
> > > > > 
> > > > > Now I wish I hadn't done that.  But for implicit functions, like in
> > > > > pr90883.C where we have
> > > > > 
> > > > >       class C
> > > > >       {
> > > > >           char a[7]{};
> > > > >           int b{};
> > > > >       };
> > > > > there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> > > > > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > > > > this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> > > > > flag for this :(.
> > > > > 
> > > > > Or should we just adjust the tests?
> > > > 
> > > > I think so, yes.  Probably so that they test that the output doesn't 
> > > > contain
> > > > the redundancy that the PR complains about; reaching optimal output by
> > > > another path isn't a problem.
> > > > 
> > > > For pr78687, I guess that means checking the number of aggregate
> > > > temporaries.
> > > > 
> > > > For pr90883, maybe check for the lack of ".a =" and ".b = "?
> > > 
> > > Sure.
> > > > What is the .optimized for these two tests after your patch?
> > > 
> > > pr90883 is fine, the dumps are the same.  In pr78687 they very much
> > > aren't.
> > > 
> > > Let's see what's happening here.  In .gimple, the difference is
> > > in make_object_1:
> > > 
> > >   struct ref_proxy make_object_1 ()
> > >   {
> > >     struct variant D.9472;
> > > -  struct option_2 D.9273;
> > > 
> > >     try
> > >       {
> > > -      try
> > > -        {
> > > -          eggs::variants::variant<option_1, option_2>::variant<option_2> 
> > > (&D.9472, &D.9273);
> > > -          ref_proxy<option_2, variant_ref<option_1, option_2> 
> > > >::ref_proxy (&<retval>, D.9472);
> > > -          return <retval>;
> > > -        }
> > > -      finally
> > > -        {
> > > -          D.9273 = {CLOBBER(eos)};
> > > -        }
> > > +      D.9472 = {};
> > > +      D.9472._storage._which = 2;
> > > +      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy 
> > > (&<retval>, D.9472);
> > > +      return <retval>;
> > > 
> > > that makes sense: the new optimization "inlined" the
> > > eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So 
> > > later
> > > .einline has nothing to expand here whereas previously
> > > 
> > >    eggs::variants::variant<option_1, option_2>::variant<option_2> 
> > > (&D.9472, &D.9273);
> > > 
> > > was expanded into
> > > 
> > >    MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > >    MEM[(struct _storage *)&D.9472]._which = 2;
> > > 
> > > Then make_object_1 gets inlined.  Something happens in SRA.  And
> > > then we got rid of a lot of code.  But now a lot of code remains.
> > > 
> > > Is it simply the fact that with this opt we expand the ctor into
> > > 
> > >    D.9472 = {};
> > >    D.9472._storage._which = 2;
> > > 
> > > which means that we zero-init the variant object and then set _which to 2,
> > > while previously we just allocated storage and set _which to 2?
> > 
> > That seems likely since the patch that fixed the bug before was dealing with
> > partially-initialized objects.  Why does the optimization change that?
> 
> CCing a few optimizer folks.  To recap, we're talking about 
> tree-ssa/pr78687.C.
> In in, there's:
> 
>         EGGS_CXX11_CONSTEXPR variant(U&& v)
>             noexcept(
>                 std::is_nothrow_constructible<T, U&&>::value)
>           : _storage{detail::index<I + 1>{}, std::forward<U>(v)}
>         {}
> 
> With a new C++ FE optimization this patch introduces, we can evaluate the call
> at compile time.  Then .gimple looks a little different (see above).  What is
> not clear is why we can't optimize the code as much as without this patch,
> when the variant call isn't evaluated at compile time, and instead we produce
> those MEMs as shown above.
> 
> Any insights would be appreciated.
>  
> This is .optimized with the opt on:
> 
> __attribute__((noinline))
> struct ref_proxy f ()
> {
>    struct ref_proxy ptr;
>    struct ref_proxy D.10036;
>    struct ref_proxy type;
>    struct ref_proxy type;
>    struct qual_option D.10031;
>    struct ref_proxy D.10030;
>    struct qual_option inner;
>    struct variant t;
>    struct variant D.10026;
>    struct variant D.10024;
>    struct inplace_ref D.10023;
>    struct inplace_ref ptr;
>    struct ref_proxy D.9898;
> 
>    <bb 2> [local count: 1073741824]:
>    MEM <char[40]> [(struct variant *)&D.10024] = {};


Without actually checking it might be that SRA chokes on the above.
The IL is basically a huge chain of aggregate copies interspersed
with clobbers and occasional scalar inits and I fear that we really
only have SRA dealing with this.

Is there any reason to use the char[40] init instead of a aggregate
{} init of type variant?

I would suggest to open a bugreport.

>    D.10024._storage._which = 2;
>    D.10026 = D.10024;
>    t = D.10026;
>    MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
>    t ={v} {CLOBBER(eos)};
>    D.10026 ={v} {CLOBBER(eos)};
>    D.10024 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
>    D.10036 = D.9898;
>    ptr = D.10036;
>    MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
>    MEM[(struct variant_ref *)&D.10030].inner_storage_ = 
> ptr.D.9270.inner_storage_;
>    ptr ={v} {CLOBBER(eos)};
>    D.10036 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
>    type = D.10030;
>    type = type;
>    MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
>    D.10031.type_ = type;
>    type ={v} {CLOBBER(eos)};
>    type ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
>    D.10031.quals_ = 0;
>    inner = D.10031;
>    D.10023 ={v} {CLOBBER(bob)};
>    D.10023.inner_ = inner;
>    inner ={v} {CLOBBER(eos)};
>    D.10030 ={v} {CLOBBER(eos)};
>    D.10031 ={v} {CLOBBER(eos)};
>    MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
>    MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
>    ptr = D.10023;
>    <retval> ={v} {CLOBBER(bob)};
>    <retval>.D.9858 = ptr;
>    ptr ={v} {CLOBBER(eos)};
>    D.9898 ={v} {CLOBBER(eos)};
>    return <retval>;
> 
> }
> 
> and this is the result without this patch:
> 
> __attribute__((noinline))
> struct ref_proxy f ()
> {
>   <bb 2> [local count: 1073741824]:
>   <retval> ={v} {CLOBBER(bob)};
>   MEM <size_t> [(struct inplace_ref *)&<retval> + 40B] = 2;
>   MEM <int> [(struct inplace_ref *)&<retval> + 48B] = 0;
>   return <retval>;
> 
> }
> 
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Reply via email to