On Thu, 26 Sep 2024, Marek Polacek wrote:
> On Wed, Sep 25, 2024 at 08:45:50PM -0400, Jason Merrill wrote:
> > On 9/25/24 4:21 PM, Marek Polacek wrote:
> > > On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> > > > On 9/24/24 5:10 PM, Marek Polacek wrote:
> > > > > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > > > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > >
> > > > > > > -- >8 --
> > > > > > > This PR reports a missed optimization. When we have:
> > > > > > >
> > > > > > > Str str{"Test"};
> > > > > > > callback(str);
> > > > > > >
> > > > > > > as in the test, we're able to evaluate the Str::Str() call at
> > > > > > > compile
> > > > > > > time. But when we have:
> > > > > > >
> > > > > > > callback(Str{"Test"});
> > > > > > >
> > > > > > > we are not. With this patch (in fact, it's Patrick's patch with
> > > > > > > a little
> > > > > > > tweak), we turn
> > > > > > >
> > > > > > > callback (TARGET_EXPR <D.2890, <<< Unknown tree:
> > > > > > > aggr_init_expr
> > > > > > > 5
> > > > > > > __ct_comp
> > > > > > > D.2890
> > > > > > > (struct Str *) <<< Unknown tree: void_cst >>>
> > > > > > > (const char *) "Test" >>>>)
> > > > > > >
> > > > > > > into
> > > > > > >
> > > > > > > callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test",
> > > > > > > .length=4}>)
> > > > > > >
> > > > > > > I explored the idea of calling maybe_constant_value for the whole
> > > > > > > TARGET_EXPR in cp_fold. That has three problems:
> > > > > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure
> > > > > > > the
> > > > > > > result is also a TARGET_EXPR;
> > > > > >
> > > > > > I'd think that the result should always be a TARGET_EXPR for a
> > > > > > class, and
> > > > > > that's the case we want to fold; a TARGET_EXPR for a scalar is
> > > > > > always the
> > > > > > initialize-temp-and-use-it pattern you mention below.
> > > > >
> > > > > Checking CLASS_TYPE_P would solve some of the problems, yes. But...
> > > > > > > - the resulting TARGET_EXPR must have the same flags, otherwise
> > > > > > > Bad
> > > > > > > Things happen;
> > > > > >
> > > > > > I guess maybe_constant_value should be fixed to preserve flags
> > > > > > regardless of
> > > > > > this change.
> > > > >
> > > > > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR
> > > > > flags,
> > > > > but here we go into the break_out_target_exprs block in
> > > > > maybe_constant_value
> > > > > and that doesn't necessarily preserve them.
> > > > >
> > > > > > > - getting a new slot is also problematic. I've seen a test where
> > > > > > > we
> > > > > > > had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the
> > > > > > > whole TARGET_EXPR
> > > > > > > would get us "TARGET_EXPR<D.2681, ...>", but since we don't
> > > > > > > see the outer
> > > > > > > D.2680, we can't replace it with D.2681, and things break.
> > > > > >
> > > > > > Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P?
> > > > >
> > > > > ...unfortunately that doesn't always help. I've reduced an example
> > > > > into:
> > > > >
> > > > > struct optional {
> > > > > constexpr optional(int) {}
> > > > > };
> > > > > optional foo() { return 2; }
> > > > >
> > > > > where check_return_expr creates a COMPOUND_EXPR:
> > > > >
> > > > > retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> > > > > TREE_OPERAND (retval, 0));
> > > > >
> > > > > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > > > > With this patch, two tree-ssa tests regressed: pr78687.C and
> > > > > > > pr90883.C.
> > > > > > >
> > > > > > > FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted
> > > > > > > redundant store: .*.a = {}"
> > > > > > > is easy. Previously, we would call C::C, so .gimple has:
> > > > > > >
> > > > > > > D.2590 = {};
> > > > > > > C::C (&D.2590);
> > > > > > > D.2597 = D.2590;
> > > > > > > return D.2597;
> > > > > > >
> > > > > > > Then .einline inlines the C::C call:
> > > > > > >
> > > > > > > D.2590 = {};
> > > > > > > D.2590.a = {}; // #1
> > > > > > > D.2590.b = 0; // #2
> > > > > > > D.2597 = D.2590;
> > > > > > > D.2590 ={v} {CLOBBER(eos)};
> > > > > > > return D.2597;
> > > > > > >
> > > > > > > then #2 is removed in .fre1, and #1 is removed in .dse1. So the
> > > > > > > test
> > > > > > > passes. But with the patch, .gimple won't have that C::C call,
> > > > > > > so the
> > > > > > > IL is of course going to look different.
> > > > > >
> > > > > > Maybe -fno-inline instead of the --param?
> > > > >
> > > > > Then that C::C call isn't inlined and the test fails :/.
> > > > > > > Unfortunately, pr78687.C is much more complicated and I can't
> > > > > > > explain
> > > > > > > precisely what happens there. But it seems like a good idea to
> > > > > > > have
> > > > > > > a way to avoid this optimization. So I've added the "noinline"
> > > > > > > check.
> > > > > >
> > > > > > Hmm, I'm surprised make_object_1 would be affected, since the
> > > > > > ref_proxy
> > > > > > constructors are not constexpr. And I wouldn't expect the
> > > > > > optimization to
> > > > > > affect the value-initialization option_2().
> > > > >
> > > > > In pr78687.C we do this new optimization only once for
> > > > > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept
> > > > > (std::is_nothrow_constructible<T, U&&>::value)".
> > > >
> > > > Aha. Can we remove its constexpr?
> > > > ...no, it's defaulted. And moving the defaulting after the class also
> > > > breaks the testcase.
>
> FWIW, removing EGGS_CXX11_CONSTEXPR on line 360 "fixes" the test.
>
> > > > > > > PR c++/116416
> > > > > > >
> > > > > > > gcc/cp/ChangeLog:
> > > > > > >
> > > > > > > * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > > > > > TARGET_EXPR_INITIAL and replace it with the folded result if
> > > > > > > it's TREE_CONSTANT.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > > > > > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > > > > > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > > > > > * g++.dg/tree-ssa/pr90883.C: Likewise.
> > > > > > > * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > > > >
> > > > > > > Co-authored-by: Patrick Palka <[email protected]>
> > > > > > > ---
> > > > > > > gcc/cp/cp-gimplify.cc | 14 +++++++++
> > > > > > > gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +-
> > > > > > > .../g++.dg/cpp1y/constexpr-prvalue1.C | 29
> > > > > > > +++++++++++++++++++
> > > > > > > gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +-
> > > > > > > gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++-
> > > > > > > gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 +
> > > > > > > 6 files changed, 50 insertions(+), 3 deletions(-)
> > > > > > > create mode 100644
> > > > > > > gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > > > >
> > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > index 003e68f1ea7..41d6333f650 100644
> > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int
> > > > > > > *walk_subtrees, void *data_)
> > > > > > > that case, strip it in favor of this one. */
> > > > > > > if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > > > > > {
> > > > > > > + tree fn;
> > > > > > > + if ((data->flags & ff_genericize)
> > > > > > > + /* Give the user an option to opt out. */
> > > > > > > + && !((fn = current_function_decl)
> > > > > > > + && lookup_attribute ("noinline",
> > > > > > > + DECL_ATTRIBUTES (fn))))
> > > > > >
> > > > > > This looks backwards from the usual sense of noinline, which
> > > > > > suppresses
> > > > > > inlining of the function so marked, rather than inlining of other
> > > > > > functions
> > > > > > called. If you want to check noinline, you need to dig out the
> > > > > > called
> > > > > > function (if any), perhaps with extract_call_expr.
> > > > >
> > > > > Now I wish I hadn't done that. But for implicit functions, like in
> > > > > pr90883.C where we have
> > > > >
> > > > > class C
> > > > > {
> > > > > char a[7]{};
> > > > > int b{};
> > > > > };
> > > > > there's nothing to make noinline anyway. So I'm stuck. I suppose
> > > > > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > > > > this optimization. Can I abuse -fearly-inlining? I don't want a new
> > > > > flag for this :(.
> > > > >
> > > > > Or should we just adjust the tests?
> > > >
> > > > I think so, yes. Probably so that they test that the output doesn't
> > > > contain
> > > > the redundancy that the PR complains about; reaching optimal output by
> > > > another path isn't a problem.
> > > >
> > > > For pr78687, I guess that means checking the number of aggregate
> > > > temporaries.
> > > >
> > > > For pr90883, maybe check for the lack of ".a =" and ".b = "?
> > >
> > > Sure.
> > > > What is the .optimized for these two tests after your patch?
> > >
> > > pr90883 is fine, the dumps are the same. In pr78687 they very much
> > > aren't.
> > >
> > > Let's see what's happening here. In .gimple, the difference is
> > > in make_object_1:
> > >
> > > struct ref_proxy make_object_1 ()
> > > {
> > > struct variant D.9472;
> > > - struct option_2 D.9273;
> > >
> > > try
> > > {
> > > - try
> > > - {
> > > - eggs::variants::variant<option_1, option_2>::variant<option_2>
> > > (&D.9472, &D.9273);
> > > - ref_proxy<option_2, variant_ref<option_1, option_2>
> > > >::ref_proxy (&<retval>, D.9472);
> > > - return <retval>;
> > > - }
> > > - finally
> > > - {
> > > - D.9273 = {CLOBBER(eos)};
> > > - }
> > > + D.9472 = {};
> > > + D.9472._storage._which = 2;
> > > + ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy
> > > (&<retval>, D.9472);
> > > + return <retval>;
> > >
> > > that makes sense: the new optimization "inlined" the
> > > eggs::variants::variant<option_1, option_2>::variant<option_2> call. So
> > > later
> > > .einline has nothing to expand here whereas previously
> > >
> > > eggs::variants::variant<option_1, option_2>::variant<option_2>
> > > (&D.9472, &D.9273);
> > >
> > > was expanded into
> > >
> > > MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
> > > MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > > MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
> > > MEM[(struct _storage *)&D.9472]._which = 2;
> > >
> > > Then make_object_1 gets inlined. Something happens in SRA. And
> > > then we got rid of a lot of code. But now a lot of code remains.
> > >
> > > Is it simply the fact that with this opt we expand the ctor into
> > >
> > > D.9472 = {};
> > > D.9472._storage._which = 2;
> > >
> > > which means that we zero-init the variant object and then set _which to 2,
> > > while previously we just allocated storage and set _which to 2?
> >
> > That seems likely since the patch that fixed the bug before was dealing with
> > partially-initialized objects. Why does the optimization change that?
>
> CCing a few optimizer folks. To recap, we're talking about
> tree-ssa/pr78687.C.
> In in, there's:
>
> EGGS_CXX11_CONSTEXPR variant(U&& v)
> noexcept(
> std::is_nothrow_constructible<T, U&&>::value)
> : _storage{detail::index<I + 1>{}, std::forward<U>(v)}
> {}
>
> With a new C++ FE optimization this patch introduces, we can evaluate the call
> at compile time. Then .gimple looks a little different (see above). What is
> not clear is why we can't optimize the code as much as without this patch,
> when the variant call isn't evaluated at compile time, and instead we produce
> those MEMs as shown above.
>
> Any insights would be appreciated.
>
> This is .optimized with the opt on:
>
> __attribute__((noinline))
> struct ref_proxy f ()
> {
> struct ref_proxy ptr;
> struct ref_proxy D.10036;
> struct ref_proxy type;
> struct ref_proxy type;
> struct qual_option D.10031;
> struct ref_proxy D.10030;
> struct qual_option inner;
> struct variant t;
> struct variant D.10026;
> struct variant D.10024;
> struct inplace_ref D.10023;
> struct inplace_ref ptr;
> struct ref_proxy D.9898;
>
> <bb 2> [local count: 1073741824]:
> MEM <char[40]> [(struct variant *)&D.10024] = {};
Without actually checking it might be that SRA chokes on the above.
The IL is basically a huge chain of aggregate copies interspersed
with clobbers and occasional scalar inits and I fear that we really
only have SRA dealing with this.
Is there any reason to use the char[40] init instead of a aggregate
{} init of type variant?
I would suggest to open a bugreport.
> D.10024._storage._which = 2;
> D.10026 = D.10024;
> t = D.10026;
> MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
> MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
> t ={v} {CLOBBER(eos)};
> D.10026 ={v} {CLOBBER(eos)};
> D.10024 ={v} {CLOBBER(eos)};
> MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
> D.10036 = D.9898;
> ptr = D.10036;
> MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
> MEM[(struct variant_ref *)&D.10030].inner_storage_ =
> ptr.D.9270.inner_storage_;
> ptr ={v} {CLOBBER(eos)};
> D.10036 ={v} {CLOBBER(eos)};
> MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
> type = D.10030;
> type = type;
> MEM[(struct __as_base &)&D.10031] ={v} {CLOBBER(bob)};
> D.10031.type_ = type;
> type ={v} {CLOBBER(eos)};
> type ={v} {CLOBBER(eos)};
> MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
> D.10031.quals_ = 0;
> inner = D.10031;
> D.10023 ={v} {CLOBBER(bob)};
> D.10023.inner_ = inner;
> inner ={v} {CLOBBER(eos)};
> D.10030 ={v} {CLOBBER(eos)};
> D.10031 ={v} {CLOBBER(eos)};
> MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
> MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
> ptr = D.10023;
> <retval> ={v} {CLOBBER(bob)};
> <retval>.D.9858 = ptr;
> ptr ={v} {CLOBBER(eos)};
> D.9898 ={v} {CLOBBER(eos)};
> return <retval>;
>
> }
>
> and this is the result without this patch:
>
> __attribute__((noinline))
> struct ref_proxy f ()
> {
> <bb 2> [local count: 1073741824]:
> <retval> ={v} {CLOBBER(bob)};
> MEM <size_t> [(struct inplace_ref *)&<retval> + 40B] = 2;
> MEM <int> [(struct inplace_ref *)&<retval> + 48B] = 0;
> return <retval>;
>
> }
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)