On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> On 9/20/24 12:18 AM, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > -- >8 --
> > This PR reports a missed optimization. When we have:
> >
> > Str str{"Test"};
> > callback(str);
> >
> > as in the test, we're able to evaluate the Str::Str() call at compile
> > time. But when we have:
> >
> > callback(Str{"Test"});
> >
> > we are not. With this patch (in fact, it's Patrick's patch with a little
> > tweak), we turn
> >
> > callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> > 5
> > __ct_comp
> > D.2890
> > (struct Str *) <<< Unknown tree: void_cst >>>
> > (const char *) "Test" >>>>)
> >
> > into
> >
> > callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
> >
> > I explored the idea of calling maybe_constant_value for the whole
> > TARGET_EXPR in cp_fold. That has three problems:
> > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> > result is also a TARGET_EXPR;
>
> I'd think that the result should always be a TARGET_EXPR for a class, and
> that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> initialize-temp-and-use-it pattern you mention below.
Checking CLASS_TYPE_P would solve some of the problems, yes. But...
> > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> > Things happen;
>
> I guess maybe_constant_value should be fixed to preserve flags regardless of
> this change.
Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
but here we go into the break_out_target_exprs block in maybe_constant_value
and that doesn't necessarily preserve them.
> > - getting a new slot is also problematic. I've seen a test where we
> > had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
> > would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
> > D.2680, we can't replace it with D.2681, and things break.
>
> Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P?
...unfortunately that doesn't always help. I've reduced an example into:
struct optional {
constexpr optional(int) {}
};
optional foo() { return 2; }
where check_return_expr creates a COMPOUND_EXPR:
retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
TREE_OPERAND (retval, 0));
where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> >
> > FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant
> > store: .*.a = {}"
> > is easy. Previously, we would call C::C, so .gimple has:
> >
> > D.2590 = {};
> > C::C (&D.2590);
> > D.2597 = D.2590;
> > return D.2597;
> >
> > Then .einline inlines the C::C call:
> >
> > D.2590 = {};
> > D.2590.a = {}; // #1
> > D.2590.b = 0; // #2
> > D.2597 = D.2590;
> > D.2590 ={v} {CLOBBER(eos)};
> > return D.2597;
> >
> > then #2 is removed in .fre1, and #1 is removed in .dse1. So the test
> > passes. But with the patch, .gimple won't have that C::C call, so the
> > IL is of course going to look different.
>
> Maybe -fno-inline instead of the --param?
Then that C::C call isn't inlined and the test fails :/.
> > Unfortunately, pr78687.C is much more complicated and I can't explain
> > precisely what happens there. But it seems like a good idea to have
> > a way to avoid this optimization. So I've added the "noinline" check.
>
> Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> constructors are not constexpr. And I wouldn't expect the optimization to
> affect the value-initialization option_2().
In pr78687.C we do this new optimization only once for
"constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept
(std::is_nothrow_constructible<T, U&&>::value)".
> > PR c++/116416
> >
> > gcc/cp/ChangeLog:
> >
> > * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > TARGET_EXPR_INITIAL and replace it with the folded result if
> > it's TREE_CONSTANT.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > * g++.dg/tree-ssa/pr90883.C: Likewise.
> > * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> >
> > Co-authored-by: Patrick Palka <[email protected]>
> > ---
> > gcc/cp/cp-gimplify.cc | 14 +++++++++
> > gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +-
> > .../g++.dg/cpp1y/constexpr-prvalue1.C | 29 +++++++++++++++++++
> > gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +-
> > gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++-
> > gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 +
> > 6 files changed, 50 insertions(+), 3 deletions(-)
> > create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> >
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index 003e68f1ea7..41d6333f650 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void
> > *data_)
> > that case, strip it in favor of this one. */
> > if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > {
> > + tree fn;
> > + if ((data->flags & ff_genericize)
> > + /* Give the user an option to opt out. */
> > + && !((fn = current_function_decl)
> > + && lookup_attribute ("noinline",
> > + DECL_ATTRIBUTES (fn))))
>
> This looks backwards from the usual sense of noinline, which suppresses
> inlining of the function so marked, rather than inlining of other functions
> called. If you want to check noinline, you need to dig out the called
> function (if any), perhaps with extract_call_expr.
Now I wish I hadn't done that. But for implicit functions, like in
pr90883.C where we have
class C
{
char a[7]{};
int b{};
};
there's nothing to make noinline anyway. So I'm stuck. I suppose
it wouldn't make sense to use -fno-fold-simple-inlines to disable
this optimization. Can I abuse -fearly-inlining? I don't want a new
flag for this :(.
Or should we just adjust the tests?
> This should also share conditions with the code that calls
> maybe_constant_value for CALL_EXPR (flag_no_inline, in particular).
Done. So I have
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
that case, strip it in favor of this one. */
if (tree &init = TARGET_EXPR_INITIAL (stmt))
{
+ if ((data->flags & ff_genericize)
+ && !flag_no_inline)
+ {
+ tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+ if (folded != init && TREE_CONSTANT (folded))
+ {
+ init = folded;
+ break;
+ }
+ }
cp_walk_tree (&init, cp_fold_r, data, NULL);
cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
*walk_subtrees = 0;
But there are still these two fails as mentioned above:
FAIL: g++.dg/tree-ssa/pr78687.C scan-tree-dump sra "Removing load:.*ptr;"
FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store:
.*.a = {}"