Hi,
On 13 Dec 2024, at 13:50, Simon Martin wrote:
> Hi Marek,
>
> On 13 Dec 2024, at 0:44, Marek Polacek wrote:
>
>> On Thu, Dec 12, 2024 at 07:07:38PM +0000, Simon Martin wrote:
>>> We currently ICE upon the following valid (under -Wno-vla) code
>>>
>>> === cut here ===
>>> void f(int c) {
>>> constexpr int r = 4;
>>> [&](auto) { int t[r * c]; }(0);
>>> }
>>> === cut here ===
>>>
>>> The problem is that when parsing the lambda body, and more
>>> specifically
>>> the multiplication, we mark the lambda as
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>>> even though the replacement of r by 4 is "undone" by the call to
>>> build_min_non_dep in build_x_binary_op. This makes
>>> prune_lambda_captures
>>
>> Ah yeah, because build_min_non_dep gets the original operands.
>>
>>> remove the proxy declaration while it should not, and we trip on an
>>> assert at instantiation time.
>>>
>>> This patch fixes the ICE by making sure that lambdas are only marked
>>> as
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried
>>> other
>>> strategies like not undoing constant folding in build_min_non_dep,
>
>>> but
>>> it is pretty intrusive and breaks lots of things).
>>
>> I've tried that too and I also ran into a number of issues. I also
>
>> tried
>> checking p_t_d in prune_lambda_capture since it already says "Don't
>
>> bother
>> pruning in a template" but that doesn't work, either.
> Yeah, the fundamental “problem” is that for lambdas that are not
> within a template, we generate the closure before instantiating the
> lambda function, so prune_lambda_capture thinks (without my patch) that
> captures to constants have been folded out, which might be the case
> (e.g. with ok_3 in the test I added) or not (e.g. in this PR) depending
> on whether they’re part of an expression for which build_min_non_dep
> has been called.
>
>>> The test I added also shows that we don't always optimize out
>>> captures
>>> to constants for lambdas that are not within a template (see ok_2 for
>
>>> example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) -
>>> I'm
>>> curious if we consider it a problem or not? If so, I can try to fix
>>> this
>>> in a follow-up patch.
>>
>> Since "P0588R1: Simplifying implicit lambda capture" they are
>> captures,
>> though [expr.prim.lambda.capture] gives us license to optimize the
>> captures if not ODR-used. I couldn't find a test where this patch
>> would be an ABI change.
> Thanks for the pointer! Yeah, whether we optimise out the captures or
>
> not remains legit.
>
> And if you look specifically at ok_3 from the test case (without the
> patch, the capture is optimised out, with it it’s not anymore):
> - I don’t think it’s an ABI change since we don’t do anything
> with captures in mangle.cc.
> - Since captures are only relevant within the function calling them,
> we don’t have any issue if we link together object files generated
> without the patch and object files generated with the patch.
>
>>> Successfully tested on x86_64-pc-linux-gnu.
>>>
>>> PR c++/114292
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> * cp-tree.h (mark_const_var_capture_optimized): Declare.
>>> * expr.cc (mark_use): Call mark_const_var_capture_optimized.
>>> * lambda.cc (mark_const_var_capture_optimized): New. Only set
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.dg/cpp1y/lambda-ice4.C: New test.
>>>
>>> ---
>>> gcc/cp/cp-tree.h | 1 +
>>> gcc/cp/expr.cc | 10 ++----
>>> gcc/cp/lambda.cc | 13 +++++++
>>> gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44
>>> ++++++++++++++++++++++++
>>> 4 files changed, 60 insertions(+), 8 deletions(-)
>>> create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
>>>
>>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>>> index c5e0fc5c440..ce050032fdb 100644
>>> --- a/gcc/cp/cp-tree.h
>>> +++ b/gcc/cp/cp-tree.h
>>> @@ -8058,6 +8058,7 @@ extern bool is_constant_capture_proxy
>>> (tree);
>>> extern void register_capture_members (tree);
>>> extern tree lambda_expr_this_capture (tree, int);
>>> extern void maybe_generic_this_capture (tree, tree);
>>> +extern void mark_const_var_capture_optimized (void);
>>> extern tree maybe_resolve_dummy (tree, bool);
>>> extern tree current_nonlambda_function (void);
>>> extern tree nonlambda_method_basetype (void);
>>> diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
>>> index de4991e616c..d6a2454c46e 100644
>>> --- a/gcc/cp/expr.cc
>>> +++ b/gcc/cp/expr.cc
>>> @@ -120,10 +120,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>
>>> {
>>> tree val = RECUR (cap);
>>> if (!is_capture_proxy (val))
>>> - {
>>> - tree l = current_lambda_expr ();
>>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> - }
>>> + mark_const_var_capture_optimized ();
>>> return val;
>>> }
>>> }
>>> @@ -171,10 +168,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>
>>> {
>>> tree val = RECUR (cap);
>>> if (!is_capture_proxy (val))
>>> - {
>>> - tree l = current_lambda_expr ();
>>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> - }
>>> + mark_const_var_capture_optimized ();
>>> return val;
>>> }
>>> }
>>> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
>>> index d8a15d97d5d..4fd3b39c99b 100644
>>> --- a/gcc/cp/lambda.cc
>>> +++ b/gcc/cp/lambda.cc
>>> @@ -945,6 +945,19 @@ resolvable_dummy_lambda (tree object)
>>> return NULL_TREE;
>>> }
>>>
>>> +/* Called when optimizing out a capture to a const variable. */
>>> +
>>> +void
>>> +mark_const_var_capture_optimized ()
>>> +{
>>> + /* The actual optimizing out only occurs when instantiating the
>
>>> lambda. */
>>> + if (processing_template_decl)
>>> + return;
>>> +
>>> + tree l = current_lambda_expr ();
>>> + LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> +}
>>
>> If we do this, perhaps the function can be a static function in
>> expr.cc,
>> and maybe we can check processing_template_decl before
>> is_capture_proxy.
> Yeah, good suggestions, thanks!
>
> That’s what the attached updated patch does - also successfully tested
> on x86_64-pc-linux-gnu. Ok for trunk?
Ping, thanks!
Simon