On 7/9/25 7:34 AM, Jakub Jelinek wrote:
On Tue, Jul 08, 2025 at 09:43:20PM -0400, Jason Merrill wrote:
Thanks for the review.
Working on the rest (most of it already have in my working copy).

       case CLEANUP_POINT_EXPR:
         {
-       auto_vec<tree, 2> cleanups;
+       auto_vec<tree, 4> cleanups;

What's the rationale for this increase?  It seems costly to increase the
size of a local variable in cxx_eval_constant_expression given the deep
recursion.  I think we need a rationale for using any internal storage at
all in this auto_vec; all full-expressions are wrapped in
CLEANUP_POINT_EXPR, and only some of them actually have any cleanups.

Though I don't know for sure whether this actually affects the size of the
stack frame.

Wanted to reply to this separately.
The rationale was that for C++26 CLEANUP_EH_ONLY will push not 0 but 2 elts
to the vector, so I thought increasing it a little bit would help.
Note, there is
   constexpr_ctx new_ctx;
   tree r = t;

   tree_code tcode = TREE_CODE (t);
   switch (tcode)
...
     case CLEANUP_POINT_EXPR:
       {
         auto_vec<tree, 2> cleanups;
         vec<tree> *prev_cleanups = ctx->global->cleanups;
         ctx->global->cleanups = &cleanups;

         auto_vec<tree, 10> save_exprs;
         constexpr_ctx new_ctx = *ctx;
         new_ctx.save_exprs = &save_exprs;
before this change.
Unpatched trunk needs on x86_64 frame size 0x148,
with auto_vec<tree, 4> that is 0x158, with auto_vec<tree, 0> (aka
auto_vec<tree> 0x138 and with auto_vec<tree, 10> changed to
auto_vec<tree> 0xd8.
Changing
        constexpr_ctx new_ctx = *ctx;
to
        new_ctx = *ctx;
doesn't free anything but is a good idea anyway.

I guess one option is to decrease both auto_vec to 0, another
is outline the CLEANUP_POINT_EXPR to a separate function (and perhaps
use [[gnu::noinline]] on it conditionally).
What do you prefer (though guess it can be done separately and I can
just drop the 2->4 change from the patch)?
And perhaps investigate more how to decrease the frame size further.

I don't have a strong preference between the two approaches. Dropping both to 0 is certainly simpler, is there any noticeable performance impact? And yes, it makes sense for this to be done separately.

Jason

Reply via email to