On Thu, 12 Dec 2024, Jakub Jelinek wrote:

> Hi!
> 
> As we don't have a SRA fix for PR117971, I thought I'd try to improve
> it using an optimization during gimplification.
> This is about the tree-ssa/pr78687.C testcase, which is a variant with
> struct option_1
> {
>     void *a, *b, *c, *d, *e;
> };
>     
> struct option_2
> {
> };
> variants.  Since the PR116416 changes we clear the whole object,
> which is at least sizeof (size_t) + 5 * sizeof (void *) large, because
> there is a single byte (option_2) zero initialized (it uses option_2()
> rather than e.g. option_2{}), that causes categorize_ctor_elements
> to say that the whole CONSTRUCTOR is !complete_p and we clear everything,
> which kills SRA for some reason (I think SRA needs to be fixed in any case,
> because the clearing because of required clearing of padding bits will
> now be more common since the PR116416 changes).
> So, what I wanted to do is avoid clearing everything when we know only a
> single subobject is incomplete, in that case we can clear just that and not
> everything else.
> The patch works by extending categorize_ctor_elements to note the access
> path, vector of constructor_elt indexes (purposes) for the only incomplete
> subobject if the CONSTRUCTOR is !complete_p but everything except that
> subobject is complete.
> This is done by pushing there the complete access path on the first
> incomplete subobject encountered and truncating that access path later
> if we notice incomplete subobjects somewhere else.
> E.g. on C23:
> struct S { int a; };
> struct T { struct S b; int c; struct S d; };
> struct U { struct T e; int f; struct T g; };
> void f0 (struct U *);
> 
> void
> f1 (void)
> {
>   struct U u = { { {}, 1, {} }, 2, { {}, 3, {} } };
>   f0 (&u);
> }
> 
> void
> f2 (void)
> {
>   struct U u = { { { 1 }, 2, { 3 } }, 4, { { 5 }, 6, {} } };
>   f0 (&u);
> }
> 
> void
> f3 (void)
> {
>   struct U u = { { { }, 2, { } }, 4, { { 5 }, 6, { 7 } } };
>   f0 (&u);
> }
> the optimization doesn't change anything in f1 (there are multiple
> incomplete bits all over), changes
> -      u = {};
>        u.e.b.a = 1;
>        u.e.c = 2;
>        u.e.d.a = 3;
>        u.f = 4;
>        u.g.b.a = 5;
>        u.g.c = 6;
> +      u.g.d = {};
>        f0 (&u);
> in f2, only u.g.d is incomplete, and in f3
> -      u = {};
> +      u.e = {};
>        u.e.c = 2;
>        u.f = 4;
>        u.g.b.a = 5;
> ...
> (u.e has multiple incomplete subobjects, but nothing else does).
> Now, when looking at the tree-ssa/pr78687.C testcase, I've noticed
> that the expected
> -      D.10177 = {};
> +      D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {};
> change doesn't happen there (instead the clearing is dropped)
> because gimplification optimizes it away,
> so I've filed PR118002 and worked on that and posted a patch for that.
> Now, unfortunately with the version of the patch posted without the
> cp-gimplify.cc hunk (which regressed empty1.C test) it is still removed,
> so I wonder if this optimization doesn't have to use some langhook where
> the C++ FE would tell that the store is to an empty base that the
> gimplification hook would optimize away and so needs to try an outer
> object.  Or if we need to somehow force the zero initializers in this
> case anyway somehow by skipping further gimplification on it or what.
> The thing is if the outer = {}; would be added (but that would need to
> be verified if it wouldn't be optimized away by cp_gimplify_expr somehow),
> that clears the whole thing, so just clearing some subobject of it should
> be still fine, it is necessarily before we actually store some other data
> members that might overlay those.
> 
> And another problem is that the patch regresses the
> tree-ssa/pr90883.C test, and in that case it actually makes the emitted
> code worse.  The testcase has a struct with char a[7]; int b;
> members with NSDMI, previously we cleared everything, because the a
> initializer was {} and thus "incomplete", even when the padding byte
> doesn't have to be cleared, but it actually results in
>         xorl    %eax, %eax
>         xorl    %edx, %edx
>       ret
> Now, with the patch we see that not the whole struct is incomplete
> with D.2618 = { {}, 0 }, just a, and so
> -      D.2618 = {};
> +      D.2618.a = {};
> +      D.2618.b = 0;
> (note, the D.2618.b part was omitted because of the clearing of the whole).
> Alexandre added recently an optimization to clear whole object anyway,
>         /* If the object is small enough to go in registers, and it's
>            not required to be constructed in memory, clear it first.
>            That will avoid wasting cycles preserving any padding bits
>            that might be there, and if there aren't any, the compiler
>            is smart enough to optimize the clearing out.  */
>         else if (complete_p <= 0
>                  && !TREE_ADDRESSABLE (ctor)
>                  && !TREE_THIS_VOLATILE (object)
>                  && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
>                  && optimize)
>           cleared = true;
> but unfortunately that doesn't trigger here, because type has BLKmode here,
> it is 12 bytes.  So maybe we'd need to go smarter here, don't perform the
> clearing of the subobject that needs clearing as part of the
> gimplify_init_ctor_eval, but clear it separately first and maybe extend it
> a little bit first into the neighbouring padding or fields, in order to have
> more efficient clearing size if it isn't too large.
> 
> On the tree-ssa/ssa-dse-1.C testcase there is actually no code generation
> change, just small difference in the printed IL.
> 
> Thoughts on this?
> 
> Note, this can certainly be postponed to GCC 16...

I think we should postpone to GCC 16 and first look at improving SRA.

Richard.

> 
> 2024-12-12  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR c++/116416
>       * expr.h (categorize_ctor_elements): Add vec<tree> * argument
>       defaulted to NULL.
>       * expr.cc (categorize_ctor_elements_1): Add p_incomplete_elt,
>       p_purposes and p_incomplete_depth arguments.  Use them to
>       note the outermost incomplete subobject if *p_complete = 0.
>       Adjust recursive call.
>       (categorize_ctor_elements): Add p_incomplete_elt argument,
>       adjust call to categorize_ctor_elements_1.
>       * gimplify.cc (gimplify_init_ctor_eval): Add partial_clear
>       and partial_clear_count arguments, adjust recursive calls, if
>       non-NULL arrange to clear the specified subobject and use
>       cleared = true for its initialization.
>       (gimplify_init_ctor_eval_range): Adjust gimplify_init_ctor_eval
>       caller.
>       (gimplify_init_constructor): Adjust categorize_ctor_elements and
>       gimplify_init_ctor_eval calls.  If categorize_ctor_elements returns
>       the CONSTRUCTOR is incomplete, but only because some subobject of
>       it is incomplete, allow not clearing the whole object and instead
>       just clear the subobject.
> 
>       * g++.dg/tree-ssa/ssa-dse-1.C: Allow char[200] instead of
>       struct FixBuf in the dump.
> 
> --- gcc/expr.h.jj     2024-11-30 01:47:37.204351201 +0100
> +++ gcc/expr.h        2024-12-11 12:55:07.623191379 +0100
> @@ -361,7 +361,7 @@ extern unsigned HOST_WIDE_INT highest_po
>  
>  extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
>                                     HOST_WIDE_INT *, HOST_WIDE_INT *,
> -                                   int *);
> +                                   int *, vec<tree> * = NULL);
>  extern bool type_has_padding_at_level_p (tree);
>  extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1);
>  extern void store_constructor (tree, rtx, int, poly_int64, bool);
> --- gcc/expr.cc.jj    2024-12-07 11:35:49.374441121 +0100
> +++ gcc/expr.cc       2024-12-11 17:14:21.937638483 +0100
> @@ -7095,12 +7095,21 @@ count_type_elements (const_tree type, bo
>      }
>  }
>  
> -/* Helper for categorize_ctor_elements.  Identical interface.  */
> +/* Helper for categorize_ctor_elements.  Similar interface,
> +   just with P_PURPOSES and P_INCOMPLETE_DEPTH extra arguments.
> +   *P_PURPOSES is a vector of constructor elt indexes for the
> +   first incomplete subobject, *P_INCOMPLETE_DEPTH is initially -1,
> +   set to the depth of the first incomplete subobject when P_INCOMPLETE_ELT
> +   is first initialized when setting *P_COMPLETE to 0 (if ever) and
> +   decreased when leaving the corresponding categorize_ctor_elements_1
> +   calls that pushed indexes in there.  */
>  
>  static bool
>  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                           HOST_WIDE_INT *p_unique_nz_elts,
> -                         HOST_WIDE_INT *p_init_elts, int *p_complete)
> +                         HOST_WIDE_INT *p_init_elts, int *p_complete,
> +                         vec<tree> *p_incomplete_elt, vec<tree> *p_purposes,
> +                         HOST_WIDE_INT *p_incomplete_depth)
>  {
>    unsigned HOST_WIDE_INT idx;
>    HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
> @@ -7139,9 +7148,30 @@ categorize_ctor_elements_1 (const_tree c
>       case CONSTRUCTOR:
>         {
>           HOST_WIDE_INT nz = 0, unz = 0, ic = 0;
> +         bool pushed = false;
>  
> +         if (p_incomplete_elt && *p_incomplete_depth == -1)
> +           {
> +             tree p = purpose;
> +             if (p
> +                 && TREE_CODE (p) == RANGE_EXPR
> +                 && simple_cst_equal (TREE_OPERAND (p, 0),
> +                                      TREE_OPERAND (p, 1)))
> +               p = TREE_OPERAND (p, 1);
> +             p_purposes->safe_push (p);
> +             pushed = true;
> +           }
>           bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &unz,
> -                                                        &ic, p_complete);
> +                                                        &ic, p_complete,
> +                                                        p_incomplete_elt,
> +                                                        p_purposes,
> +                                                        p_incomplete_depth);
> +         if (pushed)
> +           {
> +             p_purposes->pop ();
> +             if (*p_incomplete_depth > 0)
> +               --*p_incomplete_depth;
> +           }
>  
>           nz_elts += mult * nz;
>           unique_nz_elts += unz;
> @@ -7227,6 +7257,7 @@ categorize_ctor_elements_1 (const_tree c
>       }
>      }
>  
> +  int prev_complete = *p_complete;
>    if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                               num_fields, elt_type))
>      *p_complete = 0;
> @@ -7255,6 +7286,48 @@ categorize_ctor_elements_1 (const_tree c
>    else if (*p_complete > 0
>          && type_has_padding_at_level_p (TREE_TYPE (ctor)))
>      *p_complete = -1;
> +  if (*p_complete == 0 && p_incomplete_elt)
> +    {
> +      if (prev_complete)
> +     {
> +       if (p_purposes->is_empty ())
> +         *p_incomplete_depth = 0;
> +       else
> +         for (tree p : *p_purposes)
> +           if (p == NULL_TREE || TREE_CODE (p) == RANGE_EXPR)
> +             {
> +               *p_incomplete_depth = 0;
> +               break;
> +             }
> +       if (*p_incomplete_depth == -1)
> +         {
> +           for (tree p : *p_purposes)
> +             p_incomplete_elt->safe_push (p);
> +           *p_incomplete_depth = p_purposes->length ();
> +         }
> +     }
> +      else if (p_incomplete_elt->length () > (unsigned) *p_incomplete_depth)
> +     {
> +       if (!complete_ctor_at_level_p (TREE_TYPE (ctor),
> +                                      num_fields, elt_type))
> +         p_incomplete_elt->truncate (*p_incomplete_depth);
> +       else if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE
> +                || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE)
> +         {
> +           if (CONSTRUCTOR_ZERO_PADDING_BITS (ctor)
> +               && (num_fields
> +                   ? simple_cst_equal (TYPE_SIZE (TREE_TYPE (ctor)),
> +                                       TYPE_SIZE (elt_type)) != 1
> +                   : type_has_padding_at_level_p (TREE_TYPE (ctor))))
> +             p_incomplete_elt->truncate (*p_incomplete_depth);
> +         }
> +       else if ((CONSTRUCTOR_ZERO_PADDING_BITS (ctor)
> +                 || (flag_zero_init_padding_bits
> +                     == ZERO_INIT_PADDING_BITS_ALL))
> +                && type_has_padding_at_level_p (TREE_TYPE (ctor)))
> +         p_incomplete_elt->truncate (*p_incomplete_depth);
> +     }
> +    }
>  
>    *p_nz_elts += nz_elts;
>    *p_unique_nz_elts += unique_nz_elts;
> @@ -7281,20 +7354,29 @@ categorize_ctor_elements_1 (const_tree c
>       - -1 if all fields are initialized, but there's padding
>  
>     Return whether or not CTOR is a valid static constant initializer, the 
> same
> -   as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0".  */
> +   as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0".
> +
> +   If P_INCOMPLETE_ELT is non-NULL, when P_COMPLETE is set to 0 it can
> +   be set to a vector of CONSTRUCTOR_ELT index identifying the only subobject
> +   which is incomplete.  */
>  
>  bool
>  categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                         HOST_WIDE_INT *p_unique_nz_elts,
> -                       HOST_WIDE_INT *p_init_elts, int *p_complete)
> +                       HOST_WIDE_INT *p_init_elts, int *p_complete,
> +                       vec<tree> *p_incomplete_elt)
>  {
>    *p_nz_elts = 0;
>    *p_unique_nz_elts = 0;
>    *p_init_elts = 0;
>    *p_complete = 1;
> +  auto_vec<tree, 32> purposes;
> +  HOST_WIDE_INT incomplete_depth = -1;
>  
>    return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts,
> -                                  p_init_elts, p_complete);
> +                                  p_init_elts, p_complete,
> +                                  p_incomplete_elt, &purposes,
> +                                  &incomplete_depth);
>  }
>  
>  /* Return true if constructor CTOR is simple enough to be materialized
> --- gcc/gimplify.cc.jj        2024-12-07 11:35:49.475439705 +0100
> +++ gcc/gimplify.cc   2024-12-11 16:33:42.931975665 +0100
> @@ -5372,7 +5372,7 @@ gimplify_init_ctor_preeval (tree *expr_p
>     already been taken care of for us, in gimplify_init_ctor_preeval().  */
>  
>  static void gimplify_init_ctor_eval (tree, vec<constructor_elt, va_gc> *,
> -                                  gimple_seq *, bool);
> +                                  gimple_seq *, bool, tree *, unsigned);
>  
>  static void
>  gimplify_init_ctor_eval_range (tree object, tree lower, tree upper,
> @@ -5405,7 +5405,7 @@ gimplify_init_ctor_eval_range (tree obje
>      /* NB we might have to call ourself recursively through
>         gimplify_init_ctor_eval if the value is a constructor.  */
>      gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value),
> -                          pre_p, cleared);
> +                          pre_p, cleared, NULL, 0);
>    else
>      {
>        if (gimplify_expr (&value, pre_p, NULL, is_gimple_val, fb_rvalue)
> @@ -5436,11 +5436,14 @@ gimplify_init_ctor_eval_range (tree obje
>     MODIFY_EXPRs for a CONSTRUCTOR.  OBJECT is the LHS against which the
>     assignments should happen.  ELTS is the CONSTRUCTOR_ELTS of the
>     CONSTRUCTOR.  CLEARED is true if the entire LHS object has been
> -   zeroed first.  */
> +   zeroed first.  PARTIAL_CLEAR is a pointer to an array of
> +   PARTIAL_CLEAR_COUNT purposes which identifies subobject which should
> +   be cleared.  */
>  
>  static void
>  gimplify_init_ctor_eval (tree object, vec<constructor_elt, va_gc> *elts,
> -                      gimple_seq *pre_p, bool cleared)
> +                      gimple_seq *pre_p, bool cleared,
> +                      tree *partial_clear, unsigned partial_clear_count)
>  {
>    tree array_elt_type = NULL;
>    unsigned HOST_WIDE_INT ix;
> @@ -5464,6 +5467,15 @@ gimplify_init_ctor_eval (tree object, ve
>        so we don't have to figure out what's missing ourselves.  */
>        gcc_assert (purpose);
>  
> +      bool this_partial_clear = false;
> +      if (partial_clear
> +       && (purpose == *partial_clear
> +           || (TREE_CODE (purpose) == RANGE_EXPR
> +               && simple_cst_equal (TREE_OPERAND (purpose, 0),
> +                                    TREE_OPERAND (purpose, 1))
> +               && TREE_OPERAND (purpose, 1) == *partial_clear)))
> +     this_partial_clear = true;
> +
>        /* Skip zero-sized fields, unless value has side-effects.  This can
>        happen with calls to functions returning a empty type, which
>        we shouldn't discard.  As a number of downstream passes don't
> @@ -5471,6 +5483,7 @@ gimplify_init_ctor_eval (tree object, ve
>        the MODIFY_EXPR we make below to drop the assignment statement.  */
>        if (!TREE_SIDE_EFFECTS (value)
>         && TREE_CODE (purpose) == FIELD_DECL
> +       && !this_partial_clear
>         && is_empty_type (TREE_TYPE (purpose)))
>       continue;
>  
> @@ -5510,10 +5523,27 @@ gimplify_init_ctor_eval (tree object, ve
>                        unshare_expr (object), purpose, NULL_TREE);
>       }
>  
> +      if (this_partial_clear && partial_clear_count == 1)
> +     {
> +       tree new_ctor = build_constructor (TREE_TYPE (cref), NULL);
> +       if (is_empty_type (TREE_TYPE (new_ctor)))
> +         CONSTRUCTOR_ZERO_PADDING_BITS (new_ctor) = 1;
> +       tree init = build2 (INIT_EXPR, TREE_TYPE (cref), unshare_expr (cref),
> +                           new_ctor);
> +       gimplify_and_add (init, pre_p);
> +       ggc_free (init);
> +     }
>        if (TREE_CODE (value) == CONSTRUCTOR
>         && TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE)
> -     gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value),
> -                              pre_p, cleared);
> +     gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), pre_p,
> +                              cleared || (this_partial_clear
> +                                          && partial_clear_count == 1),
> +                              (this_partial_clear
> +                               && partial_clear_count > 1)
> +                              ? partial_clear + 1 : NULL,
> +                              (this_partial_clear
> +                               && partial_clear_count > 1)
> +                              ? partial_clear_count - 1 : 0);
>        else if (TREE_CODE (value) == RAW_DATA_CST)
>       {
>         if (RAW_DATA_LENGTH (value) <= 32)
> @@ -5748,6 +5778,7 @@ gimplify_init_constructor (tree *expr_p,
>       HOST_WIDE_INT num_unique_nonzero_elements;
>       int complete_p;
>       bool valid_const_initializer;
> +     auto_vec<tree, 32> incomplete_elt;
>  
>       /* Aggregate types must lower constructors to initialization of
>          individual elements.  The exception is that a CONSTRUCTOR node
> @@ -5772,7 +5803,8 @@ gimplify_init_constructor (tree *expr_p,
>       valid_const_initializer
>         = categorize_ctor_elements (ctor, &num_nonzero_elements,
>                                     &num_unique_nonzero_elements,
> -                                   &num_ctor_elements, &complete_p);
> +                                   &num_ctor_elements, &complete_p,
> +                                   &incomplete_elt);
>  
>       /* If a const aggregate variable is being initialized, then it
>          should never be a lose to promote the variable to be static.  */
> @@ -5819,6 +5851,9 @@ gimplify_init_constructor (tree *expr_p,
>       if (VAR_P (object) && !notify_temp_creation)
>         TREE_READONLY (object) = 0;
>  
> +     if (CONSTRUCTOR_NO_CLEARING (ctor))
> +       incomplete_elt.truncate (0);
> +
>       /* If there are "lots" of initialized elements, even discounting
>          those that are not address constants (and thus *must* be
>          computed at runtime), then partition the constructor into
> @@ -5827,11 +5862,18 @@ gimplify_init_constructor (tree *expr_p,
>       /* TODO.  There's code in cp/typeck.cc to do this.  */
>  
>       if (int_size_in_bytes (TREE_TYPE (ctor)) < 0)
> -       /* store_constructor will ignore the clearing of variable-sized
> -          objects.  Initializers for such objects must explicitly set
> -          every field that needs to be set.  */
> -       cleared = false;
> -     else if (!complete_p)
> +       {
> +         /* store_constructor will ignore the clearing of variable-sized
> +            objects.  Initializers for such objects must explicitly set
> +            every field that needs to be set.  */
> +         cleared = false;
> +         incomplete_elt.truncate (0);
> +       }
> +     else if (!complete_p
> +              /* If complete_p is needed just for a subset of subobjects,
> +                 don't clear everything unless we decide to clear
> +                 it anyway for other reasons.  */
> +              && incomplete_elt.is_empty ())
>         /* If the constructor isn't complete, clear the whole object
>            beforehand, unless CONSTRUCTOR_NO_CLEARING is set on it.
>  
> @@ -5984,7 +6026,17 @@ gimplify_init_constructor (tree *expr_p,
>       if (!cleared
>           || num_nonzero_elements > 0
>           || ctor_has_side_effects_p)
> -       gimplify_init_ctor_eval (object, elts, pre_p, cleared);
> +       {
> +         tree *partial_clear = NULL;
> +         unsigned partial_clear_count = 0;
> +         if (!cleared && !complete_p && !incomplete_elt.is_empty ())
> +           {
> +             partial_clear = incomplete_elt.address ();
> +             partial_clear_count = incomplete_elt.length ();
> +           }
> +         gimplify_init_ctor_eval (object, elts, pre_p, cleared,
> +                                  partial_clear, partial_clear_count);
> +       }
>  
>       *expr_p = NULL_TREE;
>        }
> --- gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C.jj      2020-01-12 
> 11:54:37.275400404 +0100
> +++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C 2024-12-12 10:16:00.332435450 
> +0100
> @@ -97,4 +97,4 @@ int main()
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> \\\[\\(struct 
> FixBuf \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> 
> \\\[\\((?:struct FixBuf|char\\\[200\\\]) \\*\\)&<retval> \\+ \[0-9\]+B\\\] = 
> {}" 1 "dse1" } } */
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Reply via email to