On Thu, 12 Dec 2024, Jakub Jelinek wrote: > Hi! > > As we don't have a SRA fix for PR117971, I thought I'd try to improve > it using an optimization during gimplification. > This is about the tree-ssa/pr78687.C testcase, which is a variant with > struct option_1 > { > void *a, *b, *c, *d, *e; > }; > > struct option_2 > { > }; > variants. Since the PR116416 changes we clear the whole object, > which is at least sizeof (size_t) + 5 * sizeof (void *) large, because > there is a single byte (option_2) zero initialized (it uses option_2() > rather than e.g. option_2{}), that causes categorize_ctor_elements > to say that the whole CONSTRUCTOR is !complete_p and we clear everything, > which kills SRA for some reason (I think SRA needs to be fixed in any case, > because the clearing because of required clearing of padding bits will > now be more common since the PR116416 changes). > So, what I wanted to do is avoid clearing everything when we know only a > single subobject is incomplete, in that case we can clear just that and not > everything else. > The patch works by extending categorize_ctor_elements to note the access > path, vector of constructor_elt indexes (purposes) for the only incomplete > subobject if the CONSTRUCTOR is !complete_p but everything except that > subobject is complete. > This is done by pushing there the complete access path on the first > incomplete subobject encountered and truncating that access path later > if we notice incomplete subobjects somewhere else. > E.g. on C23: > struct S { int a; }; > struct T { struct S b; int c; struct S d; }; > struct U { struct T e; int f; struct T g; }; > void f0 (struct U *); > > void > f1 (void) > { > struct U u = { { {}, 1, {} }, 2, { {}, 3, {} } }; > f0 (&u); > } > > void > f2 (void) > { > struct U u = { { { 1 }, 2, { 3 } }, 4, { { 5 }, 6, {} } }; > f0 (&u); > } > > void > f3 (void) > { > struct U u = { { { }, 2, { } }, 4, { { 5 }, 6, { 7 } } }; > f0 (&u); > } > the optimization doesn't change anything in f1 (there are multiple > incomplete bits all over), changes > - u = {}; > u.e.b.a = 1; > u.e.c = 2; > u.e.d.a = 3; > u.f = 4; > u.g.b.a = 5; > u.g.c = 6; > + u.g.d = {}; > f0 (&u); > in f2, only u.g.d is incomplete, and in f3 > - u = {}; > + u.e = {}; > u.e.c = 2; > u.f = 4; > u.g.b.a = 5; > ... > (u.e has multiple incomplete subobjects, but nothing else does). > Now, when looking at the tree-ssa/pr78687.C testcase, I've noticed > that the expected > - D.10177 = {}; > + D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {}; > change doesn't happen there (instead the clearing is dropped) > because gimplification optimizes it away, > so I've filed PR118002 and worked on that and posted a patch for that. > Now, unfortunately with the version of the patch posted without the > cp-gimplify.cc hunk (which regressed empty1.C test) it is still removed, > so I wonder if this optimization doesn't have to use some langhook where > the C++ FE would tell that the store is to an empty base that the > gimplification hook would optimize away and so needs to try an outer > object. Or if we need to somehow force the zero initializers in this > case anyway somehow by skipping further gimplification on it or what. > The thing is if the outer = {}; would be added (but that would need to > be verified if it wouldn't be optimized away by cp_gimplify_expr somehow), > that clears the whole thing, so just clearing some subobject of it should > be still fine, it is necessarily before we actually store some other data > members that might overlay those. > > And another problem is that the patch regresses the > tree-ssa/pr90883.C test, and in that case it actually makes the emitted > code worse. The testcase has a struct with char a[7]; int b; > members with NSDMI, previously we cleared everything, because the a > initializer was {} and thus "incomplete", even when the padding byte > doesn't have to be cleared, but it actually results in > xorl %eax, %eax > xorl %edx, %edx > ret > Now, with the patch we see that not the whole struct is incomplete > with D.2618 = { {}, 0 }, just a, and so > - D.2618 = {}; > + D.2618.a = {}; > + D.2618.b = 0; > (note, the D.2618.b part was omitted because of the clearing of the whole). > Alexandre added recently an optimization to clear whole object anyway, > /* If the object is small enough to go in registers, and it's > not required to be constructed in memory, clear it first. > That will avoid wasting cycles preserving any padding bits > that might be there, and if there aren't any, the compiler > is smart enough to optimize the clearing out. */ > else if (complete_p <= 0 > && !TREE_ADDRESSABLE (ctor) > && !TREE_THIS_VOLATILE (object) > && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type)) > && optimize) > cleared = true; > but unfortunately that doesn't trigger here, because type has BLKmode here, > it is 12 bytes. So maybe we'd need to go smarter here, don't perform the > clearing of the subobject that needs clearing as part of the > gimplify_init_ctor_eval, but clear it separately first and maybe extend it > a little bit first into the neighbouring padding or fields, in order to have > more efficient clearing size if it isn't too large. > > On the tree-ssa/ssa-dse-1.C testcase there is actually no code generation > change, just small difference in the printed IL. > > Thoughts on this? > > Note, this can certainly be postponed to GCC 16...
I think we should postpone to GCC 16 and first look at improving SRA. Richard. > > 2024-12-12 Jakub Jelinek <ja...@redhat.com> > > PR c++/116416 > * expr.h (categorize_ctor_elements): Add vec<tree> * argument > defaulted to NULL. > * expr.cc (categorize_ctor_elements_1): Add p_incomplete_elt, > p_purposes and p_incomplete_depth arguments. Use them to > note the outermost incomplete subobject if *p_complete = 0. > Adjust recursive call. > (categorize_ctor_elements): Add p_incomplete_elt argument, > adjust call to categorize_ctor_elements_1. > * gimplify.cc (gimplify_init_ctor_eval): Add partial_clear > and partial_clear_count arguments, adjust recursive calls, if > non-NULL arrange to clear the specified subobject and use > cleared = true for its initialization. > (gimplify_init_ctor_eval_range): Adjust gimplify_init_ctor_eval > caller. > (gimplify_init_constructor): Adjust categorize_ctor_elements and > gimplify_init_ctor_eval calls. If categorize_ctor_elements returns > the CONSTRUCTOR is incomplete, but only because some subobject of > it is incomplete, allow not clearing the whole object and instead > just clear the subobject. > > * g++.dg/tree-ssa/ssa-dse-1.C: Allow char[200] instead of > struct FixBuf in the dump. > > --- gcc/expr.h.jj 2024-11-30 01:47:37.204351201 +0100 > +++ gcc/expr.h 2024-12-11 12:55:07.623191379 +0100 > @@ -361,7 +361,7 @@ extern unsigned HOST_WIDE_INT highest_po > > extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *, > HOST_WIDE_INT *, HOST_WIDE_INT *, > - int *); > + int *, vec<tree> * = NULL); > extern bool type_has_padding_at_level_p (tree); > extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1); > extern void store_constructor (tree, rtx, int, poly_int64, bool); > --- gcc/expr.cc.jj 2024-12-07 11:35:49.374441121 +0100 > +++ gcc/expr.cc 2024-12-11 17:14:21.937638483 +0100 > @@ -7095,12 +7095,21 @@ count_type_elements (const_tree type, bo > } > } > > -/* Helper for categorize_ctor_elements. Identical interface. */ > +/* Helper for categorize_ctor_elements. Similar interface, > + just with P_PURPOSES and P_INCOMPLETE_DEPTH extra arguments. > + *P_PURPOSES is a vector of constructor elt indexes for the > + first incomplete subobject, *P_INCOMPLETE_DEPTH is initially -1, > + set to the depth of the first incomplete subobject when P_INCOMPLETE_ELT > + is first initialized when setting *P_COMPLETE to 0 (if ever) and > + decreased when leaving the corresponding categorize_ctor_elements_1 > + calls that pushed indexes in there. */ > > static bool > categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, > HOST_WIDE_INT *p_unique_nz_elts, > - HOST_WIDE_INT *p_init_elts, int *p_complete) > + HOST_WIDE_INT *p_init_elts, int *p_complete, > + vec<tree> *p_incomplete_elt, vec<tree> *p_purposes, > + HOST_WIDE_INT *p_incomplete_depth) > { > unsigned HOST_WIDE_INT idx; > HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields; > @@ -7139,9 +7148,30 @@ categorize_ctor_elements_1 (const_tree c > case CONSTRUCTOR: > { > HOST_WIDE_INT nz = 0, unz = 0, ic = 0; > + bool pushed = false; > > + if (p_incomplete_elt && *p_incomplete_depth == -1) > + { > + tree p = purpose; > + if (p > + && TREE_CODE (p) == RANGE_EXPR > + && simple_cst_equal (TREE_OPERAND (p, 0), > + TREE_OPERAND (p, 1))) > + p = TREE_OPERAND (p, 1); > + p_purposes->safe_push (p); > + pushed = true; > + } > bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &unz, > - &ic, p_complete); > + &ic, p_complete, > + p_incomplete_elt, > + p_purposes, > + p_incomplete_depth); > + if (pushed) > + { > + p_purposes->pop (); > + if (*p_incomplete_depth > 0) > + --*p_incomplete_depth; > + } > > nz_elts += mult * nz; > unique_nz_elts += unz; > @@ -7227,6 +7257,7 @@ categorize_ctor_elements_1 (const_tree c > } > } > > + int prev_complete = *p_complete; > if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor), > num_fields, elt_type)) > *p_complete = 0; > @@ -7255,6 +7286,48 @@ categorize_ctor_elements_1 (const_tree c > else if (*p_complete > 0 > && type_has_padding_at_level_p (TREE_TYPE (ctor))) > *p_complete = -1; > + if (*p_complete == 0 && p_incomplete_elt) > + { > + if (prev_complete) > + { > + if (p_purposes->is_empty ()) > + *p_incomplete_depth = 0; > + else > + for (tree p : *p_purposes) > + if (p == NULL_TREE || TREE_CODE (p) == RANGE_EXPR) > + { > + *p_incomplete_depth = 0; > + break; > + } > + if (*p_incomplete_depth == -1) > + { > + for (tree p : *p_purposes) > + p_incomplete_elt->safe_push (p); > + *p_incomplete_depth = p_purposes->length (); > + } > + } > + else if (p_incomplete_elt->length () > (unsigned) *p_incomplete_depth) > + { > + if (!complete_ctor_at_level_p (TREE_TYPE (ctor), > + num_fields, elt_type)) > + p_incomplete_elt->truncate (*p_incomplete_depth); > + else if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE) > + { > + if (CONSTRUCTOR_ZERO_PADDING_BITS (ctor) > + && (num_fields > + ? simple_cst_equal (TYPE_SIZE (TREE_TYPE (ctor)), > + TYPE_SIZE (elt_type)) != 1 > + : type_has_padding_at_level_p (TREE_TYPE (ctor)))) > + p_incomplete_elt->truncate (*p_incomplete_depth); > + } > + else if ((CONSTRUCTOR_ZERO_PADDING_BITS (ctor) > + || (flag_zero_init_padding_bits > + == ZERO_INIT_PADDING_BITS_ALL)) > + && type_has_padding_at_level_p (TREE_TYPE (ctor))) > + p_incomplete_elt->truncate (*p_incomplete_depth); > + } > + } > > *p_nz_elts += nz_elts; > *p_unique_nz_elts += unique_nz_elts; > @@ -7281,20 +7354,29 @@ categorize_ctor_elements_1 (const_tree c > - -1 if all fields are initialized, but there's padding > > Return whether or not CTOR is a valid static constant initializer, the > same > - as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". */ > + as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". > + > + If P_INCOMPLETE_ELT is non-NULL, when P_COMPLETE is set to 0 it can > + be set to a vector of CONSTRUCTOR_ELT index identifying the only subobject > + which is incomplete. */ > > bool > categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts, > HOST_WIDE_INT *p_unique_nz_elts, > - HOST_WIDE_INT *p_init_elts, int *p_complete) > + HOST_WIDE_INT *p_init_elts, int *p_complete, > + vec<tree> *p_incomplete_elt) > { > *p_nz_elts = 0; > *p_unique_nz_elts = 0; > *p_init_elts = 0; > *p_complete = 1; > + auto_vec<tree, 32> purposes; > + HOST_WIDE_INT incomplete_depth = -1; > > return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts, > - p_init_elts, p_complete); > + p_init_elts, p_complete, > + p_incomplete_elt, &purposes, > + &incomplete_depth); > } > > /* Return true if constructor CTOR is simple enough to be materialized > --- gcc/gimplify.cc.jj 2024-12-07 11:35:49.475439705 +0100 > +++ gcc/gimplify.cc 2024-12-11 16:33:42.931975665 +0100 > @@ -5372,7 +5372,7 @@ gimplify_init_ctor_preeval (tree *expr_p > already been taken care of for us, in gimplify_init_ctor_preeval(). */ > > static void gimplify_init_ctor_eval (tree, vec<constructor_elt, va_gc> *, > - gimple_seq *, bool); > + gimple_seq *, bool, tree *, unsigned); > > static void > gimplify_init_ctor_eval_range (tree object, tree lower, tree upper, > @@ -5405,7 +5405,7 @@ gimplify_init_ctor_eval_range (tree obje > /* NB we might have to call ourself recursively through > gimplify_init_ctor_eval if the value is a constructor. */ > gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), > - pre_p, cleared); > + pre_p, cleared, NULL, 0); > else > { > if (gimplify_expr (&value, pre_p, NULL, is_gimple_val, fb_rvalue) > @@ -5436,11 +5436,14 @@ gimplify_init_ctor_eval_range (tree obje > MODIFY_EXPRs for a CONSTRUCTOR. OBJECT is the LHS against which the > assignments should happen. ELTS is the CONSTRUCTOR_ELTS of the > CONSTRUCTOR. CLEARED is true if the entire LHS object has been > - zeroed first. */ > + zeroed first. PARTIAL_CLEAR is a pointer to an array of > + PARTIAL_CLEAR_COUNT purposes which identifies subobject which should > + be cleared. */ > > static void > gimplify_init_ctor_eval (tree object, vec<constructor_elt, va_gc> *elts, > - gimple_seq *pre_p, bool cleared) > + gimple_seq *pre_p, bool cleared, > + tree *partial_clear, unsigned partial_clear_count) > { > tree array_elt_type = NULL; > unsigned HOST_WIDE_INT ix; > @@ -5464,6 +5467,15 @@ gimplify_init_ctor_eval (tree object, ve > so we don't have to figure out what's missing ourselves. */ > gcc_assert (purpose); > > + bool this_partial_clear = false; > + if (partial_clear > + && (purpose == *partial_clear > + || (TREE_CODE (purpose) == RANGE_EXPR > + && simple_cst_equal (TREE_OPERAND (purpose, 0), > + TREE_OPERAND (purpose, 1)) > + && TREE_OPERAND (purpose, 1) == *partial_clear))) > + this_partial_clear = true; > + > /* Skip zero-sized fields, unless value has side-effects. This can > happen with calls to functions returning a empty type, which > we shouldn't discard. As a number of downstream passes don't > @@ -5471,6 +5483,7 @@ gimplify_init_ctor_eval (tree object, ve > the MODIFY_EXPR we make below to drop the assignment statement. */ > if (!TREE_SIDE_EFFECTS (value) > && TREE_CODE (purpose) == FIELD_DECL > + && !this_partial_clear > && is_empty_type (TREE_TYPE (purpose))) > continue; > > @@ -5510,10 +5523,27 @@ gimplify_init_ctor_eval (tree object, ve > unshare_expr (object), purpose, NULL_TREE); > } > > + if (this_partial_clear && partial_clear_count == 1) > + { > + tree new_ctor = build_constructor (TREE_TYPE (cref), NULL); > + if (is_empty_type (TREE_TYPE (new_ctor))) > + CONSTRUCTOR_ZERO_PADDING_BITS (new_ctor) = 1; > + tree init = build2 (INIT_EXPR, TREE_TYPE (cref), unshare_expr (cref), > + new_ctor); > + gimplify_and_add (init, pre_p); > + ggc_free (init); > + } > if (TREE_CODE (value) == CONSTRUCTOR > && TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE) > - gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), > - pre_p, cleared); > + gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), pre_p, > + cleared || (this_partial_clear > + && partial_clear_count == 1), > + (this_partial_clear > + && partial_clear_count > 1) > + ? partial_clear + 1 : NULL, > + (this_partial_clear > + && partial_clear_count > 1) > + ? partial_clear_count - 1 : 0); > else if (TREE_CODE (value) == RAW_DATA_CST) > { > if (RAW_DATA_LENGTH (value) <= 32) > @@ -5748,6 +5778,7 @@ gimplify_init_constructor (tree *expr_p, > HOST_WIDE_INT num_unique_nonzero_elements; > int complete_p; > bool valid_const_initializer; > + auto_vec<tree, 32> incomplete_elt; > > /* Aggregate types must lower constructors to initialization of > individual elements. The exception is that a CONSTRUCTOR node > @@ -5772,7 +5803,8 @@ gimplify_init_constructor (tree *expr_p, > valid_const_initializer > = categorize_ctor_elements (ctor, &num_nonzero_elements, > &num_unique_nonzero_elements, > - &num_ctor_elements, &complete_p); > + &num_ctor_elements, &complete_p, > + &incomplete_elt); > > /* If a const aggregate variable is being initialized, then it > should never be a lose to promote the variable to be static. */ > @@ -5819,6 +5851,9 @@ gimplify_init_constructor (tree *expr_p, > if (VAR_P (object) && !notify_temp_creation) > TREE_READONLY (object) = 0; > > + if (CONSTRUCTOR_NO_CLEARING (ctor)) > + incomplete_elt.truncate (0); > + > /* If there are "lots" of initialized elements, even discounting > those that are not address constants (and thus *must* be > computed at runtime), then partition the constructor into > @@ -5827,11 +5862,18 @@ gimplify_init_constructor (tree *expr_p, > /* TODO. There's code in cp/typeck.cc to do this. */ > > if (int_size_in_bytes (TREE_TYPE (ctor)) < 0) > - /* store_constructor will ignore the clearing of variable-sized > - objects. Initializers for such objects must explicitly set > - every field that needs to be set. */ > - cleared = false; > - else if (!complete_p) > + { > + /* store_constructor will ignore the clearing of variable-sized > + objects. Initializers for such objects must explicitly set > + every field that needs to be set. */ > + cleared = false; > + incomplete_elt.truncate (0); > + } > + else if (!complete_p > + /* If complete_p is needed just for a subset of subobjects, > + don't clear everything unless we decide to clear > + it anyway for other reasons. */ > + && incomplete_elt.is_empty ()) > /* If the constructor isn't complete, clear the whole object > beforehand, unless CONSTRUCTOR_NO_CLEARING is set on it. > > @@ -5984,7 +6026,17 @@ gimplify_init_constructor (tree *expr_p, > if (!cleared > || num_nonzero_elements > 0 > || ctor_has_side_effects_p) > - gimplify_init_ctor_eval (object, elts, pre_p, cleared); > + { > + tree *partial_clear = NULL; > + unsigned partial_clear_count = 0; > + if (!cleared && !complete_p && !incomplete_elt.is_empty ()) > + { > + partial_clear = incomplete_elt.address (); > + partial_clear_count = incomplete_elt.length (); > + } > + gimplify_init_ctor_eval (object, elts, pre_p, cleared, > + partial_clear, partial_clear_count); > + } > > *expr_p = NULL_TREE; > } > --- gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C.jj 2020-01-12 > 11:54:37.275400404 +0100 > +++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C 2024-12-12 10:16:00.332435450 > +0100 > @@ -97,4 +97,4 @@ int main() > } > > > -/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> \\\[\\(struct > FixBuf \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ > +/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> > \\\[\\((?:struct FixBuf|char\\\[200\\\]) \\*\\)&<retval> \\+ \[0-9\]+B\\\] = > {}" 1 "dse1" } } */ > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)