Hi! As we don't have a SRA fix for PR117971, I thought I'd try to improve it using an optimization during gimplification. This is about the tree-ssa/pr78687.C testcase, which is a variant with struct option_1 { void *a, *b, *c, *d, *e; }; struct option_2 { }; variants. Since the PR116416 changes we clear the whole object, which is at least sizeof (size_t) + 5 * sizeof (void *) large, because there is a single byte (option_2) zero initialized (it uses option_2() rather than e.g. option_2{}), that causes categorize_ctor_elements to say that the whole CONSTRUCTOR is !complete_p and we clear everything, which kills SRA for some reason (I think SRA needs to be fixed in any case, because the clearing because of required clearing of padding bits will now be more common since the PR116416 changes). So, what I wanted to do is avoid clearing everything when we know only a single subobject is incomplete, in that case we can clear just that and not everything else. The patch works by extending categorize_ctor_elements to note the access path, vector of constructor_elt indexes (purposes) for the only incomplete subobject if the CONSTRUCTOR is !complete_p but everything except that subobject is complete. This is done by pushing there the complete access path on the first incomplete subobject encountered and truncating that access path later if we notice incomplete subobjects somewhere else. E.g. on C23: struct S { int a; }; struct T { struct S b; int c; struct S d; }; struct U { struct T e; int f; struct T g; }; void f0 (struct U *);
void f1 (void) { struct U u = { { {}, 1, {} }, 2, { {}, 3, {} } }; f0 (&u); } void f2 (void) { struct U u = { { { 1 }, 2, { 3 } }, 4, { { 5 }, 6, {} } }; f0 (&u); } void f3 (void) { struct U u = { { { }, 2, { } }, 4, { { 5 }, 6, { 7 } } }; f0 (&u); } the optimization doesn't change anything in f1 (there are multiple incomplete bits all over), changes - u = {}; u.e.b.a = 1; u.e.c = 2; u.e.d.a = 3; u.f = 4; u.g.b.a = 5; u.g.c = 6; + u.g.d = {}; f0 (&u); in f2, only u.g.d is incomplete, and in f3 - u = {}; + u.e = {}; u.e.c = 2; u.f = 4; u.g.b.a = 5; ... (u.e has multiple incomplete subobjects, but nothing else does). Now, when looking at the tree-ssa/pr78687.C testcase, I've noticed that the expected - D.10177 = {}; + D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {}; change doesn't happen there (instead the clearing is dropped) because gimplification optimizes it away, so I've filed PR118002 and worked on that and posted a patch for that. Now, unfortunately with the version of the patch posted without the cp-gimplify.cc hunk (which regressed empty1.C test) it is still removed, so I wonder if this optimization doesn't have to use some langhook where the C++ FE would tell that the store is to an empty base that the gimplification hook would optimize away and so needs to try an outer object. Or if we need to somehow force the zero initializers in this case anyway somehow by skipping further gimplification on it or what. The thing is if the outer = {}; would be added (but that would need to be verified if it wouldn't be optimized away by cp_gimplify_expr somehow), that clears the whole thing, so just clearing some subobject of it should be still fine, it is necessarily before we actually store some other data members that might overlay those. And another problem is that the patch regresses the tree-ssa/pr90883.C test, and in that case it actually makes the emitted code worse. The testcase has a struct with char a[7]; int b; members with NSDMI, previously we cleared everything, because the a initializer was {} and thus "incomplete", even when the padding byte doesn't have to be cleared, but it actually results in xorl %eax, %eax xorl %edx, %edx ret Now, with the patch we see that not the whole struct is incomplete with D.2618 = { {}, 0 }, just a, and so - D.2618 = {}; + D.2618.a = {}; + D.2618.b = 0; (note, the D.2618.b part was omitted because of the clearing of the whole). Alexandre added recently an optimization to clear whole object anyway, /* If the object is small enough to go in registers, and it's not required to be constructed in memory, clear it first. That will avoid wasting cycles preserving any padding bits that might be there, and if there aren't any, the compiler is smart enough to optimize the clearing out. */ else if (complete_p <= 0 && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object) && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type)) && optimize) cleared = true; but unfortunately that doesn't trigger here, because type has BLKmode here, it is 12 bytes. So maybe we'd need to go smarter here, don't perform the clearing of the subobject that needs clearing as part of the gimplify_init_ctor_eval, but clear it separately first and maybe extend it a little bit first into the neighbouring padding or fields, in order to have more efficient clearing size if it isn't too large. On the tree-ssa/ssa-dse-1.C testcase there is actually no code generation change, just small difference in the printed IL. Thoughts on this? Note, this can certainly be postponed to GCC 16... 2024-12-12 Jakub Jelinek <ja...@redhat.com> PR c++/116416 * expr.h (categorize_ctor_elements): Add vec<tree> * argument defaulted to NULL. * expr.cc (categorize_ctor_elements_1): Add p_incomplete_elt, p_purposes and p_incomplete_depth arguments. Use them to note the outermost incomplete subobject if *p_complete = 0. Adjust recursive call. (categorize_ctor_elements): Add p_incomplete_elt argument, adjust call to categorize_ctor_elements_1. * gimplify.cc (gimplify_init_ctor_eval): Add partial_clear and partial_clear_count arguments, adjust recursive calls, if non-NULL arrange to clear the specified subobject and use cleared = true for its initialization. (gimplify_init_ctor_eval_range): Adjust gimplify_init_ctor_eval caller. (gimplify_init_constructor): Adjust categorize_ctor_elements and gimplify_init_ctor_eval calls. If categorize_ctor_elements returns the CONSTRUCTOR is incomplete, but only because some subobject of it is incomplete, allow not clearing the whole object and instead just clear the subobject. * g++.dg/tree-ssa/ssa-dse-1.C: Allow char[200] instead of struct FixBuf in the dump. --- gcc/expr.h.jj 2024-11-30 01:47:37.204351201 +0100 +++ gcc/expr.h 2024-12-11 12:55:07.623191379 +0100 @@ -361,7 +361,7 @@ extern unsigned HOST_WIDE_INT highest_po extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *, HOST_WIDE_INT *, HOST_WIDE_INT *, - int *); + int *, vec<tree> * = NULL); extern bool type_has_padding_at_level_p (tree); extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1); extern void store_constructor (tree, rtx, int, poly_int64, bool); --- gcc/expr.cc.jj 2024-12-07 11:35:49.374441121 +0100 +++ gcc/expr.cc 2024-12-11 17:14:21.937638483 +0100 @@ -7095,12 +7095,21 @@ count_type_elements (const_tree type, bo } } -/* Helper for categorize_ctor_elements. Identical interface. */ +/* Helper for categorize_ctor_elements. Similar interface, + just with P_PURPOSES and P_INCOMPLETE_DEPTH extra arguments. + *P_PURPOSES is a vector of constructor elt indexes for the + first incomplete subobject, *P_INCOMPLETE_DEPTH is initially -1, + set to the depth of the first incomplete subobject when P_INCOMPLETE_ELT + is first initialized when setting *P_COMPLETE to 0 (if ever) and + decreased when leaving the corresponding categorize_ctor_elements_1 + calls that pushed indexes in there. */ static bool categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts, HOST_WIDE_INT *p_unique_nz_elts, - HOST_WIDE_INT *p_init_elts, int *p_complete) + HOST_WIDE_INT *p_init_elts, int *p_complete, + vec<tree> *p_incomplete_elt, vec<tree> *p_purposes, + HOST_WIDE_INT *p_incomplete_depth) { unsigned HOST_WIDE_INT idx; HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields; @@ -7139,9 +7148,30 @@ categorize_ctor_elements_1 (const_tree c case CONSTRUCTOR: { HOST_WIDE_INT nz = 0, unz = 0, ic = 0; + bool pushed = false; + if (p_incomplete_elt && *p_incomplete_depth == -1) + { + tree p = purpose; + if (p + && TREE_CODE (p) == RANGE_EXPR + && simple_cst_equal (TREE_OPERAND (p, 0), + TREE_OPERAND (p, 1))) + p = TREE_OPERAND (p, 1); + p_purposes->safe_push (p); + pushed = true; + } bool const_elt_p = categorize_ctor_elements_1 (value, &nz, &unz, - &ic, p_complete); + &ic, p_complete, + p_incomplete_elt, + p_purposes, + p_incomplete_depth); + if (pushed) + { + p_purposes->pop (); + if (*p_incomplete_depth > 0) + --*p_incomplete_depth; + } nz_elts += mult * nz; unique_nz_elts += unz; @@ -7227,6 +7257,7 @@ categorize_ctor_elements_1 (const_tree c } } + int prev_complete = *p_complete; if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor), num_fields, elt_type)) *p_complete = 0; @@ -7255,6 +7286,48 @@ categorize_ctor_elements_1 (const_tree c else if (*p_complete > 0 && type_has_padding_at_level_p (TREE_TYPE (ctor))) *p_complete = -1; + if (*p_complete == 0 && p_incomplete_elt) + { + if (prev_complete) + { + if (p_purposes->is_empty ()) + *p_incomplete_depth = 0; + else + for (tree p : *p_purposes) + if (p == NULL_TREE || TREE_CODE (p) == RANGE_EXPR) + { + *p_incomplete_depth = 0; + break; + } + if (*p_incomplete_depth == -1) + { + for (tree p : *p_purposes) + p_incomplete_elt->safe_push (p); + *p_incomplete_depth = p_purposes->length (); + } + } + else if (p_incomplete_elt->length () > (unsigned) *p_incomplete_depth) + { + if (!complete_ctor_at_level_p (TREE_TYPE (ctor), + num_fields, elt_type)) + p_incomplete_elt->truncate (*p_incomplete_depth); + else if (TREE_CODE (TREE_TYPE (ctor)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ctor)) == QUAL_UNION_TYPE) + { + if (CONSTRUCTOR_ZERO_PADDING_BITS (ctor) + && (num_fields + ? simple_cst_equal (TYPE_SIZE (TREE_TYPE (ctor)), + TYPE_SIZE (elt_type)) != 1 + : type_has_padding_at_level_p (TREE_TYPE (ctor)))) + p_incomplete_elt->truncate (*p_incomplete_depth); + } + else if ((CONSTRUCTOR_ZERO_PADDING_BITS (ctor) + || (flag_zero_init_padding_bits + == ZERO_INIT_PADDING_BITS_ALL)) + && type_has_padding_at_level_p (TREE_TYPE (ctor))) + p_incomplete_elt->truncate (*p_incomplete_depth); + } + } *p_nz_elts += nz_elts; *p_unique_nz_elts += unique_nz_elts; @@ -7281,20 +7354,29 @@ categorize_ctor_elements_1 (const_tree c - -1 if all fields are initialized, but there's padding Return whether or not CTOR is a valid static constant initializer, the same - as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". */ + as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". + + If P_INCOMPLETE_ELT is non-NULL, when P_COMPLETE is set to 0 it can + be set to a vector of CONSTRUCTOR_ELT index identifying the only subobject + which is incomplete. */ bool categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts, HOST_WIDE_INT *p_unique_nz_elts, - HOST_WIDE_INT *p_init_elts, int *p_complete) + HOST_WIDE_INT *p_init_elts, int *p_complete, + vec<tree> *p_incomplete_elt) { *p_nz_elts = 0; *p_unique_nz_elts = 0; *p_init_elts = 0; *p_complete = 1; + auto_vec<tree, 32> purposes; + HOST_WIDE_INT incomplete_depth = -1; return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts, - p_init_elts, p_complete); + p_init_elts, p_complete, + p_incomplete_elt, &purposes, + &incomplete_depth); } /* Return true if constructor CTOR is simple enough to be materialized --- gcc/gimplify.cc.jj 2024-12-07 11:35:49.475439705 +0100 +++ gcc/gimplify.cc 2024-12-11 16:33:42.931975665 +0100 @@ -5372,7 +5372,7 @@ gimplify_init_ctor_preeval (tree *expr_p already been taken care of for us, in gimplify_init_ctor_preeval(). */ static void gimplify_init_ctor_eval (tree, vec<constructor_elt, va_gc> *, - gimple_seq *, bool); + gimple_seq *, bool, tree *, unsigned); static void gimplify_init_ctor_eval_range (tree object, tree lower, tree upper, @@ -5405,7 +5405,7 @@ gimplify_init_ctor_eval_range (tree obje /* NB we might have to call ourself recursively through gimplify_init_ctor_eval if the value is a constructor. */ gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), - pre_p, cleared); + pre_p, cleared, NULL, 0); else { if (gimplify_expr (&value, pre_p, NULL, is_gimple_val, fb_rvalue) @@ -5436,11 +5436,14 @@ gimplify_init_ctor_eval_range (tree obje MODIFY_EXPRs for a CONSTRUCTOR. OBJECT is the LHS against which the assignments should happen. ELTS is the CONSTRUCTOR_ELTS of the CONSTRUCTOR. CLEARED is true if the entire LHS object has been - zeroed first. */ + zeroed first. PARTIAL_CLEAR is a pointer to an array of + PARTIAL_CLEAR_COUNT purposes which identifies subobject which should + be cleared. */ static void gimplify_init_ctor_eval (tree object, vec<constructor_elt, va_gc> *elts, - gimple_seq *pre_p, bool cleared) + gimple_seq *pre_p, bool cleared, + tree *partial_clear, unsigned partial_clear_count) { tree array_elt_type = NULL; unsigned HOST_WIDE_INT ix; @@ -5464,6 +5467,15 @@ gimplify_init_ctor_eval (tree object, ve so we don't have to figure out what's missing ourselves. */ gcc_assert (purpose); + bool this_partial_clear = false; + if (partial_clear + && (purpose == *partial_clear + || (TREE_CODE (purpose) == RANGE_EXPR + && simple_cst_equal (TREE_OPERAND (purpose, 0), + TREE_OPERAND (purpose, 1)) + && TREE_OPERAND (purpose, 1) == *partial_clear))) + this_partial_clear = true; + /* Skip zero-sized fields, unless value has side-effects. This can happen with calls to functions returning a empty type, which we shouldn't discard. As a number of downstream passes don't @@ -5471,6 +5483,7 @@ gimplify_init_ctor_eval (tree object, ve the MODIFY_EXPR we make below to drop the assignment statement. */ if (!TREE_SIDE_EFFECTS (value) && TREE_CODE (purpose) == FIELD_DECL + && !this_partial_clear && is_empty_type (TREE_TYPE (purpose))) continue; @@ -5510,10 +5523,27 @@ gimplify_init_ctor_eval (tree object, ve unshare_expr (object), purpose, NULL_TREE); } + if (this_partial_clear && partial_clear_count == 1) + { + tree new_ctor = build_constructor (TREE_TYPE (cref), NULL); + if (is_empty_type (TREE_TYPE (new_ctor))) + CONSTRUCTOR_ZERO_PADDING_BITS (new_ctor) = 1; + tree init = build2 (INIT_EXPR, TREE_TYPE (cref), unshare_expr (cref), + new_ctor); + gimplify_and_add (init, pre_p); + ggc_free (init); + } if (TREE_CODE (value) == CONSTRUCTOR && TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE) - gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), - pre_p, cleared); + gimplify_init_ctor_eval (cref, CONSTRUCTOR_ELTS (value), pre_p, + cleared || (this_partial_clear + && partial_clear_count == 1), + (this_partial_clear + && partial_clear_count > 1) + ? partial_clear + 1 : NULL, + (this_partial_clear + && partial_clear_count > 1) + ? partial_clear_count - 1 : 0); else if (TREE_CODE (value) == RAW_DATA_CST) { if (RAW_DATA_LENGTH (value) <= 32) @@ -5748,6 +5778,7 @@ gimplify_init_constructor (tree *expr_p, HOST_WIDE_INT num_unique_nonzero_elements; int complete_p; bool valid_const_initializer; + auto_vec<tree, 32> incomplete_elt; /* Aggregate types must lower constructors to initialization of individual elements. The exception is that a CONSTRUCTOR node @@ -5772,7 +5803,8 @@ gimplify_init_constructor (tree *expr_p, valid_const_initializer = categorize_ctor_elements (ctor, &num_nonzero_elements, &num_unique_nonzero_elements, - &num_ctor_elements, &complete_p); + &num_ctor_elements, &complete_p, + &incomplete_elt); /* If a const aggregate variable is being initialized, then it should never be a lose to promote the variable to be static. */ @@ -5819,6 +5851,9 @@ gimplify_init_constructor (tree *expr_p, if (VAR_P (object) && !notify_temp_creation) TREE_READONLY (object) = 0; + if (CONSTRUCTOR_NO_CLEARING (ctor)) + incomplete_elt.truncate (0); + /* If there are "lots" of initialized elements, even discounting those that are not address constants (and thus *must* be computed at runtime), then partition the constructor into @@ -5827,11 +5862,18 @@ gimplify_init_constructor (tree *expr_p, /* TODO. There's code in cp/typeck.cc to do this. */ if (int_size_in_bytes (TREE_TYPE (ctor)) < 0) - /* store_constructor will ignore the clearing of variable-sized - objects. Initializers for such objects must explicitly set - every field that needs to be set. */ - cleared = false; - else if (!complete_p) + { + /* store_constructor will ignore the clearing of variable-sized + objects. Initializers for such objects must explicitly set + every field that needs to be set. */ + cleared = false; + incomplete_elt.truncate (0); + } + else if (!complete_p + /* If complete_p is needed just for a subset of subobjects, + don't clear everything unless we decide to clear + it anyway for other reasons. */ + && incomplete_elt.is_empty ()) /* If the constructor isn't complete, clear the whole object beforehand, unless CONSTRUCTOR_NO_CLEARING is set on it. @@ -5984,7 +6026,17 @@ gimplify_init_constructor (tree *expr_p, if (!cleared || num_nonzero_elements > 0 || ctor_has_side_effects_p) - gimplify_init_ctor_eval (object, elts, pre_p, cleared); + { + tree *partial_clear = NULL; + unsigned partial_clear_count = 0; + if (!cleared && !complete_p && !incomplete_elt.is_empty ()) + { + partial_clear = incomplete_elt.address (); + partial_clear_count = incomplete_elt.length (); + } + gimplify_init_ctor_eval (object, elts, pre_p, cleared, + partial_clear, partial_clear_count); + } *expr_p = NULL_TREE; } --- gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C.jj 2020-01-12 11:54:37.275400404 +0100 +++ gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C 2024-12-12 10:16:00.332435450 +0100 @@ -97,4 +97,4 @@ int main() } -/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> \\\[\\(struct FixBuf \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ +/* { dg-final { scan-tree-dump-times "MEM <char\\\[\[0-9\]+]> \\\[\\((?:struct FixBuf|char\\\[200\\\]) \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ Jakub