> Am 24.08.2024 um 06:59 schrieb Alexandre Oliva <ol...@adacore.com>:
>
> Hello, Richi,
>
> Thanks for the review and the feedback.
>
> On Aug 22, 2024, Richard Biener <richard.guent...@gmail.com> wrote:
>
>>> + /* If the object is small enough to go in registers, and it's
>>> + not required to be constructed in memory, clear it first.
>>> + That will avoid wasting cycles preserving any padding bits
>>> + that might be there, and if there aren't any, the compiler
>>> + is smart enough to optimize the clearing out. */
>>> + else if (complete_p <= 0
>
>> I wonder if for complete_p == 1 zeroing first also makes a difference?
>
> It does, at the very least in that it raises dead assignment warnings.
>
> In cases I analyzed, the stores eventually got combined and it didn't
> make a difference. But it's conceivable that zero-initialization might
> aid some cases on some targets. I didn't pursue that line of
> investigation much: the warnings, and the difficulties I envisioned to
> silence them, led me away.
>
>> I also wonder whether the extra zeroing confuses SRA - try say struct
>> s { short a; int b; } and see if SRA is still happily considering it.
>
> Yeah, SRA seems to be happy to drop the struct object completely, in
> which case the zero-initialization of the padding bits gets discarded,
> and also to SRA relevant fields while keeping the whole object if it's
> needed as a whole, so that the zero-initialization (often narrowed to
> the padding bits only) is retained and aids in merging the stores more
> efficiently.
>
>>> + && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)
>
>> 2nd && to the next line
>
> 'k
>
>>> + && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK
>>> (type))
>>> + && (opt_for_fn (cfun->decl, optimize)
>>> + || opt_for_fn (cfun->decl, optimize_size)))
>
>> that's simplified as && optimize
>
> It felt like a regression to add non-*fun references to flags in a
> function that explicitly mentioned *fun elsewhere, but sure 'optimize'
> looks nicer ;-)
>
>
> Regstrapping on x86_64-linux-gnu. Ok to install?
Ok if Jakub doesn’t have any further comments next week.
Thanks,
Richard
>
> Optimize initialization of small padded objects
>
> When small objects containing padding bits (or bytes) are fully
> initialized, we will often store them in registers, and setting
> bitfields and other small fields will attempt to preserve the
> uninitialized padding bits, which tends to be expensive.
> Zero-initializing registers, OTOH, tends to be cheap.
>
> So, if we're optimizing, zero-initialize such small padded objects
> even if that's not needed for correctness. We can't zero-initialize
> all such padding objects, though: if there's no padding whatsoever,
> and all fields are initialized with nonzero, the zero initialization
> would be flagged as dead. That's why we introduce machinery to detect
> whether objects have padding bits. I considered distinguishing
> between bitfields, units and larger padding elements, but I didn't
> pursue that distinction.
>
> Since the object's zero-initialization subsumes fields'
> zero-initialization, the empty string test in builtin-snprintf-6.c's
> test_assign_aggregate would regress without the addition of
> native_encode_constructor.
>
>
> for gcc/ChangeLog
>
> * expr.cc (categorize_ctor_elements_1): Change p_complete to
> int, to distinguish complete initialization in presence or
> absence of uninitialized padding bits.
> (categorize_ctor_elements): Likewise. Adjust all callers...
> * expr.h (categorize_ctor_elements): ... and declaration.
> (type_has_padding_at_level_p): New.
> * gimple-fold.cc (type_has_padding_at_level_p): New.
> * fold-const.cc (native_encode_constructor): New.
> (native_encode_expr): Call it.
> * gimplify.cc (gimplify_init_constructor): Clear small
> non-addressable non-volatile objects with padding or
> other uninitialized fields as an optimization.
>
> for gcc/testsuite/ChangeLog
>
> * gcc.dg/init-pad-1.c: New.
> ---
> gcc/expr.cc | 20 ++++++++++-----
> gcc/expr.h | 3 +-
> gcc/fold-const.cc | 36 +++++++++++++++++++++++++++
> gcc/gimple-fold.cc | 50 +++++++++++++++++++++++++++++++++++++
> gcc/gimplify.cc | 14 ++++++++++
> gcc/testsuite/gcc.dg/init-pad-1.c | 18 +++++++++++++
> 6 files changed, 132 insertions(+), 9 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 8d17a5a39b4bd..320be8b17a13e 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p)
> static bool
> categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
> HOST_WIDE_INT *p_unique_nz_elts,
> - HOST_WIDE_INT *p_init_elts, bool *p_complete)
> + HOST_WIDE_INT *p_init_elts, int *p_complete)
> {
> unsigned HOST_WIDE_INT idx;
> HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
> @@ -7218,7 +7218,10 @@ categorize_ctor_elements_1 (const_tree ctor,
> HOST_WIDE_INT *p_nz_elts,
>
> if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
> num_fields, elt_type))
> - *p_complete = false;
> + *p_complete = 0;
> + else if (*p_complete > 0
> + && type_has_padding_at_level_p (TREE_TYPE (ctor)))
> + *p_complete = -1;
>
> *p_nz_elts += nz_elts;
> *p_unique_nz_elts += unique_nz_elts;
> @@ -7239,7 +7242,10 @@ categorize_ctor_elements_1 (const_tree ctor,
> HOST_WIDE_INT *p_nz_elts,
> and place it in *P_ELT_COUNT.
> * whether the constructor is complete -- in the sense that every
> meaningful byte is explicitly given a value --
> - and place it in *P_COMPLETE.
> + and place it in *P_COMPLETE:
> + - 0 if any field is missing
> + - 1 if all fields are initialized, and there's no padding
> + - -1 if all fields are initialized, but there's padding
>
> Return whether or not CTOR is a valid static constant initializer, the same
> as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0". */
> @@ -7247,12 +7253,12 @@ categorize_ctor_elements_1 (const_tree ctor,
> HOST_WIDE_INT *p_nz_elts,
> bool
> categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
> HOST_WIDE_INT *p_unique_nz_elts,
> - HOST_WIDE_INT *p_init_elts, bool *p_complete)
> + HOST_WIDE_INT *p_init_elts, int *p_complete)
> {
> *p_nz_elts = 0;
> *p_unique_nz_elts = 0;
> *p_init_elts = 0;
> - *p_complete = true;
> + *p_complete = 1;
>
> return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts,
> p_init_elts, p_complete);
> @@ -7313,7 +7319,7 @@ mostly_zeros_p (const_tree exp)
> if (TREE_CODE (exp) == CONSTRUCTOR)
> {
> HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> - bool complete_p;
> + int complete_p;
>
> categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
> &complete_p);
> @@ -7331,7 +7337,7 @@ all_zeros_p (const_tree exp)
> if (TREE_CODE (exp) == CONSTRUCTOR)
> {
> HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> - bool complete_p;
> + int complete_p;
>
> categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
> &complete_p);
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 533ae0af3871a..04782b15f192c 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -361,7 +361,8 @@ extern unsigned HOST_WIDE_INT highest_pow2_factor
> (const_tree);
>
> extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
> HOST_WIDE_INT *, HOST_WIDE_INT *,
> - bool *);
> + int *);
> +extern bool type_has_padding_at_level_p (tree);
> extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1);
> extern void store_constructor (tree, rtx, int, poly_int64, bool);
> extern HOST_WIDE_INT int_expr_size (const_tree exp);
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index ccc499af66177..81dcc13925a79 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8193,6 +8193,39 @@ native_encode_string (const_tree expr, unsigned char
> *ptr, int len, int off)
> return len;
> }
>
> +/* Subroutine of native_encode_expr. Encode the CONSTRUCTOR
> + specified by EXPR into the buffer PTR of length LEN bytes.
> + Return the number of bytes placed in the buffer, or zero
> + upon failure. */
> +
> +static int
> +native_encode_constructor (const_tree expr, unsigned char *ptr, int len, int
> off)
> +{
> + /* We are only concerned with zero-initialization constructors here.
> That's
> + all we expect to see in GIMPLE, so that's all native_encode_expr should
> + deal with. For more general handling of constructors, there is
> + native_encode_initializer. */
> + if (CONSTRUCTOR_NELTS (expr))
> + return 0;
> +
> + /* Wide-char strings are encoded in target byte-order so native
> + encoding them is trivial. */
> + if (BITS_PER_UNIT != CHAR_BIT
> + || !tree_fits_shwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))))
> + return 0;
> +
> + HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE
> (expr)));
> + if ((off == -1 && total_bytes > len) || off >= total_bytes)
> + return 0;
> + if (off == -1)
> + off = 0;
> + len = MIN (total_bytes - off, len);
> + if (ptr == NULL)
> + /* Dry run. */;
> + else
> + memset (ptr, 0, len);
> + return len;
> +}
>
> /* Subroutine of fold_view_convert_expr. Encode the INTEGER_CST, REAL_CST,
> FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into
> @@ -8229,6 +8262,9 @@ native_encode_expr (const_tree expr, unsigned char
> *ptr, int len, int off)
> case STRING_CST:
> return native_encode_string (expr, ptr, len, off);
>
> + case CONSTRUCTOR:
> + return native_encode_constructor (expr, ptr, len, off);
> +
> default:
> return 0;
> }
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 0bec35d06f66b..2746fcfe314f8 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -4661,6 +4661,56 @@ clear_padding_type_may_have_padding_p (tree type)
> }
> }
>
> +/* Return true if TYPE has padding bits aside from those in fields,
> + elements, etc. */
> +
> +bool
> +type_has_padding_at_level_p (tree type)
> +{
> + switch (TREE_CODE (type))
> + {
> + case RECORD_TYPE:
> + {
> + tree bitpos = size_zero_node;
> + /* Expect fields to be sorted by bit position. */
> + for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> + if (TREE_CODE (f) == FIELD_DECL)
> + {
> + if (DECL_PADDING_P (f))
> + return true;
> + tree pos = bit_position (f);
> + if (simple_cst_equal (bitpos, pos) != 1)
> + return true;
> + if (!DECL_SIZE (f))
> + return true;
> + bitpos = int_const_binop (PLUS_EXPR, pos, DECL_SIZE (f));
> + }
> + if (simple_cst_equal (bitpos, TYPE_SIZE (type)) != 1)
> + return true;
> + return false;
> + }
> + case UNION_TYPE:
> + /* If any of the fields is smaller than the whole, there is padding.
> */
> + for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> + if (TREE_CODE (f) == FIELD_DECL)
> + if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> + TREE_TYPE (type)) != 1)
> + return true;
> + return false;
> + case ARRAY_TYPE:
> + case COMPLEX_TYPE:
> + case VECTOR_TYPE:
> + /* No recursing here, no padding at this level. */
> + return false;
> + case REAL_TYPE:
> + return clear_padding_real_needs_padding_p (type);
> + case BITINT_TYPE:
> + return clear_padding_bitint_needs_padding_p (type);
> + default:
> + return false;
> + }
> +}
> +
> /* Emit a runtime loop:
> for (; buf.base != end; buf.base += sz)
> __builtin_clear_padding (buf.base); */
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 26a216e151d6f..081d69bce0565 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -5564,7 +5564,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
> struct gimplify_init_ctor_preeval_data preeval_data;
> HOST_WIDE_INT num_ctor_elements, num_nonzero_elements;
> HOST_WIDE_INT num_unique_nonzero_elements;
> - bool complete_p, valid_const_initializer;
> + int complete_p;
> + bool valid_const_initializer;
>
> /* Aggregate types must lower constructors to initialization of
> individual elements. The exception is that a CONSTRUCTOR node
> @@ -5668,6 +5669,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
> /* If a single access to the target must be ensured and all elements
> are zero, then it's optimal to clear whatever their number. */
> cleared = true;
> + /* If the object is small enough to go in registers, and it's
> + not required to be constructed in memory, clear it first.
> + That will avoid wasting cycles preserving any padding bits
> + that might be there, and if there aren't any, the compiler
> + is smart enough to optimize the clearing out. */
> + else if (complete_p <= 0
> + && !TREE_ADDRESSABLE (ctor)
> + && !TREE_THIS_VOLATILE (object)
> + && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
> + && optimize)
> + cleared = true;
> else
> cleared = false;
>
> diff --git a/gcc/testsuite/gcc.dg/init-pad-1.c
> b/gcc/testsuite/gcc.dg/init-pad-1.c
> new file mode 100644
> index 0000000000000..801f93813e3ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/init-pad-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Og -fdump-tree-gimple" } */
> +
> +struct s {
> + short a : 3;
> + short b : 3;
> + char c;
> +};
> +
> +extern void g(struct s *);
> +
> +void f() {
> + struct s x = { 0, 0, 1 };
> + g (&x);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "= {};" 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-not "= 0;" "gimple" } } */
>
>
> --
> Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
> Free Software Activist GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive