On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
<[email protected]> wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <[email protected]> wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure... If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep movsb interacts with store-to-load forwarding? Is
>>> that maybe optimized well on some archs? movsb should always
>>> forward, and wasn't the setup cost for small N reasonable on modern
>>> CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on Core, and for
>> blocks in the range 24-128 on Zen. This is w/o store/load forwarding, but
>> I doubt those provide a cheap way around it.
>>
>>>
>>> It probably depends on the width of the entries in the store buffer,
>>> whether they appear in-order, and the alignment of the stores (if they
>>> are larger than 8 bytes they are surely aligned). IIRC CPUs had store
>>> buffer entries smaller than the cache line size.
>>>
>>> Given that load bandwidth is usually higher than store bandwidth it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8 byte entry case use sth like
>>>
>>> movq 0(%eax), %xmm0
>>> movhps 8(%eax), %xmm0 // or vpinsert
>>> mov[au]ps %xmm0, 0(%ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
>>
>> This may be somewhat faster indeed. I am not sure if store-to-load
>> forwarding will work for the latter half when read again by halves.
>> It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4-byte or smaller elements or
> mixed-size elements. We can do zero-extending loads
> (do we have them for QI and HI mode loads as well?) and
> do shifts and ORs. I'm quite sure the CPUs wouldn't like to
> see vpinserts with different vector mode destinations. So it
> would be 8-byte stores from GPRs and values built up via
> shift & or.
Like we generate
foo:
.LFB0:
.cfi_startproc
movl 4(%rdi), %eax
movzwl 2(%rdi), %edx
salq $16, %rax
orq %rdx, %rax
movzbl 1(%rdi), %edx
salq $8, %rax
orq %rdx, %rax
movzbl (%rdi), %edx
salq $8, %rax
orq %rdx, %rax
movq %rax, (%rsi)
ret
for
struct x { char e; char f; short c; int i; } a;
void foo (struct x *p, long *q)
{
*q = (((((((unsigned long)(unsigned int)p->i) << 16)
| (((unsigned long)(unsigned short)p->c))) << 8)
| (((unsigned long)(unsigned char)p->f))) << 8)
| ((unsigned long)(unsigned char)p->e);
}
if you disable the bswap pass. Doing 4 byte stores in this
case would save some prefixes at least. I expected the
ORs and shifts to have smaller encodings...
With 4 byte stores we end up with the same size as with
individual loads & stores.
> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores. Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).
Also, if we do the stores in smaller chunks we are more
likely to hit the same store-to-load-forwarding issue
elsewhere, e.g. in case the destination is memcpy'ed
away.
So the proposed change isn't necessarily a win; it may
introduce a regression similar to the one it tries to fix.
Whole-program analysis of accesses might allow
marking affected objects.
Richard.
> Richard.
>
>> Honza
>>>
>>> As said, a general concern was that you are not copying padding. If you
>>> put this into an even more common place you surely will break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >> Richard.
>>> >>
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> > 2017-10-12 Martin Jambor <[email protected]>
>>> >> >
>>> >> > PR target/80689
>>> >> > * tree-sra.h: New file.
>>> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
>>> >> > tree-sra.h.
>>> >> > * expr.c: Include params.h and tree-sra.h.
>>> >> > (emit_move_elementwise): New function.
>>> >> > (store_expr_with_bounds): Optionally use it.
>>> >> > * ipa-cp.c: Include tree-sra.h.
>>> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>>> >> > * config/i386/i386.c (ix86_option_override_internal): Set
>>> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>>> >> > * tree-sra.c: Include tree-sra.h.
>>> >> > (scalarizable_type_p): Renamed to
>>> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
>>> >> > second parameter to allow_char_arrays.
>>> >> > (extract_min_max_idx_from_array): New function.
>>> >> > (completely_scalarize): Moved bits of the function to
>>> >> > extract_min_max_idx_from_array.
>>> >> >
>>> >> > testsuite/
>>> >> > * gcc.target/i386/pr80689-1.c: New test.
>>> >> > ---
>>> >> > gcc/config/i386/i386.c | 4 ++
>>> >> > gcc/expr.c | 103 ++++++++++++++++++++++++++++--
>>> >> > gcc/ipa-cp.c | 1 +
>>> >> > gcc/ipa-prop.h | 4 --
>>> >> > gcc/params.def | 6 ++
>>> >> > gcc/testsuite/gcc.target/i386/pr80689-1.c | 38 +++++++++++
>>> >> > gcc/tree-sra.c | 86 +++++++++++++++----------
>>> >> > gcc/tree-sra.h | 33 ++++++++++
>>> >> > 8 files changed, 233 insertions(+), 42 deletions(-)
>>> >> > create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > create mode 100644 gcc/tree-sra.h
>>> >> >
>>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> >> > index 1ee8351c21f..87f602e7ead 100644
>>> >> > --- a/gcc/config/i386/i386.c
>>> >> > +++ b/gcc/config/i386/i386.c
>>> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>>> >> > ix86_tune_cost->l2_cache_size,
>>> >> > opts->x_param_values,
>>> >> > opts_set->x_param_values);
>>> >> > + maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > + 35,
>>> >> > + opts->x_param_values,
>>> >> > + opts_set->x_param_values);
>>> >> >
>>> >> > /* Enable sw prefetching at -O3 for CPUS that prefetching is
>>> >> > helpful. */
>>> >> > if (opts->x_flag_prefetch_loop_arrays < 0
>>> >> > diff --git a/gcc/expr.c b/gcc/expr.c
>>> >> > index 134ee731c29..dff24e7f166 100644
>>> >> > --- a/gcc/expr.c
>>> >> > +++ b/gcc/expr.c
>>> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3. If not see
>>> >> > #include "tree-chkp.h"
>>> >> > #include "rtl-chkp.h"
>>> >> > #include "ccmp.h"
>>> >> > -
>>> >> > +#include "params.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> > /* If this is nonzero, we do not bother generating VOLATILE
>>> >> > around volatile memory references, and we are willing to
>>> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
>>> >> > return maybe_expand_insn (code, 2, ops);
>>> >> > }
>>> >> >
>>> >> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
>>> >> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
>>> >> > +   array within TYPE.  TYPE must either be a register type or an aggregate
>>> >> > +   complying with scalarizable_type_p.
>>> >> > +
>>> >> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
>>> >> > +   stack, and block moves may need to be treated specially.  */
>>> >> > +
>>> >> > +static void
>>> >> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
>>> >> > +                       int call_param_p)
>>> >> > +{
>>> >> > + switch (TREE_CODE (type))
>>> >> > + {
>>> >> > + case RECORD_TYPE:
>>> >> > + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>>> >> > + if (TREE_CODE (fld) == FIELD_DECL)
>>> >> > + {
>>> >> > + HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
>>> >> > + tree ft = TREE_TYPE (fld);
>>> >> > + emit_move_elementwise (ft, target, source, fld_offset,
>>> >> > + call_param_p);
>>> >> > + }
>>> >> > + break;
>>> >> > +
>>> >> > + case ARRAY_TYPE:
>>> >> > + {
>>> >> > + tree elem_type = TREE_TYPE (type);
>>> >> > + HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
>>> >> > + gcc_assert (el_size > 0);
>>> >> > +
>>> >> > + offset_int idx, max;
>>> >> > +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>>> >> > + if (extract_min_max_idx_from_array (type, &idx, &max))
>>> >> > + {
>>> >> > + HOST_WIDE_INT el_offset = offset;
>>> >> > + for (; idx <= max; ++idx)
>>> >> > + {
>>> >> > +              emit_move_elementwise (elem_type, target, source, el_offset,
>>> >> > +                                     call_param_p);
>>> >> > + el_offset += el_size;
>>> >> > + }
>>> >> > + }
>>> >> > + }
>>> >> > + break;
>>> >> > + default:
>>> >> > + machine_mode mode = TYPE_MODE (type);
>>> >> > +
>>> >> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
>>> >> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
>>> >> > +
>>> >> > +      /* TODO: Figure out whether the following is actually necessary.  */
>>> >> > + if (target == ntgt)
>>> >> > + ntgt = copy_rtx (target);
>>> >> > + if (source == nsrc)
>>> >> > + nsrc = copy_rtx (source);
>>> >> > +
>>> >> > + gcc_assert (mode != VOIDmode);
>>> >> > + if (mode != BLKmode)
>>> >> > + emit_move_insn (ntgt, nsrc);
>>> >> > + else
>>> >> > + {
>>> >> > + /* For example vector gimple registers can end up here. */
>>> >> > + rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
>>> >> > + TYPE_MODE (sizetype), EXPAND_NORMAL);
>>> >> > + emit_block_move (ntgt, nsrc, size,
>>> >> > + (call_param_p
>>> >> > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > + }
>>> >> > + break;
>>> >> > + }
>>> >> > + return;
>>> >> > +}
>>> >> > +
>>> >> > /* Generate code for computing expression EXP,
>>> >> > and storing the value into TARGET.
>>> >> >
>>> >> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
>>> >> > emit_group_store (target, temp, TREE_TYPE (exp),
>>> >> > int_size_in_bytes (TREE_TYPE (exp)));
>>> >> > else if (GET_MODE (temp) == BLKmode)
>>> >> > - emit_block_move (target, temp, expr_size (exp),
>>> >> > - (call_param_p
>>> >> > - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > + {
>>> >> > +          /* Copying smallish BLKmode structures with emit_block_move and thus
>>> >> > +             by-pieces can result in store-to-load stalls.  So copy some simple
>>> >> > +             small aggregates element or field-wise.  */
>>> >> > +          if (GET_MODE (target) == BLKmode
>>> >> > +              && AGGREGATE_TYPE_P (TREE_TYPE (exp))
>>> >> > +              && !TREE_ADDRESSABLE (TREE_TYPE (exp))
>>> >> > +              && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +              && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +                  <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
>>> >> > +                      * BITS_PER_UNIT))
>>> >> > +              && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
>>> >> > + {
>>> >> > + /* FIXME: Can this happen? What would it mean? */
>>> >> > + gcc_assert (!reverse);
>>> >> > + emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
>>> >> > + call_param_p);
>>> >> > + }
>>> >> > + else
>>> >> > + emit_block_move (target, temp, expr_size (exp),
>>> >> > + (call_param_p
>>> >> > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > + }
>>> >> >    /* If we emit a nontemporal store, there is nothing else to do.  */
>>> >> > else if (nontemporal && emit_storent_insn (target, temp))
>>> >> > ;
>>> >> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> >> > index 6b3d8d7364c..7d6019bbd30 100644
>>> >> > --- a/gcc/ipa-cp.c
>>> >> > +++ b/gcc/ipa-cp.c
>>> >> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. If not see
>>> >> > #include "tree-ssa-ccp.h"
>>> >> > #include "stringpool.h"
>>> >> > #include "attribs.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> > template <typename valtype> class ipcp_value;
>>> >> >
>>> >> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>>> >> > index fa5bed49ee0..2313cc884ed 100644
>>> >> > --- a/gcc/ipa-prop.h
>>> >> > +++ b/gcc/ipa-prop.h
>>> >> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
>>> >> > void ipa_release_body_info (struct ipa_func_body_info *);
>>> >> > tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>>> >> >
>>> >> > -/* From tree-sra.c: */
>>> >> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
>>> >> > -                           gimple_stmt_iterator *, bool);
>>> >> > -
>>> >> > /* In ipa-cp.c */
>>> >> > void ipa_cp_c_finalize (void);
>>> >> >
>>> >> > diff --git a/gcc/params.def b/gcc/params.def
>>> >> > index e55afc28053..5e19f1414a0 100644
>>> >> > --- a/gcc/params.def
>>> >> > +++ b/gcc/params.def
>>> >> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>>> >> >          "Enable loop epilogue vectorization using smaller vector size.",
>>> >> > 0, 0, 1)
>>> >> >
>>> >> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > +          "max-size-for-elementwise-copy",
>>> >> > +          "Maximum size in bytes of a structure or array to be considered for "
>>> >> > +          "copying by its individual fields or elements",
>>> >> > +          0, 0, 512)
>>> >> > +
>>> >> > /*
>>> >> >
>>> >> > Local variables:
>>> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > new file mode 100644
>>> >> > index 00000000000..4156d4fba45
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > @@ -0,0 +1,38 @@
>>> >> > +/* { dg-do compile } */
>>> >> > +/* { dg-options "-O2" } */
>>> >> > +
>>> >> > +typedef struct st1
>>> >> > +{
>>> >> > + long unsigned int a,b;
>>> >> > + long int c,d;
>>> >> > +}R;
>>> >> > +
>>> >> > +typedef struct st2
>>> >> > +{
>>> >> > + int t;
>>> >> > + R reg;
>>> >> > +}N;
>>> >> > +
>>> >> > +void Set (const R *region, N *n_info );
>>> >> > +
>>> >> > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
>>> >> > +{
>>> >> > + R reg;
>>> >> > +
>>> >> > + reg.a=a;
>>> >> > + reg.b=b;
>>> >> > + reg.c=c;
>>> >> > + reg.d=d;
>>> >> > +  Set (&reg, n_obj);
>>> >> > +
>>> >> > +}
>>> >> > +
>>> >> > +void Set (const R *reg, N *n_obj )
>>> >> > +{
>>> >> > + n_obj->reg=(*reg);
>>> >> > +}
>>> >> > +
>>> >> > +
>>> >> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movdqu" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movups" } } */
>>> >> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>> >> > index bac593951e7..ade97964205 100644
>>> >> > --- a/gcc/tree-sra.c
>>> >> > +++ b/gcc/tree-sra.c
>>> >> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see
>>> >> > #include "ipa-fnsummary.h"
>>> >> > #include "ipa-utils.h"
>>> >> > #include "builtins.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> > /* Enumeration of all aggregate reductions we can do. */
>>> >> > enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */
>>> >> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
>>> >> > }
>>> >> >
>>> >> >
>>> >> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
>>> >> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
>>> >> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
>>> >> > -   we are considering a decl from constant pool.  If it is false, char arrays
>>> >> > -   will be refused.  */
>>> >> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
>>> >> > +   fields/elements that are not bit-fields and are either register types or
>>> >> > +   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore, if
>>> >> > +   ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE
>>> >> > +   contains an array of elements that only have one byte.  */
>>> >> >
>>> >> > -static bool
>>> >> > -scalarizable_type_p (tree type, bool const_decl)
>>> >> > +bool
>>> >> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
>>> >> > {
>>> >> > gcc_assert (!is_gimple_reg_type (type));
>>> >> > if (type_contains_placeholder_p (type))
>>> >> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> > return false;
>>> >> >
>>> >> > if (!is_gimple_reg_type (ft)
>>> >> > - && !scalarizable_type_p (ft, const_decl))
>>> >> > +          && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
>>> >> > return false;
>>> >> > }
>>> >> >
>>> >> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> > case ARRAY_TYPE:
>>> >> > {
>>> >> > HOST_WIDE_INT min_elem_size;
>>> >> > - if (const_decl)
>>> >> > + if (allow_char_arrays)
>>> >> > min_elem_size = 0;
>>> >> > else
>>> >> > min_elem_size = BITS_PER_UNIT;
>>> >> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >
>>> >> > tree elem = TREE_TYPE (type);
>>> >> > if (!is_gimple_reg_type (elem)
>>> >> > - && !scalarizable_type_p (elem, const_decl))
>>> >> > +      && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
>>> >> > return false;
>>> >> > return true;
>>> >> > }
>>> >> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> > }
>>> >> > }
>>> >> >
>>> >> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
>>> >> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
>>> >> > +                            tree);
>>> >> > +
>>> >> > +/* For a given array TYPE, return false if its domain does not have any maximum
>>> >> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
>>> >> > +   element.  */
>>> >> > +
>>> >> > +bool
>>> >> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
>>> >> > +{
>>> >> > + tree domain = TYPE_DOMAIN (type);
>>> >> > + tree minidx = TYPE_MIN_VALUE (domain);
>>> >> > + gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > + tree maxidx = TYPE_MAX_VALUE (domain);
>>> >> > + if (!maxidx)
>>> >> > + return false;
>>> >> > + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> > +
>>> >> > + /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>>> >> > + DOMAIN (e.g. signed int, whereas min/max may be size_int). */
>>> >> > + *min = wi::to_offset (minidx);
>>> >> > + *max = wi::to_offset (maxidx);
>>> >> > + if (!TYPE_UNSIGNED (domain))
>>> >> > + {
>>> >> > + *min = wi::sext (*min, TYPE_PRECISION (domain));
>>> >> > + *max = wi::sext (*max, TYPE_PRECISION (domain));
>>> >> > + }
>>> >> > + return true;
>>> >> > +}
>>> >> >
>>> >> >  /* Create total_scalarization accesses for all scalar fields of a member
>>> >> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>>> >> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
>>> >> >     must be the top-most VAR_DECL representing the variable; within that,
>>> >> >     OFFSET locates the member and REF must be the memory reference expression for
>>> >> >     the member.  */
>>> >> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> > {
>>> >> > tree elemtype = TREE_TYPE (decl_type);
>>> >> > tree elem_size = TYPE_SIZE (elemtype);
>>> >> > - gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>>> >> > HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>>> >> > gcc_assert (el_size > 0);
>>> >> >
>>> >> > - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > - gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > + offset_int idx, max;
>>> >> >        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>>> >> > - if (maxidx)
>>> >> > + if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>>> >> > {
>>> >> > - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> > tree domain = TYPE_DOMAIN (decl_type);
>>> >> > -          /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>>> >> > -             DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>>> >> > -          offset_int idx = wi::to_offset (minidx);
>>> >> > -          offset_int max = wi::to_offset (maxidx);
>>> >> > -          if (!TYPE_UNSIGNED (domain))
>>> >> > -            {
>>> >> > -              idx = wi::sext (idx, TYPE_PRECISION (domain));
>>> >> > -              max = wi::sext (max, TYPE_PRECISION (domain));
>>> >> > -            }
>>> >> > for (int el_off = offset; idx <= max; ++idx)
>>> >> > {
>>> >> > tree nref = build4 (ARRAY_REF, elemtype,
>>> >> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> > }
>>> >> >
>>> >> >  /* Create total_scalarization accesses for a member of type TYPE, which must
>>> >> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>>> >> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
>>> >> > -   the member, REVERSE gives its torage order. and REF must be the reference
>>> >> > -   expression for it.  */
>>> >> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
>>> >> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
>>> >> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF must
>>> >> > +   be the reference expression for it.  */
>>> >> >
>>> >> > static void
>>> >> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>>> >> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>>> >> > }
>>> >> >
>>> >> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
>>> >> > - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */
>>> >> > + RECORD_TYPE or ARRAY_TYPE conforming to
>>> >> > + simple_mix_of_records_and_arrays_p. */
>>> >> >
>>> >> > static void
>>> >> > create_total_scalarization_access (tree var)
>>> >> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
>>> >> > {
>>> >> > tree var = candidate (i);
>>> >> >
>>> >> > -      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>>> >> > -                                              constant_decl_p (var)))
>>> >> > +      if (VAR_P (var)
>>> >> > +          && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
>>> >> > +                                                 constant_decl_p (var)))
>>> >> > {
>>> >> > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>>> >> > <= max_scalarization_size)
>>> >> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
>>> >> > new file mode 100644
>>> >> > index 00000000000..dc901385994
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/tree-sra.h
>>> >> > @@ -0,0 +1,33 @@
>>> >> > +/* tree-sra.h - Scalar Replacement of Aggregates declarations.
>>> >> > + Copyright (C) 2017 Free Software Foundation, Inc.
>>> >> > +
>>> >> > +This file is part of GCC.
>>> >> > +
>>> >> > +GCC is free software; you can redistribute it and/or modify it under
>>> >> > +the terms of the GNU General Public License as published by the Free
>>> >> > +Software Foundation; either version 3, or (at your option) any later
>>> >> > +version.
>>> >> > +
>>> >> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> >> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> >> > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>> >> > +for more details.
>>> >> > +
>>> >> > +You should have received a copy of the GNU General Public License
>>> >> > +along with GCC; see the file COPYING3. If not see
>>> >> > +<http://www.gnu.org/licenses/>. */
>>> >> > +
>>> >> > +#ifndef TREE_SRA_H
>>> >> > +#define TREE_SRA_H
>>> >> > +
>>> >> > +
>>> >> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
>>> >> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
>>> >> > +                                     offset_int *max);
>>> >> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>>> >> > +                           bool reverse, tree exp_type,
>>> >> > +                           gimple_stmt_iterator *gsi, bool insert_after);
>>> >> > +
>>> >> > +
>>> >> > +
>>> >> > +#endif /* TREE_SRA_H */
>>> >> > --
>>> >> > 2.14.1
>>> >> >