Hi,
On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor <[email protected]> wrote:
> > Hi,
> >
> > I'd like to request comments on the patch below, which aims to fix PR
> > 80689, an instance of a store-to-load forwarding stall on x86_64 CPUs
> > in the ImageMagick benchmark that is responsible for a slowdown of up
> > to 9% compared to gcc 6, depending on options and HW used.  (Actually,
> > I have just seen 24% in one specific combination but for various
> > reasons can no longer verify it today.)
> >
> > The revision causing the regression is 237074, which increased the
> > size of the mode for copying aggregates "by pieces" to 128 bits,
> > incurring big stalls when the values being copied are also still being
> > stored in a smaller data type or when the copied values are loaded in
> > smaller types shortly afterwards.  Such situations happen in
> > ImageMagick even across calls, which means that any non-IPA
> > flow-sensitive approach would not detect them.  Therefore, the patch
> > changes the way we copy small BLKmode data that are simple
> > combinations of records and arrays (no unions or bit-fields, and
> > character arrays are also disallowed) and simply copies them one field
> > and/or element at a time.
> >
> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> > (the structure in the benchmark has 32 bytes) but is subject to change
> > after more benchmarking and is actually zero - meaning element copying
> > never happens - on other architectures. I believe that any
> > architecture with a store buffer can benefit but it's probably better
> > to leave it to their maintainers to find a different default value. I
> > am not sure this is how such HW-dependant decisions should be done and
> > is the primary reason, why I am sending this RFC first.
> >
> > I have decided to implement this change at the expansion level because
> > at that point the type information is still readily available and at
> > the same time we can also handle various implicit copies, for example
> > those used when passing a parameter by value.  I found I could re-use
> > some bits and pieces of tree-SRA and so I did, creating the tree-sra.h
> > header file in the process.
> >
> > I am fully aware that in the final patch the new parameter, or indeed
> > any new parameters, will need to be documented.  I have intentionally
> > skipped that for now and will write the documentation if the feedback
> > here is generally positive.
> >
> > I have bootstrapped and tested this patch on x86_64-linux with
> > different values of the parameter, and only found problems with
> > unreasonably high values, which led to OOM.  I have done the same with
> > a previous version of the patch, equivalent to the limit being 64
> > bytes, on aarch64-linux, ppc64le-linux and ia64-linux, and only ran
> > into failures of tests that assumed that structure padding is copied
> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
> > for example gcc.dg/vmx/varargs-4.c).
> >
> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> > 8% at -O2 and -Ofast compilation levels respectively on one particular
> > new AMD CPU and by 6% and 3% on one particular old Intel machine.
> >
> > Thanks in advance for any comments,
>
> I wonder if, at the place you choose to hook this in, you can elide
> any copying of padding between fields.
>
> I'd rather have hooked such "high level" optimization in
> expand_assignment where you can be reasonably sure you're seeing an
> actual source-level construct.
I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which has used the
larger mode for moving since GCC 7) precisely so that we handle not
only source-level assignments but also parameters passed by value and
other such situations.
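To illustrate the kind of implicit copy that hooking into
expand_assignment alone would miss, consider the following minimal
sketch (hypothetical code, not from the benchmark; produce and consume
are made-up names):

typedef struct
{
  unsigned long width, height;
  long x, y;
} rect;

void consume (rect r);   /* pass-by-value */

void
produce (unsigned long w, long y0)
{
  rect r;
  r.width = w;
  r.height = 1;
  r.x = 0;
  r.y = y0;
  consume (r);  /* 32-byte aggregate copy into the argument slot,
                   with no source-level assignment involved */
}

Replacing emit_block_move covers the argument copy here just as it
covers ordinary assignments.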
>
> 35 bytes seems like a lot - what is the code-size impact?
I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix ImageMagick, where the problematic structure is:
typedef struct _RectangleInfo
{
size_t
width,
height;
ssize_t
x,
y;
} RectangleInfo;
...so four longs, 32 bytes in total, no padding.  Since any aggregate
of 33-35 bytes must consist of smaller fields/elements anyway, it
seemed reasonable to also copy those element-wise.
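(For a concrete illustration, a hypothetical

struct S
{
  short s[17];   /* sizeof (struct S) == 34 */
};

falls into the 33-35 byte window; because 33-35 bytes is never a
multiple of an 8-byte alignment, the usual size and alignment rules
already guarantee that such a naturally laid out aggregate is made up
of fields or elements smaller than 8 bytes.)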
Nevertheless, I still intend to experiment with the limit; I sent out
this RFC precisely so that I do not spend a lot of time benchmarking
something that is eventually not deemed acceptable on principle.
>
> IIRC the reason this may be slow isn't loading in smaller types than stored
> before by the copy - the store buffers can handle this reasonably well. It's
> solely when previous smaller stores are
>
> a1) not mergeable in the store buffer
> a2) not merged because earlier stores are already committed
>
> and
>
> b) loaded afterwards as a type that would access multiple store buffers
>
> a) would be sure to happen in case b) involves accessing padding. Is the
> Image Magick case one that involves padding in the structure?
As I said above, there is no padding.
Basically, what happens is that in a number of places there is a
variable "region" of the aforementioned type, which is initialized and
passed to the function SetPixelCacheNexusPixels in the following
manner:
...
region.width=cache_info->columns;
region.height=1;
region.x=0;
region.y=y;
pixels=SetPixelCacheNexusPixels(cache_info,ReadMode,&region,MagickTrue,
  cache_nexus[id],exception);
...
and the first four statements in SetPixelCacheNexusPixels are:
assert(cache_info != (const CacheInfo *) NULL);
assert(cache_info->signature == MagickSignature);
if (cache_info->type == UndefinedCache)
  return((PixelPacket *) NULL);
nexus_info->region=(*region);
with the last one generating the stalls, both on Zen-based machines
and on Intel CPUs that are two or three years old.
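Since GCC 7 that aggregate copy is expanded as two 16-byte moves, and
each 16-byte load reads bytes coming from two distinct pending 8-byte
stores.  With the patch the copy is instead expanded field by field,
roughly equivalent to this hand-written sketch (not actual compiler
output):

nexus_info->region.width  = region->width;   /* each 8-byte load     */
nexus_info->region.height = region->height;  /* exactly matches one  */
nexus_info->region.x      = region->x;       /* pending 8-byte store */
nexus_info->region.y      = region->y;       /* and can be forwarded */

so every load can be satisfied from a single store buffer entry.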
I have had a look at what Agner Fog's micro-architecture document says
about store forwarding stalls and:
- on Broadwell and Haswell, any case where a "write of any size is
  followed by a read of a larger size" incurs a stall, which fits our
  example,
- on Skylake, "A read that is bigger than the write, or a read that
  covers both written and unwritten bytes, takes approximately 11
  clock cycles extra" seems to apply,
- on Intel Silvermont, there will also be a stall because "A memory
  write can be forwarded to a subsequent read of the same size or a
  smaller size...",
- on Zen, Agner Fog says store forwarding works perfectly except when
  crossing a page or when "A read that has a partial overlap with a
  preceding write has a penalty of 6-7 clock cycles," which must be
  why I see the stalls.
So I guess the pending stores are not really merged even without
padding.
Martin
>
> Richard.
>
> > Martin
> >
> >
> > 2017-10-12 Martin Jambor <[email protected]>
> >
> > PR target/80689
> > * tree-sra.h: New file.
> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
> > tree-sra.h.
> > * expr.c: Include params.h and tree-sra.h.
> > (emit_move_elementwise): New function.
> > (store_expr_with_bounds): Optionally use it.
> > * ipa-cp.c: Include tree-sra.h.
> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> > * config/i386/i386.c (ix86_option_override_internal): Set
> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> > * tree-sra.c: Include tree-sra.h.
> > (scalarizable_type_p): Renamed to
> > simple_mix_of_records_and_arrays_p, made public, renamed the
> > second parameter to allow_char_arrays.
> > (extract_min_max_idx_from_array): New function.
> > (completely_scalarize): Moved bits of the function to
> > extract_min_max_idx_from_array.
> >
> > testsuite/
> > * gcc.target/i386/pr80689-1.c: New test.
> > ---
> >  gcc/config/i386/i386.c                    |   4 ++
> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
> >  gcc/ipa-cp.c                              |   1 +
> >  gcc/ipa-prop.h                            |   4 --
> >  gcc/params.def                            |   6 ++
> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
> >  gcc/tree-sra.h                            |  33 ++++++++++
> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >  create mode 100644 gcc/tree-sra.h
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 1ee8351c21f..87f602e7ead 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> > ix86_tune_cost->l2_cache_size,
> > opts->x_param_values,
> > opts_set->x_param_values);
> > + maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > + 35,
> > + opts->x_param_values,
> > + opts_set->x_param_values);
> >
> > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
> > if (opts->x_flag_prefetch_loop_arrays < 0
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 134ee731c29..dff24e7f166 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3. If not see
> > #include "tree-chkp.h"
> > #include "rtl-chkp.h"
> > #include "ccmp.h"
> > -
> > +#include "params.h"
> > +#include "tree-sra.h"
> >
> > /* If this is nonzero, we do not bother generating VOLATILE
> > around volatile memory references, and we are willing to
> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> > return maybe_expand_insn (code, 2, ops);
> > }
> >
> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
> > +   array within TYPE.  TYPE must either be a register type or an aggregate
> > +   complying with simple_mix_of_records_and_arrays_p.
> > +
> > + If CALL_PARAM_P is nonzero, this is a store into a call param on the
> > + stack, and block moves may need to be treated specially. */
> > +
> > +static void
> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
> > +                       int call_param_p)
> > +{
> > + switch (TREE_CODE (type))
> > + {
> > + case RECORD_TYPE:
> > + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> > + if (TREE_CODE (fld) == FIELD_DECL)
> > + {
> > + HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
> > + tree ft = TREE_TYPE (fld);
> > + emit_move_elementwise (ft, target, source, fld_offset,
> > + call_param_p);
> > + }
> > + break;
> > +
> > + case ARRAY_TYPE:
> > + {
> > + tree elem_type = TREE_TYPE (type);
> > + HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
> > + gcc_assert (el_size > 0);
> > +
> > + offset_int idx, max;
> > +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > + if (extract_min_max_idx_from_array (type, &idx, &max))
> > + {
> > + HOST_WIDE_INT el_offset = offset;
> > + for (; idx <= max; ++idx)
> > + {
> > + emit_move_elementwise (elem_type, target, source, el_offset,
> > + call_param_p);
> > + el_offset += el_size;
> > + }
> > + }
> > + }
> > + break;
> > + default:
> > + machine_mode mode = TYPE_MODE (type);
> > +
> > + rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> > + rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> > +
> > + /* TODO: Figure out whether the following is actually necessary. */
> > + if (target == ntgt)
> > + ntgt = copy_rtx (target);
> > + if (source == nsrc)
> > + nsrc = copy_rtx (source);
> > +
> > + gcc_assert (mode != VOIDmode);
> > + if (mode != BLKmode)
> > + emit_move_insn (ntgt, nsrc);
> > + else
> > + {
> > + /* For example vector gimple registers can end up here. */
> > + rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> > + TYPE_MODE (sizetype), EXPAND_NORMAL);
> > + emit_block_move (ntgt, nsrc, size,
> > + (call_param_p
> > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > + }
> > + break;
> > + }
> > + return;
> > +}
> > +
> > /* Generate code for computing expression EXP,
> > and storing the value into TARGET.
> >
> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
> > emit_group_store (target, temp, TREE_TYPE (exp),
> > int_size_in_bytes (TREE_TYPE (exp)));
> > else if (GET_MODE (temp) == BLKmode)
> > - emit_block_move (target, temp, expr_size (exp),
> > - (call_param_p
> > - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > + {
> > +          /* Copying smallish BLKmode structures with emit_block_move and thus
> > +             by-pieces can result in store-to-load stalls.  So copy some simple
> > +             small aggregates element or field-wise.  */
> > + if (GET_MODE (target) == BLKmode
> > + && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> > + && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> > + && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> > + && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> > + <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> > + * BITS_PER_UNIT))
> > +              && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
> > + {
> > + /* FIXME: Can this happen? What would it mean? */
> > + gcc_assert (!reverse);
> > + emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> > + call_param_p);
> > + }
> > + else
> > + emit_block_move (target, temp, expr_size (exp),
> > + (call_param_p
> > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > + }
> > /* If we emit a nontemporal store, there is nothing else to do. */
> > else if (nontemporal && emit_storent_insn (target, temp))
> > ;
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 6b3d8d7364c..7d6019bbd30 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. If not see
> > #include "tree-ssa-ccp.h"
> > #include "stringpool.h"
> > #include "attribs.h"
> > +#include "tree-sra.h"
> >
> > template <typename valtype> class ipcp_value;
> >
> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> > index fa5bed49ee0..2313cc884ed 100644
> > --- a/gcc/ipa-prop.h
> > +++ b/gcc/ipa-prop.h
> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
> > void ipa_release_body_info (struct ipa_func_body_info *);
> > tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
> >
> > -/* From tree-sra.c: */
> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> > - gimple_stmt_iterator *, bool);
> > -
> > /* In ipa-cp.c */
> > void ipa_cp_c_finalize (void);
> >
> > diff --git a/gcc/params.def b/gcc/params.def
> > index e55afc28053..5e19f1414a0 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
> > "Enable loop epilogue vectorization using smaller vector size.",
> > 0, 0, 1)
> >
> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > + "max-size-for-elementwise-copy",
> > + "Maximum size in bytes of a structure or array to by considered
> > for "
> > + "copying by its individual fields or elements",
> > + 0, 0, 512)
> > +
> > /*
> >
> > Local variables:
> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > new file mode 100644
> > index 00000000000..4156d4fba45
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +typedef struct st1
> > +{
> > + long unsigned int a,b;
> > + long int c,d;
> > +}R;
> > +
> > +typedef struct st2
> > +{
> > + int t;
> > + R reg;
> > +}N;
> > +
> > +void Set (const R *region, N *n_info );
> > +
> > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
> > +{
> > + R reg;
> > +
> > + reg.a=a;
> > + reg.b=b;
> > + reg.c=c;
> > + reg.d=d;
> > +  Set (&reg, n_obj);
> > +
> > +}
> > +
> > +void Set (const R *reg, N *n_obj )
> > +{
> > + n_obj->reg=(*reg);
> > +}
> > +
> > +
> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> > +/* { dg-final { scan-assembler-not "movdqu" } } */
> > +/* { dg-final { scan-assembler-not "movups" } } */
> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> > index bac593951e7..ade97964205 100644
> > --- a/gcc/tree-sra.c
> > +++ b/gcc/tree-sra.c
> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see
> > #include "ipa-fnsummary.h"
> > #include "ipa-utils.h"
> > #include "builtins.h"
> > +#include "tree-sra.h"
> >
> > /* Enumeration of all aggregate reductions we can do. */
> > enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */
> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
> > }
> >
> >
> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
> > -   we are considering a decl from constant pool.  If it is false, char arrays
> > -   will be refused.  */
> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
> > +   fields/elements that are not bit-fields and are either register types or
> > +   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore, if
> > +   ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE
> > +   contains an array of elements that only have one byte.  */
> >
> > -static bool
> > -scalarizable_type_p (tree type, bool const_decl)
> > +bool
> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
> > {
> > gcc_assert (!is_gimple_reg_type (type));
> > if (type_contains_placeholder_p (type))
> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
> > return false;
> >
> > if (!is_gimple_reg_type (ft)
> > - && !scalarizable_type_p (ft, const_decl))
> > +              && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
> > return false;
> > }
> >
> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
> > case ARRAY_TYPE:
> > {
> > HOST_WIDE_INT min_elem_size;
> > - if (const_decl)
> > + if (allow_char_arrays)
> > min_elem_size = 0;
> > else
> > min_elem_size = BITS_PER_UNIT;
> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >
> > tree elem = TREE_TYPE (type);
> > if (!is_gimple_reg_type (elem)
> > - && !scalarizable_type_p (elem, const_decl))
> > + && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
> > return false;
> > return true;
> > }
> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
> > }
> > }
> >
> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> > + tree);
> > +
> > +/* For a given array TYPE, return false if its domain does not have any maximum
> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
> > +   element.  */
> > +
> > +bool
> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> > +{
> > + tree domain = TYPE_DOMAIN (type);
> > + tree minidx = TYPE_MIN_VALUE (domain);
> > + gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > + tree maxidx = TYPE_MAX_VALUE (domain);
> > + if (!maxidx)
> > + return false;
> > + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> > +
> > + /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > + DOMAIN (e.g. signed int, whereas min/max may be size_int). */
> > + *min = wi::to_offset (minidx);
> > + *max = wi::to_offset (maxidx);
> > + if (!TYPE_UNSIGNED (domain))
> > + {
> > + *min = wi::sext (*min, TYPE_PRECISION (domain));
> > + *max = wi::sext (*max, TYPE_PRECISION (domain));
> > + }
> > + return true;
> > +}
> >
> >  /* Create total_scalarization accesses for all scalar fields of a member
> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
> >     must be the top-most VAR_DECL representing the variable; within that,
> >     OFFSET locates the member and REF must be the memory reference expression for
> >     the member.  */
> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> > {
> > tree elemtype = TREE_TYPE (decl_type);
> > tree elem_size = TYPE_SIZE (elemtype);
> > - gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> > HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> > gcc_assert (el_size > 0);
> >
> > - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> > - gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> > + offset_int idx, max;
> >        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > - if (maxidx)
> > + if (extract_min_max_idx_from_array (decl_type, &idx, &max))
> > {
> > - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> > tree domain = TYPE_DOMAIN (decl_type);
> > -      /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > -         DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> > - offset_int idx = wi::to_offset (minidx);
> > - offset_int max = wi::to_offset (maxidx);
> > - if (!TYPE_UNSIGNED (domain))
> > - {
> > - idx = wi::sext (idx, TYPE_PRECISION (domain));
> > - max = wi::sext (max, TYPE_PRECISION (domain));
> > - }
> > for (int el_off = offset; idx <= max; ++idx)
> > {
> > tree nref = build4 (ARRAY_REF, elemtype,
> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> > }
> >
> >  /* Create total_scalarization accesses for a member of type TYPE, which must
> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> > -   the member, REVERSE gives its torage order. and REF must be the reference
> > -   expression for it.  */
> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF must
> > +   be the reference expression for it.  */
> >
> > static void
> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> > }
> >
> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> > - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */
> > + RECORD_TYPE or ARRAY_TYPE conforming to
> > + simple_mix_of_records_and_arrays_p. */
> >
> > static void
> > create_total_scalarization_access (tree var)
> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
> > {
> > tree var = candidate (i);
> >
> > - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> > - constant_decl_p (var)))
> > + if (VAR_P (var)
> > + && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> > + constant_decl_p (var)))
> > {
> > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > <= max_scalarization_size)
> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> > new file mode 100644
> > index 00000000000..dc901385994
> > --- /dev/null
> > +++ b/gcc/tree-sra.h
> > @@ -0,0 +1,33 @@
> > +/* tree-sra.h - Scalar Replacement of Aggregates declarations.
> > + Copyright (C) 2017 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3. If not see
> > +<http://www.gnu.org/licenses/>. */
> > +
> > +#ifndef TREE_SRA_H
> > +#define TREE_SRA_H
> > +
> > +
> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> > + offset_int *max);
> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> > + bool reverse, tree exp_type,
> > + gimple_stmt_iterator *gsi, bool insert_after);
> > +
> > +
> > +
> > +#endif /* TREE_SRA_H */
> > --
> > 2.14.1
> >