Hi, On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote: > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjam...@suse.cz> wrote: > >Hi, > > > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: > >> On Wed, 12 Apr 2017, Martin Jambor wrote: > >> > >> > Hi, > >> > > >> > the patch below is an attempt to deal with PR 80293 as > >non-invasively > >> > as possible. Basically, it switches off total SRA scalarization of > >> > any local aggregates which contains an array of elements that have > >one > >> > byte (or less). > >> > > >> > The logic behind this is that accessing such arrays element-wise > >> > usually results in poor code and that such char arrays are often > >used > >> > for non-statically-typed content anyway, and we do not want to copy > >> > that byte per byte. > >> > > >> > Alan, do you think this could impact your constant pool > >scalarization > >> > too severely? > >> > >> Hmm, isn't one of the issues that we have > >> > >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > >> { > >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > >> <= max_scalarization_size) > >> { > >> create_total_scalarization_access (var); > >> > >> which limits the size of scalarizable vars but not the number > >> of accesses we create for total scalarization? > > > >Well, yes, but at least I understood from your comment #4 in the bug > >that you did not want to limit the number of replacements for gcc 7? > > > >> > >> Is scalarizable_type_p only used in contexts where we have no hint > >> of the actual accesses? > > > >There are should_scalarize_away_bitmap and > >cannot_scalarize_away_bitmap hints. > > > >Total scalarization is a hack process where we chop up small > >aggregates according to their types - as opposed to actual accesses, > >meaning it is an alien process to the rest of SRA - knowing that they > >will go completely away afterwards (that is ensured by > >cannot_scalarize_away_bitmap) and that this will facilitate copy > >propagation (this is indicated by should_scalarize_away_bitmap, which > >has a bit set if there is a non-scalar assignment _from_ (a part of) > >the aggregate). > > OK, but for the copy x = y where x would go away it still depends on uses of > x what pieces we actually want? Or is total scalarization only done for x = > y; z = x;? > Thus no further accesses to x?
Total scalarization adds artificial accesses only to y, but (in both cases of total and "natural" scalarization) for all aggregate assignments between SRA candidates, SRA attempts to recreate all accesses covering RHS to LHS. Transitively. So the artificial accesses created for y will then get created for x and z even if they would not be candidates for total scalarization. So e.g. if z cannot go away because it is passed to a function, it will be loaded piece-wise from y. > > >> That is, for the constant pool case we > >> usually have > >> > >> x = .LC0; > >> .. = x[2]; > >> > >> so we have a "hint" that accesses on x are those we'd want to > >> optimize to accesses to .LC0. > > > >You don't need total scalarization for this, just the existence of > >read from x[2] would be sufficient but thanks for pointing me to the > >right direction. For constant pool decl scalarization, it is not > >important to introduce artificial accesses for x but for .LC0. > >Therefore, I have adapted the patch to allow char arrays for const > >decls only and verified that it scalarizes a const-pool array of chars > >on Aarch64. The (otherwise yet untested) result is below. > > > >What do you think? > > Why special case char arrays? I'd simply disallow total scalarization of > non-const arrays completely. Well, currently we will get element-wise copy propagation for things like "int rgb[3];" (possibly embeded in a small struct). If I remove it, someone will complain. Maybe the correct limit is SI mode size or BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though. I just wanted to be conservative at this point in the release cycle. Martin > > >Martin > > > > > >2017-04-13 Martin Jambor <mjam...@suse.cz> > > > > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make > > char arrays not totally scalarizable if it is false. > > (analyze_all_variable_accesses): Pass correct value in the new > > parameter. > > > >testsuite/ > > * g++.dg/tree-ssa/pr80293.C: New test. > >--- > >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 > >+++++++++++++++++++++++++++++++++ > > gcc/tree-sra.c | 21 ++++++++++----- > > 2 files changed, 60 insertions(+), 6 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >new file mode 100644 > >index 00000000000..7faf35ae983 > >--- /dev/null > >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >@@ -0,0 +1,45 @@ > >+// { dg-do compile } > >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ > >+ > >+#include <array> > >+ > >+// Return a copy of the underlying memory of an arbitrary value. > >+template < > >+ typename T, > >+ typename = typename > >std::enable_if<std::is_trivially_copyable<T>::value>::type > >+> > >+auto getMem( > >+ T const & value > >+) -> std::array<char, sizeof(T)> { > >+ auto ret = std::array<char, sizeof(T)>{}; > >+ __builtin_memcpy(ret.data(), &value, sizeof(T)); > >+ return ret; > >+} > >+ > >+template < > >+ typename T, > >+ typename = typename > >std::enable_if<std::is_trivially_copyable<T>::value>::type > >+> > >+auto fromMem( > >+ std::array<char, sizeof(T)> const & buf > >+) -> T { > >+ auto ret = T{}; > >+ __builtin_memcpy(&ret, buf.data(), sizeof(T)); > >+ return ret; > >+} > >+ > >+double foo1(std::uint64_t arg) { > >+ return fromMem<double>(getMem(arg)); > >+} > >+ > >+double foo2(std::uint64_t arg) { > >+ return *reinterpret_cast<double*>(&arg); > >+} > >+ > >+double foo3(std::uint64_t arg) { > >+ double ret; > >+ __builtin_memcpy(&ret, &arg, sizeof(arg)); > >+ return ret; > >+} > >+ > >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > >index 02453d3ed9a..ab06be66131 100644 > >--- a/gcc/tree-sra.c > >+++ b/gcc/tree-sra.c > >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool > >write) > > > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or > >fixed-length > >ARRAY_TYPE with fields that are either of gimple register types > >(excluding > >- bit-fields) or (recursively) scalarizable types. */ > >+ bit-fields) or (recursively) scalarizable types. CONST_DECL must > >be true if > >+ we are considering a decl from constant pool. If it is false, char > >arrays > >+ will be refused. */ > > > > static bool > >-scalarizable_type_p (tree type) > >+scalarizable_type_p (tree type, bool const_decl) > > { > > gcc_assert (!is_gimple_reg_type (type)); > > if (type_contains_placeholder_p (type)) > >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type) > > return false; > > > > if (!is_gimple_reg_type (ft) > >- && !scalarizable_type_p (ft)) > >+ && !scalarizable_type_p (ft, const_decl)) > > return false; > > } > > > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type) > > > > case ARRAY_TYPE: > > { > >+ HOST_WIDE_INT min_elem_size; > >+ if (const_decl) > >+ min_elem_size = 0; > >+ else > >+ min_elem_size = BITS_PER_UNIT; > >+ > > if (TYPE_DOMAIN (type) == NULL_TREE > > || !tree_fits_shwi_p (TYPE_SIZE (type)) > > || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) > >- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) > >+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) > > || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > > return false; > > if (tree_to_shwi (TYPE_SIZE (type)) == 0 > >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) > > > > tree elem = TREE_TYPE (type); > > if (!is_gimple_reg_type (elem) > >- && !scalarizable_type_p (elem)) > >+ && !scalarizable_type_p (elem, const_decl)) > > return false; > > return true; > > } > >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) > > { > > tree var = candidate (i); > > > >- if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > >+ if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), > >+ constant_decl_p (var))) > > { > > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > > <= max_scalarization_size) >