On Tue, 28 Jan 2020, Martin Jambor wrote:

> Hi,
> 
> PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
> assignment coming from a C struct assignment and one representing a
> folded memcpy, can kill the latter and keep in place only the former,
> which does not copy padding - at least when SRA decides to totally
> scalarize at least one of the aggregates (when not doing total
> scalarization, SRA cares about padding).
> 
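[Editorial note: the padding semantics at issue can be illustrated with a small, self-contained C sketch. The struct layout matches the pr92486.c testcase further down; the helper names here are made up for illustration.]

```c
#include <string.h>

/* On the usual ABIs there are 3 padding bytes between 'c' and 'i'.  */
struct s { char c; int i; };

/* A struct assignment only has to copy the members; the compiler is
   free to leave the padding bytes of the destination untouched.  */
void copy_assign (struct s *d, const struct s *s)
{
  *d = *s;
}

/* memcpy copies every byte of the object representation, padding
   included.  */
void copy_memcpy (struct s *d, const struct s *s)
{
  memcpy (d, s, sizeof *d);
}
```

DSE keeping only the member-wise copy is therefore observable exactly when the program later inspects the destination's padding bytes, as the testcase below does.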
> SRA would not totally scalarize an aggregate if it saw that it takes
> part in a gimple assignment which is a folded memcpy (see how
> type_changing_p is set in contains_vce_or_bfcref_p), but here it does
> not, because of the DSE decision described above.
> 
> I was asked to modify SRA to take padding into account - and to copy
> it around - when totally scalarizing, which is what the patch below
> does.  I am not very happy about this, I am afraid it will lead to
> performance regressions, but this has all been discussed (see
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and
> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html).
> 
> I tried to alleviate the problem by not only inserting accesses for
> padding but also by enlarging existing accesses whenever possible to
> extend over padding - the extended access would get copied when in the
> original IL an aggregate copy is replaced with SRA copies and a
> BIT_FIELD_REF would be generated to replace a scalar access to a part
> of the aggregate in the original IL.  I have made it work in the sense
> that the patch passed bootstrap and testing (see the git branch
> refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127
> if you are interested), but this approach meant that each such
> extended replacement which was written to (so all of them) could
> potentially become only partially assigned to and so had to be marked
> as addressable and could not become a gimple register, meaning that
> total scalarization would be creating addressable variables.  Detecting
> such cases is not easy; it would mean introducing yet another type of
> write flag (written to exactly this access) and propagating that flag
> across assignment sub-accesses.
> 
> So I decided that was not the way to go and instead only extended
> integer accesses, and that is what the patch below does.  Like in the
> previous attempt, whatever padding could not be covered by extending
> an access would be covered by extra artificial accesses.  As you can
> see, it adds a little complexity to various places of the pass which
> are already not trivial, but hopefully it is manageable.
> 
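[Editorial note: the "extend integer accesses over padding" idea can be sketched in standalone C, outside of GCC. The function name is made up; the truncation behaviour is also why the endianness concern comes up in the review below.]

```c
#include <stdint.h>

/* Illustrative sketch (not GCC code): if the scalar replacement for a
   'char' field is widened to a 32-bit carrier so that it also covers
   the 3 padding bytes that follow the field, then a scalar read of the
   field becomes a truncating conversion of the carrier.  Truncation
   picks the low-order byte, which is the field's byte only on
   little-endian targets.  */
static unsigned char
extract_field (uint32_t carrier)
{
  return (unsigned char) carrier;
}
```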
> Bootstrapped and tested on x86_64-linux, I'll be curious about the
> feedback.

 
+static void
+upgrade_integral_size_to_prec (struct access *access)
+{

missing function comment.

I'm almost sure that get_reg_access_replacement gets things wrong
on non-little-endian platforms.  The function comment says something
about bitfield refs but you simply create truncations.

sra_type_for_size shouldn't use lang_hooks.types.type_for_size
but build_nonstandard_integer_type.  In total_scalarization_fill_padding
avoid the loop figuring out the larger type and just up it to
MIN (lastpos alignment, word-mode-size)?
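[Editorial note: for reference, the doubling loop in the patch's total_scalarization_fill_padding, which this suggestion would shortcut, boils down to the following standalone sketch. All quantities are in bits; in the real patch each doubling additionally checks that a suitable integer type exists.]

```c
/* Starting from the last field's size, keep doubling the block while
   it stays naturally aligned at LASTPOS and does not reach past TO.
   Returns the enlarged block size in bits.  */
static long
enlarge_block (long lastpos, long lastsize, long to)
{
  long blocksize = lastsize;
  for (;;)
    {
      long b2 = blocksize * 2;
      if (lastpos + b2 > to || lastpos % b2 != 0)
        break;
      blocksize = b2;
    }
  return blocksize;
}
```

The suggested shortcut would compute the result directly from the alignment of LASTPOS capped at the word size, instead of iterating.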

There's PR71460 and a more recent one on total scalarization
creating FP accesses which is probably a bug we'd want to address
first (and which might simplify this patch?).

Overall the approach in the patch looks reasonable and code-generation
even improves in some cases I tried.

As said, I wonder how this integrates with a fix to avoid using
FP types for total scalarization (or any access decomposition
that was a non-FPmode before).

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2020-01-27  Martin Jambor  <mjam...@suse.cz>
> 
>       PR tree-optimization/92486
>       * tree-sra.c: Include langhooks.h
>       (struct access): New fields reg_size and reg_acc_type.
>       (dump_access): Print new fields.
>       (acc_size): New function.
>       (find_access_in_subtree): Use it, new parameter reg.
>       (get_var_base_offset_size_access): Pass true to
>       find_access_in_subtree.
>       (create_access_1): Initialize reg_size.
>       (create_artificial_child_access): Likewise.
>       (create_total_scalarization_access): Likewise.
>       (build_ref_for_model): Do not use model expr if reg_acc_type is
>       non-NULL.
>       (get_reg_access_replacement): New function.
>       (verify_sra_access_forest): Adjust verification for presence of
>       extended accesses covering padding.
>       (analyze_access_subtree): Undo extension over padding if total
>       scalarization failed, set grp_partial_lhs if we are going to introduce
>       a partial store to the new replacement, do not ignore holes when
>       totally scalarizing.
>       (sra_type_for_size): New function.
>       (total_scalarization_fill_padding): Likewise.
>       (total_should_skip_creating_access): Use it.
>       (totally_scalarize_subtree): Likewise.
>       (sra_modify_expr): Use get_reg_access_replacement instead of
>       get_access_replacement, adjust for reg_acc_type.
>       (sra_modify_assign): Likewise.
>       (load_assign_lhs_subreplacements): Pass false to
>       find_access_in_subtree.
> 
>       testsuite/
>       * gcc.dg/tree-ssa/pr92486.c: New test.
> ---
>  gcc/ChangeLog                           |  32 +++
>  gcc/testsuite/ChangeLog                 |   5 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr92486.c |  38 +++
>  gcc/tree-sra.c                          | 368 +++++++++++++++++++++---
>  4 files changed, 396 insertions(+), 47 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3541c6638f9..34c60e6f2a3 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,35 @@
> +2020-01-26  Martin Jambor  <mjam...@suse.cz>
> +
> +     PR tree-optimization/92486
> +     * tree-sra.c: Include langhooks.h
> +     (struct access): New fields reg_size and reg_acc_type.
> +     (dump_access): Print new fields.
> +     (acc_size): New function.
> +     (find_access_in_subtree): Use it, new parameter reg.
> +     (get_var_base_offset_size_access): Pass true to
> +     find_access_in_subtree.
> +     (create_access_1): Initialize reg_size.
> +     (create_artificial_child_access): Likewise.
> +     (create_total_scalarization_access): Likewise.
> +     (build_ref_for_model): Do not use model expr if reg_acc_type is
> +     non-NULL.
> +     (get_reg_access_replacement): New function.
> +     (verify_sra_access_forest): Adjust verification for presence of
> +     extended accesses covering padding.
> +     (analyze_access_subtree): Undo extension over padding if total
> +     scalarization failed, set grp_partial_lhs if we are going to introduce
> +     a partial store to the new replacement, do not ignore holes when
> +     totally scalarizing.
> +     (sra_type_for_size): New function.
> +     (total_scalarization_fill_padding): Likewise.
> +     (total_should_skip_creating_access): Use it.
> +     (totally_scalarize_subtree): Likewise.
> +     (sra_modify_expr): Use get_reg_access_replacement instead of
> +     get_access_replacement, adjust for reg_acc_type.
> +     (sra_modify_assign): Likewise.
> +     (load_assign_lhs_subreplacements): Pass false to
> +     find_access_in_subtree.
> +
>  2020-01-27  Martin Liska  <mli...@suse.cz>
>  
>       PR driver/91220
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 22a37dd1ab2..7ccb5098224 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-01-24  Martin Jambor  <mjam...@suse.cz>
> +
> +     PR tree-optimization/92486
> +     * gcc.dg/tree-ssa/pr92486.c: New test.
> +
>  2020-01-27  Martin Liska  <mli...@suse.cz>
>  
>       PR target/93274
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
> new file mode 100644
> index 00000000000..77e84241eff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +struct s {
> +    char c;
> +    int i;
> +};
> +
> +__attribute__((noipa))
> +void f(struct s *p, struct s *q)
> +{
> +    struct s w;
> +
> +    __builtin_memset(&w, 0, sizeof(struct s));
> +    w = *q;
> +
> +    __builtin_memset(p, 0, sizeof(struct s));
> +    *p = w;
> +}
> +
> +int main()
> +{
> +    struct s x;
> +    __builtin_memset(&x, 1, sizeof(struct s));
> +
> +    struct s y;
> +    __builtin_memset(&y, 2, sizeof(struct s));
> +
> +    f(&y, &x);
> +
> +    for (unsigned char *p = (unsigned char *)&y;
> +      p < (unsigned char *)&y + sizeof(struct s);
> +      p++)
> +      if (*p != 1)
> +     __builtin_abort ();
> +
> +    return 0;
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index ea8594db193..bda342ffdf9 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "dbgcnt.h"
>  #include "builtins.h"
>  #include "tree-sra.h"
> +#include "langhooks.h"
>  
>  
>  /* Enumeration of all aggregate reductions we can do.  */
> @@ -130,11 +131,16 @@ struct assign_link;
>  
>  struct access
>  {
> -  /* Values returned by  `get_ref_base_and_extent' for each component 
> reference
> -     If EXPR isn't a component reference  just set `BASE = EXPR', `OFFSET = 
> 0',
> -     `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
> +  /* Offset, size and base are values returned by `get_ref_base_and_extent' 
> for
> +     each component reference.  If EXPR isn't a component reference just set
> +     `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
>    HOST_WIDE_INT offset;
>    HOST_WIDE_INT size;
> +
> +  /* If reg_acc_type is non-NULL, this is the size of the actual register type
> +     the access represented before it was extended to cover padding.  
> Otherwise
> +     must be equal to size.  */
> +  HOST_WIDE_INT reg_size;
>    tree base;
>  
>    /* Expression.  It is context dependent so do not use it to create new
> @@ -144,6 +150,12 @@ struct access
>    /* Type.  */
>    tree type;
>  
> +  /* If non-NULL, this is the type that should be actually extracted or
> +     inserted from/to the replacement decl when replacing accesses to the
> +     individual field itself (as opposed to accesses created as part of
> +     replacing aggregate copies which should use TYPE).  */
> +  tree reg_acc_type;
> +
>    /* The statement this access belongs to.  */
>    gimple *stmt;
>  
> @@ -391,10 +403,17 @@ dump_access (FILE *f, struct access *access, bool grp)
>    print_generic_expr (f, access->base);
>    fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset);
>    fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size);
> +  if (access->reg_size != access->size)
> +    fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size);
>    fprintf (f, ", expr = ");
>    print_generic_expr (f, access->expr);
>    fprintf (f, ", type = ");
>    print_generic_expr (f, access->type);
> +  if (access->reg_acc_type)
> +    {
> +      fprintf (f, ", reg_acc_type = ");
> +      print_generic_expr (f, access->reg_acc_type);
> +    }
>    fprintf (f, ", reverse = %d", access->reverse);
>    if (grp)
>      fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "
> @@ -481,18 +500,27 @@ get_base_access_vector (tree base)
>    return base_access_vec->get (base);
>  }
>  
> +/* Return ACCESS's size or reg_size depending on REG.  */
> +
> +static HOST_WIDE_INT
> +acc_size (struct access *access, bool reg)
> +{
> +  return reg ? access->reg_size : access->size;
> +}
> +
>  /* Find an access with required OFFSET and SIZE in a subtree of accesses 
> rooted
>     in ACCESS.  Return NULL if it cannot be found.  */
>  
>  static struct access *
>  find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
> -                     HOST_WIDE_INT size)
> +                     HOST_WIDE_INT size, bool reg)
>  {
> -  while (access && (access->offset != offset || access->size != size))
> +  while (access && (access->offset != offset
> +                 || acc_size (access, reg) != size))
>      {
>        struct access *child = access->first_child;
>  
> -      while (child && (child->offset + child->size <= offset))
> +      while (child && (child->offset + acc_size (child, reg) <= offset))
>       child = child->next_sibling;
>        access = child;
>      }
> @@ -503,7 +531,7 @@ find_access_in_subtree (struct access *access, 
> HOST_WIDE_INT offset,
>    if (access)
>      while (access->first_child
>          && access->first_child->offset == offset
> -        && access->first_child->size == size)
> +        && acc_size (access->first_child, reg) == size)
>        access = access->first_child;
>  
>    return access;
> @@ -539,7 +567,7 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT 
> offset,
>    if (!access)
>      return NULL;
>  
> -  return find_access_in_subtree (access, offset, size);
> +  return find_access_in_subtree (access, offset, size, true);
>  }
>  
>  /* Add LINK to the linked list of assign links of RACC.  */
> @@ -868,6 +896,7 @@ create_access_1 (tree base, HOST_WIDE_INT offset, 
> HOST_WIDE_INT size)
>    access->base = base;
>    access->offset = offset;
>    access->size = size;
> +  access->reg_size = size;
>  
>    base_access_vec->get_or_insert (base).safe_push (access);
>  
> @@ -1667,6 +1696,7 @@ build_ref_for_model (location_t loc, tree base, 
> HOST_WIDE_INT offset,
>      {
>        tree res;
>        if (model->grp_same_access_path
> +       && !model->reg_acc_type
>         && !TREE_THIS_VOLATILE (base)
>         && (TYPE_ADDR_SPACE (TREE_TYPE (base))
>             == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
> @@ -2256,6 +2286,30 @@ get_access_replacement (struct access *access)
>    return access->replacement_decl;
>  }
>  
> +/* Like above except in cases when ACCESS has non-NULL reg_acc_type, in which
> +   case also construct BIT_FIELD_REF into the replacement of the 
> corresponding
> +   type (with location LOC).  */
> +
> +static tree
> +get_reg_access_replacement (location_t loc, struct access *access, bool 
> write,
> +                         gassign **conversion)
> +{
> +  tree repl = get_access_replacement (access);
> +  if (!access->reg_acc_type)
> +    {
> +      *conversion = NULL;
> +      return repl;
> +    }
> +
> +  tree tmp = make_ssa_name (access->reg_acc_type);
> +  if (write)
> +    *conversion = gimple_build_assign (repl, NOP_EXPR, tmp);
> +  else
> +    *conversion = gimple_build_assign (tmp, NOP_EXPR, repl);
> +  gimple_set_location (*conversion, loc);
> +  return tmp;
> +}
> +
>  
>  /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
>     linked list along the way.  Stop when *ACCESS is NULL or the access 
> pointed
> @@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root)
>        gcc_assert (offset == access->offset);
>        gcc_assert (access->grp_unscalarizable_region
>                 || size == max_size);
> -      gcc_assert (max_size == access->size);
> +      if (access->reg_acc_type)
> +     gcc_assert (max_size == access->reg_size
> +                 && max_size < access->size);
> +      else
> +     gcc_assert (max_size == access->size
> +                 && max_size == access->reg_size);
>        gcc_assert (reverse == access->reverse);
>  
>        if (access->first_child)
> @@ -2405,6 +2464,39 @@ expr_with_var_bounded_array_refs_p (tree expr)
>    return false;
>  }
>  
> +static void
> +upgrade_integral_size_to_prec (struct access *access)
> +{
> +  /* Always create access replacements that cover the whole access.
> +     For integral types this means the precision has to match.
> +     Avoid assumptions based on the integral type kind, too.  */
> +  if (INTEGRAL_TYPE_P (access->type)
> +      && (TREE_CODE (access->type) != INTEGER_TYPE
> +       || TYPE_PRECISION (access->type) != access->size)
> +      /* But leave bitfield accesses alone.  */
> +      && (TREE_CODE (access->expr) != COMPONENT_REF
> +       || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))))
> +    {
> +      tree rt = access->type;
> +      gcc_assert ((access->offset % BITS_PER_UNIT) == 0
> +               && (access->size % BITS_PER_UNIT) == 0);
> +      access->type = build_nonstandard_integer_type (access->size,
> +                                                TYPE_UNSIGNED (rt));
> +      access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base,
> +                                      access->offset, access->reverse,
> +                                      access->type, NULL, false);
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +     {
> +       fprintf (dump_file, "Changing the type of a replacement for ");
> +       print_generic_expr (dump_file, access->base);
> +       fprintf (dump_file, " offset: %u, size: %u ",
> +                (unsigned) access->offset, (unsigned) access->size);
> +       fprintf (dump_file, " to an integer.\n");
> +     }
> +    }
> +}
> +
>  /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements 
> when
>     both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY
>     is set, we are totally scalarizing the aggregate.  Also set all sorts of
> @@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct 
> access *parent,
>       hole = true;
>      }
>  
> +  if (!totally && root->reg_acc_type)
> +    {
> +      /* If total scalarization did not eventually succeed, let's undo any
> +      futile attempts to cover padding.  */
> +      root->type = root->reg_acc_type;
> +      root->size = root->reg_size;
> +      root->reg_acc_type = NULL_TREE;
> +    }
> +
>    if (allow_replacements && scalar && !root->first_child
>        && (totally || !root->grp_total_scalarization)
>        && (totally
> @@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct 
> access *parent,
>         || ((root->grp_scalar_read || root->grp_assignment_read)
>             && (root->grp_scalar_write || root->grp_assignment_write))))
>      {
> -      /* Always create access replacements that cover the whole access.
> -         For integral types this means the precision has to match.
> -      Avoid assumptions based on the integral type kind, too.  */
> -      if (INTEGRAL_TYPE_P (root->type)
> -       && (TREE_CODE (root->type) != INTEGER_TYPE
> -           || TYPE_PRECISION (root->type) != root->size)
> -       /* But leave bitfield accesses alone.  */
> -       && (TREE_CODE (root->expr) != COMPONENT_REF
> -           || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))
> -     {
> -       tree rt = root->type;
> -       gcc_assert ((root->offset % BITS_PER_UNIT) == 0
> -                   && (root->size % BITS_PER_UNIT) == 0);
> -       root->type = build_nonstandard_integer_type (root->size,
> -                                                    TYPE_UNSIGNED (rt));
> -       root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,
> -                                          root->offset, root->reverse,
> -                                          root->type, NULL, false);
> -
> -       if (dump_file && (dump_flags & TDF_DETAILS))
> -         {
> -           fprintf (dump_file, "Changing the type of a replacement for ");
> -           print_generic_expr (dump_file, root->base);
> -           fprintf (dump_file, " offset: %u, size: %u ",
> -                    (unsigned) root->offset, (unsigned) root->size);
> -           fprintf (dump_file, " to an integer.\n");
> -         }
> -     }
> +      upgrade_integral_size_to_prec (root);
>  
>        root->grp_to_be_replaced = 1;
>        root->replacement_decl = create_access_replacement (root);
> @@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct 
> access *parent,
>       root->grp_total_scalarization = 0;
>      }
>  
> -  if (!hole || totally)
> +  if (!hole)
>      root->grp_covered = 1;
>    else if (root->grp_write || comes_initialized_p (root->base))
>      root->grp_unscalarized_data = 1; /* not covered and written to */
> @@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, 
> struct access *model,
>    access->expr = expr;
>    access->offset = new_offset;
>    access->size = model->size;
> +  gcc_assert (model->size == model->reg_size);
> +  access->reg_size = model->reg_size;
>    access->type = model->type;
>    access->parent = parent;
>    access->grp_read = set_grp_read;
> @@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access 
> *parent, HOST_WIDE_INT pos,
>    access->base = parent->base;
>    access->offset = pos;
>    access->size = size;
> +  access->reg_size = size;
>    access->expr = expr;
>    access->type = type;
>    access->parent = parent;
> @@ -3015,6 +3092,138 @@ create_total_access_and_reshape (struct access 
> *parent, HOST_WIDE_INT pos,
>    return new_acc;
>  }
>  
> +/* Given SIZE return the integer type to represent that many bits or NULL if
> +   there is no suitable one.  */
> +
> +static tree
> +sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1)
> +{
> +  tree type = lang_hooks.types.type_for_size (size, unsignedp);
> +  scalar_int_mode mode;
> +  if (type
> +      && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
> +      && GET_MODE_BITSIZE (mode) == size)
> +    return type;
> +  else
> +    return NULL_TREE;
> +}
> +
> +/* Create a series of accesses to cover padding from FROM to TO and chain 
> them
> +   between LAST_PTR and NEXT_CHILD.  Return the pointer to the last created
> +   one.  */
> +
> +static struct access *
> +fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from,
> +                         HOST_WIDE_INT to, struct access **last_ptr,
> +                         struct access *next_child)
> +{
> +  struct access *access = NULL;
> +
> +  gcc_assert (from < to);
> +  do
> +    {
> +      HOST_WIDE_INT diff = to - from;
> +      gcc_assert (diff >= BITS_PER_UNIT);
> +      HOST_WIDE_INT stsz = 1 << floor_log2 (diff);
> +      tree type;
> +
> +      while (true)
> +     {
> +       type = sra_type_for_size (stsz);
> +       if (type)
> +         break;
> +       stsz /= 2;
> +       gcc_checking_assert (stsz >= BITS_PER_UNIT);
> +     }
> +
> +      do {
> +     tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base,
> +                                       from, parent->reverse, type, NULL,
> +                                       false);
> +     access = create_total_scalarization_access (parent, from, stsz, type,
> +                                                 expr, last_ptr, next_child);
> +     access->grp_no_warning = 1;
> +     from += stsz;
> +      }
> +      while (to - from >= stsz);
> +      gcc_assert (from <= to);
> +    }
> +  while (from < to);
> +  return access;
> +}
> +
> +/* Check whether any padding between FROM and TO must be filled during
> +   total scalarization and if so, extend *LAST_SIBLING, create additional
> +   artificial accesses under PARENT or both to fill it.  Set *LAST_SIBLING to
> +   the last created access and set its next_sibling to NEXT_CHILD.  */
> +
> +static void
> +total_scalarization_fill_padding (struct access *parent,
> +                               struct access **last_sibling,
> +                               HOST_WIDE_INT from, HOST_WIDE_INT to,
> +                               struct access *next_child)
> +{
> +  if (constant_decl_p (parent->base))
> +    return;
> +
> +  gcc_assert (from <= to);
> +  if (from == to)
> +    return;
> +
> +  if (struct access *ls = *last_sibling)
> +    {
> +      /* First, perform any upgrades to full integers we would have done
> +      anyway.  */
> +      upgrade_integral_size_to_prec (ls);
> +
> +      HOST_WIDE_INT lastsize = ls->size;
> +      if (INTEGRAL_TYPE_P (ls->type)
> +       && pow2_or_zerop (lastsize))
> +     {
> +       /* The last field was a reasonably sized integer, let's try to
> +          enlarge as much as we can so that it is still naturally
> +          aligned.  */
> +       HOST_WIDE_INT lastpos = ls->offset;
> +       HOST_WIDE_INT blocksize = lastsize;
> +       tree type = NULL_TREE;
> +       int unsignedp = TYPE_UNSIGNED (ls->type);
> +
> +       while (true)
> +         {
> +           HOST_WIDE_INT b2 = blocksize * 2;
> +           if (lastpos + b2 > to
> +               || (lastpos % b2) != 0)
> +             break;
> +           tree t2 = sra_type_for_size (b2, unsignedp);
> +           if (!t2)
> +             break;
> +
> +           blocksize = b2;
> +           type = t2;
> +         }
> +
> +       if (blocksize != lastsize)
> +         {
> +           ls->reg_acc_type = ls->type;
> +           ls->type = type;
> +           ls->size = blocksize;
> +           from = lastpos + blocksize;
> +         }
> +
> +       if (from == to)
> +         return;
> +     }
> +    }
> +
> +  struct access **last_ptr;
> +  if (*last_sibling)
> +    last_ptr = &(*last_sibling)->next_sibling;
> +  else
> +    last_ptr = &parent->first_child;
> +  *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr,
> +                                           next_child);
> +}
> +
>  static bool totally_scalarize_subtree (struct access *root);
>  
>  /* Return true if INNER is either the same type as OUTER or if it is the type
> @@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access 
> *parent,
>                                  HOST_WIDE_INT size)
>  {
>    struct access *next_child;
> +  HOST_WIDE_INT covered;
>    if (!*last_seen_sibling)
> -    next_child = parent->first_child;
> +    {
> +      next_child = parent->first_child;
> +      covered = parent->offset;
> +    }
>    else
> -    next_child = (*last_seen_sibling)->next_sibling;
> +    {
> +      next_child = (*last_seen_sibling)->next_sibling;
> +      covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size;
> +    }
>  
>    /* First, traverse the chain of siblings until it points to an access with
>       offset at least equal to POS.  Check all skipped accesses whether they
> @@ -3085,10 +3301,16 @@ total_should_skip_creating_access (struct access 
> *parent,
>      {
>        if (next_child->offset + next_child->size > pos)
>       return TOTAL_FLD_FAILED;
> +      total_scalarization_fill_padding (parent, last_seen_sibling, covered,
> +                                     next_child->offset, next_child);
> +      covered = next_child->offset + next_child->size;
>        *last_seen_sibling = next_child;
>        next_child = next_child->next_sibling;
>      }
>  
> +  total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos,
> +                                 next_child);
> +
>    /* Now check whether next_child has exactly the right POS and SIZE and if 
> so,
>       whether it can represent what we need and can be totally scalarized
>       itself.  */
> @@ -3273,6 +3495,34 @@ totally_scalarize_subtree (struct access *root)
>      }
>  
>   out:
> +  /* Even though there is nothing in the type, there can still be accesses 
> here
> +     in the IL.  */
> +
> +  HOST_WIDE_INT covered;
> +  struct access *next_child;
> +
> +  if (last_seen_sibling)
> +    {
> +      covered = last_seen_sibling->offset + last_seen_sibling->size;
> +      next_child = last_seen_sibling->next_sibling;
> +    }
> +  else
> +    {
> +      covered = root->offset;
> +      next_child = root->first_child;
> +    }
> +
> +  while (next_child)
> +    {
> +      total_scalarization_fill_padding (root, &last_seen_sibling, covered,
> +                                     next_child->offset, next_child);
> +      covered = next_child->offset + next_child->size;
> +      last_seen_sibling = next_child;
> +      next_child = next_child->next_sibling;
> +    }
> +
> +  total_scalarization_fill_padding (root, &last_seen_sibling, covered,
> +                                 root->offset + root->size, NULL);
>    return true;
>  }
>  
> @@ -3655,7 +3905,6 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, 
> bool write)
>  
>    if (access->grp_to_be_replaced)
>      {
> -      tree repl = get_access_replacement (access);
>        /* If we replace a non-register typed access simply use the original
>           access expression to extract the scalar component afterwards.
>        This happens if scalarizing a function return value or parameter
> @@ -3666,10 +3915,14 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator 
> *gsi, bool write)
>           be accessed as a different type too, potentially creating a need for
>           type conversion (see PR42196) and when scalarized unions are 
> involved
>           in assembler statements (see PR42398).  */
> -      if (!useless_type_conversion_p (type, access->type))
> +      tree actual_type
> +     = access->reg_acc_type ? access->reg_acc_type : access->type;
> +
> +      if (!useless_type_conversion_p (type, actual_type))
>       {
>         tree ref;
>  
> +       tree repl = get_access_replacement (access);
>         ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
>  
>         if (write)
> @@ -3696,7 +3949,21 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator 
> *gsi, bool write)
>           }
>       }
>        else
> -     *expr = repl;
> +     {
> +       gassign *conversion;
> +       tree repl = get_reg_access_replacement (loc, access, write,
> +                                               &conversion);
> +       *expr = repl;
> +
> +       if (conversion)
> +         {
> +           if (write)
> +             gsi_insert_after (gsi, conversion, GSI_NEW_STMT);
> +           else
> +             gsi_insert_before (gsi, conversion, GSI_SAME_STMT);
> +         }
> +     }
> +
>        sra_stats.exprs++;
>      }
>    else if (write && access->grp_to_be_debug_replaced)
> @@ -3806,7 +4073,8 @@ load_assign_lhs_subreplacements (struct access *lacc,
>         gassign *stmt;
>         tree rhs;
>  
> -       racc = find_access_in_subtree (sad->top_racc, offset, lacc->size);
> +       racc = find_access_in_subtree (sad->top_racc, offset, lacc->size,
> +                                      false);
>         if (racc && racc->grp_to_be_replaced)
>           {
>             rhs = get_access_replacement (racc);
> @@ -3857,7 +4125,7 @@ load_assign_lhs_subreplacements (struct access *lacc,
>             tree drhs;
>             struct access *racc = find_access_in_subtree (sad->top_racc,
>                                                           offset,
> -                                                         lacc->size);
> +                                                         lacc->size, false);
>  
>             if (racc && racc->grp_to_be_replaced)
>               {
> @@ -4010,8 +4278,11 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
> *gsi)
>    loc = gimple_location (stmt);
>    if (lacc && lacc->grp_to_be_replaced)
>      {
> -      lhs = get_access_replacement (lacc);
> +      gassign *lhs_conversion = NULL;
> +      lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion);
>        gimple_assign_set_lhs (stmt, lhs);
> +      if (lhs_conversion)
> +     gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT);
>        modify_this_stmt = true;
>        if (lacc->grp_partial_lhs)
>       force_gimple_rhs = true;
> @@ -4020,7 +4291,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
> *gsi)
>  
>    if (racc && racc->grp_to_be_replaced)
>      {
> -      rhs = get_access_replacement (racc);
> +      gassign *rhs_conversion = NULL;
> +      rhs = get_reg_access_replacement (loc, racc, false, &rhs_conversion);
> +      if (rhs_conversion)
> +     gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT);
>        modify_this_stmt = true;
>        if (racc->grp_partial_lhs)
>       force_gimple_rhs = true;
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)