On Tue, 28 Jun 2011, Martin Jambor wrote:

> Hi,
> 
> On Tue, Jun 28, 2011 at 03:01:17PM +0200, Richard Guenther wrote:
> > On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor <mjam...@suse.cz> wrote:
> > > Hi,
> > >
> > > at the moment SRA can get confused by alignment padding and think
> > > that the padding actually contains data for which there is no
> > > planned replacement, and thus it might leave some loads and stores
> > > in place instead of removing them.  This is perhaps the biggest
> > > problem when we attempt total scalarization of simple structures,
> > > which we do exactly in order to get rid of such loads and stores
> > > and of the variables altogether.
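> > >
> > > As a made-up illustration (assuming a typical ABI with 32-bit int
> > > and 16-bit short), consider:
> > >
> > >   struct S
> > >   {
> > >     int i;       /* bits  0..31 */
> > >     short s;     /* bits 32..47 */
> > >                  /* bits 48..63 are alignment padding */
> > >     int j;       /* bits 64..95 */
> > >   };
> > >
> > > The two bytes of padding after s carry no data at all, yet SRA
> > > currently treats them as unscalarized contents of the aggregate.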
> > >
> > > I've pondered for quite a while how best to deal with this.  One
> > > option was to just make the total scalarization stronger.  I have
> > > also contemplated creating phantom accesses for padding I could
> > > detect (i.e. in simple structures), which would be more general,
> > > but that would complicate the parts of SRA which are already quite
> > > convoluted, and I was not really sure it was worth it.
> > >
> > > Eventually I decided on the total scalarization option.  This patch
> > > changes it so that the flag is propagated down the access tree but
> > > is also reset on the way up if the scalarization does not work out.
> > > If the flag survives, the access tree is considered "covered" by
> > > scalar replacements and is thus known not to contain unscalarized
> > > data.
> > >
> > > While changing the function analyze_access_subtree I have
> > > simplified the way we compute the hole flag and also fixed one
> > > comparison which we currently have the wrong way round.
> > > Fortunately it does not matter in practice because, whenever there
> > > is a hole, covered_to never adds up to the total size anyway.  I'll
> > > probably post a separate patch against 4.6 just in case someone
> > > attempts to read the source.
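> > >
> > > To make the coverage computation concrete with the struct S above
> > > (assuming all three fields get scalar replacements): the children
> > > of the root access sit at bit offsets 0, 32 and 64 with sizes 32,
> > > 16 and 32, so covered_to accumulates to 32, 48 and finally 80.  The
> > > wrong-way-round test child->offset < covered_to (0 < 0, 32 < 32,
> > > 64 < 48) never fires even though there is a hole at bits 48..63,
> > > whereas the correct test covered_to < child->offset catches it at
> > > the third child (48 < 64).  With the old test the hole is still
> > > detected in the end only because covered_to (80) stays below the
> > > total size (96).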
> > >
> > > Bootstrapped and tested on x86_64-linux, OK for trunk?
> > 
> > So, what will it do for the testcase?
> > 
> > The following is what I _think_ it should do:
> > 
> > <bb 2>:
> >   l = *p_1(D);
> >   l$i_6 = p_1(D)->i;
> >   D.2700_2 = l$i_6;
> >   D.2701_3 = D.2700_2 + 1;
> >   l$i_12 = D.2701_3;
> >   *p_1(D) = l;
> >   p_1(D)->i = l$i_12;
> > 
> > and let FRE/DSE do their job (which they don't do, unfortunately).
> > So does your patch then remove the load/store from/to l but keep
> > the elementwise loads/stores (which are probably cleaned up by FRE)?
> > 
> 
> Well, that is what would happen if no total scalarization was going
> on.  Total scalarization is a poor man's aggregate copy propagation:
> it splits small structures into their individual fields whenever we
> can get rid of the aggregate this way (i.e. if it is never used in
> anything but an assignment).  I introduced it to fix PR 42585 - but
> unfortunately the padding problem did not occur to me until this
> winter.
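> 
> To make that concrete, a minimal sketch (variable and field names
> made up):
> 
>   struct A { int x; int y; };
>   struct A tmp;
>   tmp = *src;
>   *dst = tmp;
> 
> becomes, once tmp is split into the replacements tmp$x and tmp$y:
> 
>   tmp$x = src->x;
>   tmp$y = src->y;
>   dst->x = tmp$x;
>   dst->y = tmp$y;
> 
> and tmp itself can go away entirely.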
> 
> Currently, SRA performs very badly on the testcase, creating:
> 
> <bb 2>:
>   l = *p_1(D);
>   l$i_6 = p_1(D)->i;
>   l$f1_8 = p_1(D)->f1;
>   l$f2_9 = p_1(D)->f2;
>   l$f3_10 = p_1(D)->f3;
>   l$f4_11 = p_1(D)->f4;
>   D.1966_2 = l$i_6;
>   D.1967_3 = D.1966_2 + 1;
>   l$i_12 = D.1967_3;
>   *p_1(D) = l;              <-- this should not be here
>   p_1(D)->i = l$i_12;
>   p_1(D)->f1 = l$f1_8;
>   p_1(D)->f2 = l$f2_9;
>   p_1(D)->f3 = l$f3_10;
>   p_1(D)->f4 = l$f4_11;
>   return;
> 
> Unfortunately, this basically survives all the way to the "optimized"
> dump.  With the patch, the assignment *p_1(D) = l; is removed and
> copyprop1 and cddce1 turn this into:
> 
> <bb 2>:
>   l$i_6 = p_1(D)->i;
>   D.1967_3 = l$i_6 + 1;
>   p_1(D)->i = D.1967_3;
>   return;
> 
> which is then the "optimized" gimple, already before IPA and at -O1.

Ok, that's certainly better.  Can you file a bug report for the FRE/DSE
issue please?

> For the record, without total scalarization, the "optimized" gimple
> would be:
> 
> <bb 2>:
>   l = *p_1(D);
>   l$i_6 = p_1(D)->i;
>   D.1967_3 = l$i_6 + 1;
>   *p_1(D) = l;
>   p_1(D)->i = D.1967_3;
>   return;
> 
> So at the moment FRE/DSE certainly does not help.  Eventually we
> should do something like that, or a real aggregate copy propagation,
> but until then we probably need to live with the total scalarization
> thingy - as I have learned in the PR mentioned above and a few
> others, there are people who really want at least this functionality
> now - and it should not perform this badly on structures with
> padding.

The only thing I'm worried about is

typedef short pad;  /* explicit stand-in for the two padding bytes */

struct X { int i; short s; pad p; int j; };
struct Y { int i; short s; int j; };

extern void abort (void);

int main()
{
  struct X x = { 1, 2, 3, 4 };
  struct Y y = *(struct Y *)&x;
  y.s++;
  *(struct Y *)&x = y;
  if (x.p != 3)
    abort ();
  return 0;
}

untested, but any variant needs to make sure the padding value is not
lost by means of assigning x to y and back.  The above should translate
to y = MEM[&x + 0]; ... MEM[&x + 0] = y;

So if the result for something like the above looks ok, the patch is ok.

Thanks,
Richard.

> Martin
> 
> 
> 
> 
> > Richard.
> > 
> > 
> > > Thanks,
> > >
> > > Martin
> > >
> > >
> > > 2011-06-24  Martin Jambor  <mjam...@suse.cz>
> > >
> > >        * tree-sra.c (struct access): Rename total_scalarization to
> > >        grp_total_scalarization.
> > >        (completely_scalarize_var): New function.
> > >        (sort_and_splice_var_accesses): Set grp_total_scalarization in
> > >        the representative access.
> > >        (analyze_access_subtree): Propagate total scalarization across the
> > >        tree, no holes in totally scalarized trees, simplify coverage
> > >        computation.
> > >        (analyze_all_variable_accesses): Call completely_scalarize_var
> > >        instead of completely_scalarize_record.
> > >
> > >        * testsuite/gcc.dg/tree-ssa/sra-12.c: New test.
> > >
> > > Index: src/gcc/tree-sra.c
> > > ===================================================================
> > > *** src.orig/gcc/tree-sra.c
> > > --- src/gcc/tree-sra.c
> > > *************** struct access
> > > *** 170,179 ****
> > >    /* Is this particular access write access? */
> > >    unsigned write : 1;
> > >
> > > -   /* Is this access an artificial one created to scalarize some record
> > > -      entirely? */
> > > -   unsigned total_scalarization : 1;
> > > -
> > >    /* Is this access an access to a non-addressable field? */
> > >    unsigned non_addressable : 1;
> > >
> > > --- 170,175 ----
> > > *************** struct access
> > > *** 204,209 ****
> > > --- 200,209 ----
> > >       is not propagated in the access tree in any direction.  */
> > >    unsigned grp_scalar_write : 1;
> > >
> > > +   /* Is this access an artificial one created to scalarize some record
> > > +      entirely? */
> > > +   unsigned grp_total_scalarization : 1;
> > > +
> > >    /* Other passes of the analysis use this bit to make function
> > >       analyze_access_subtree create scalar replacements for this group if
> > >       possible.  */
> > > *************** dump_access (FILE *f, struct access *acc
> > > *** 377,402 ****
> > >    fprintf (f, ", type = ");
> > >    print_generic_expr (f, access->type, 0);
> > >    if (grp)
> > > !     fprintf (f, ", total_scalarization = %d, grp_read = %d, grp_write = 
> > > %d, "
> > > !            "grp_assignment_read = %d, grp_assignment_write = %d, "
> > > !            "grp_scalar_read = %d, grp_scalar_write = %d, "
> > >             "grp_hint = %d, grp_covered = %d, "
> > >             "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> > >             "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> > >             "grp_maybe_modified = %d, "
> > >             "grp_not_necessarilly_dereferenced = %d\n",
> > > !            access->total_scalarization, access->grp_read, access->grp_write,
> > > !            access->grp_assignment_read, access->grp_assignment_write,
> > > !            access->grp_scalar_read, access->grp_scalar_write,
> > >             access->grp_hint, access->grp_covered,
> > >             access->grp_unscalarizable_region, access->grp_unscalarized_data,
> > >             access->grp_partial_lhs, access->grp_to_be_replaced,
> > >             access->grp_maybe_modified,
> > >             access->grp_not_necessarilly_dereferenced);
> > >    else
> > > !     fprintf (f, ", write = %d, total_scalarization = %d, "
> > >             "grp_partial_lhs = %d\n",
> > > !            access->write, access->total_scalarization,
> > >             access->grp_partial_lhs);
> > >  }
> > >
> > > --- 377,402 ----
> > >    fprintf (f, ", type = ");
> > >    print_generic_expr (f, access->type, 0);
> > >    if (grp)
> > > !     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = 
> > > %d, "
> > > !            "grp_assignment_write = %d, grp_scalar_read = %d, "
> > > !            "grp_scalar_write = %d, grp_total_scalarization = %d, "
> > >             "grp_hint = %d, grp_covered = %d, "
> > >             "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> > >             "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> > >             "grp_maybe_modified = %d, "
> > >             "grp_not_necessarilly_dereferenced = %d\n",
> > > !            access->grp_read, access->grp_write, access->grp_assignment_read,
> > > !            access->grp_assignment_write, access->grp_scalar_read,
> > > !            access->grp_scalar_write, access->grp_total_scalarization,
> > >             access->grp_hint, access->grp_covered,
> > >             access->grp_unscalarizable_region, access->grp_unscalarized_data,
> > >             access->grp_partial_lhs, access->grp_to_be_replaced,
> > >             access->grp_maybe_modified,
> > >             access->grp_not_necessarilly_dereferenced);
> > >    else
> > > !     fprintf (f, ", write = %d, grp_total_scalarization = %d, "
> > >             "grp_partial_lhs = %d\n",
> > > !            access->write, access->grp_total_scalarization,
> > >             access->grp_partial_lhs);
> > >  }
> > >
> > > *************** completely_scalarize_record (tree base,
> > > *** 924,930 ****
> > >            access = create_access_1 (base, pos, size);
> > >            access->expr = nref;
> > >            access->type = ft;
> > > !           access->total_scalarization = 1;
> > >            /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> > >          }
> > >        else
> > > --- 924,930 ----
> > >            access = create_access_1 (base, pos, size);
> > >            access->expr = nref;
> > >            access->type = ft;
> > > !           access->grp_total_scalarization = 1;
> > >            /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> > >          }
> > >        else
> > > *************** completely_scalarize_record (tree base,
> > > *** 932,937 ****
> > > --- 932,954 ----
> > >        }
> > >  }
> > >
> > > + /* Create total_scalarization accesses for all scalar type fields in VAR and
> > > +    for VAR as a whole.  VAR must be of a RECORD_TYPE conforming to
> > > +    type_consists_of_records_p.  */
> > > +
> > > + static void
> > > + completely_scalarize_var (tree var)
> > > + {
> > > +   HOST_WIDE_INT size = tree_low_cst (DECL_SIZE (var), 1);
> > > +   struct access *access;
> > > +
> > > +   access = create_access_1 (var, 0, size);
> > > +   access->expr = var;
> > > +   access->type = TREE_TYPE (var);
> > > +   access->grp_total_scalarization = 1;
> > > +
> > > +   completely_scalarize_record (var, var, 0, var);
> > > + }
> > >
> > >  /* Search the given tree for a declaration by skipping handled components and
> > >     exclude it from the candidates.  */
> > > *************** sort_and_splice_var_accesses (tree var)
> > > *** 1713,1719 ****
> > >        bool grp_assignment_read = access->grp_assignment_read;
> > >        bool grp_assignment_write = access->grp_assignment_write;
> > >        bool multiple_scalar_reads = false;
> > > !       bool total_scalarization = access->total_scalarization;
> > >        bool grp_partial_lhs = access->grp_partial_lhs;
> > >        bool first_scalar = is_gimple_reg_type (access->type);
> > >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > > --- 1730,1736 ----
> > >        bool grp_assignment_read = access->grp_assignment_read;
> > >        bool grp_assignment_write = access->grp_assignment_write;
> > >        bool multiple_scalar_reads = false;
> > > !       bool total_scalarization = access->grp_total_scalarization;
> > >        bool grp_partial_lhs = access->grp_partial_lhs;
> > >        bool first_scalar = is_gimple_reg_type (access->type);
> > >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > > *************** sort_and_splice_var_accesses (tree var)
> > > *** 1757,1763 ****
> > >          grp_assignment_write |= ac2->grp_assignment_write;
> > >          grp_partial_lhs |= ac2->grp_partial_lhs;
> > >          unscalarizable_region |= ac2->grp_unscalarizable_region;
> > > !         total_scalarization |= ac2->total_scalarization;
> > >          relink_to_new_repr (access, ac2);
> > >
> > >          /* If there are both aggregate-type and scalar-type accesses with
> > > --- 1774,1780 ----
> > >          grp_assignment_write |= ac2->grp_assignment_write;
> > >          grp_partial_lhs |= ac2->grp_partial_lhs;
> > >          unscalarizable_region |= ac2->grp_unscalarizable_region;
> > > !         total_scalarization |= ac2->grp_total_scalarization;
> > >          relink_to_new_repr (access, ac2);
> > >
> > >          /* If there are both aggregate-type and scalar-type accesses with
> > > *************** sort_and_splice_var_accesses (tree var)
> > > *** 1778,1783 ****
> > > --- 1795,1801 ----
> > >        access->grp_assignment_read = grp_assignment_read;
> > >        access->grp_assignment_write = grp_assignment_write;
> > >        access->grp_hint = multiple_scalar_reads || total_scalarization;
> > > +       access->grp_total_scalarization = total_scalarization;
> > >        access->grp_partial_lhs = grp_partial_lhs;
> > >        access->grp_unscalarizable_region = unscalarizable_region;
> > >        if (access->first_link)
> > > *************** analyze_access_subtree (struct access *r
> > > *** 2023,2028 ****
> > > --- 2041,2048 ----
> > >        root->grp_write = 1;
> > >        if (parent->grp_assignment_write)
> > >        root->grp_assignment_write = 1;
> > > +       if (parent->grp_total_scalarization)
> > > +       root->grp_total_scalarization = 1;
> > >      }
> > >
> > >    if (root->grp_unscalarizable_region)
> > > *************** analyze_access_subtree (struct access *r
> > > *** 2033,2048 ****
> > >
> > >    for (child = root->first_child; child; child = child->next_sibling)
> > >      {
> > > !       if (!hole && child->offset < covered_to)
> > > !       hole = true;
> > > !       else
> > > !       covered_to += child->size;
> > > !
> > >        sth_created |= analyze_access_subtree (child, root,
> > >                                              allow_replacements && !scalar);
> > >
> > >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > > !       hole |= !child->grp_covered;
> > >      }
> > >
> > >    if (allow_replacements && scalar && !root->first_child
> > > --- 2053,2068 ----
> > >
> > >    for (child = root->first_child; child; child = child->next_sibling)
> > >      {
> > > !       hole |= covered_to < child->offset;
> > >        sth_created |= analyze_access_subtree (child, root,
> > >                                              allow_replacements && !scalar);
> > >
> > >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > > !       root->grp_total_scalarization &= child->grp_total_scalarization;
> > > !       if (child->grp_covered)
> > > !       covered_to += child->size;
> > > !       else
> > > !       hole = true;
> > >      }
> > >
> > >    if (allow_replacements && scalar && !root->first_child
> > > *************** analyze_access_subtree (struct access *r
> > > *** 2063,2072 ****
> > >        sth_created = true;
> > >        hole = false;
> > >      }
> > > !   else if (covered_to < limit)
> > > !     hole = true;
> > >
> > > !   if (sth_created && !hole)
> > >      {
> > >        root->grp_covered = 1;
> > >        return true;
> > > --- 2083,2098 ----
> > >        sth_created = true;
> > >        hole = false;
> > >      }
> > > !   else
> > > !     {
> > > !       if (covered_to < limit)
> > > !       hole = true;
> > > !       if (scalar)
> > > !       root->grp_total_scalarization = 0;
> > > !     }
> > >
> > > !   if (sth_created
> > > !       && (!hole || root->grp_total_scalarization))
> > >      {
> > >        root->grp_covered = 1;
> > >        return true;
> > > *************** analyze_all_variable_accesses (void)
> > > *** 2288,2294 ****
> > >                <= max_total_scalarization_size)
> > >            && type_consists_of_records_p (TREE_TYPE (var)))
> > >          {
> > > !           completely_scalarize_record (var, var, 0, var);
> > >            if (dump_file && (dump_flags & TDF_DETAILS))
> > >              {
> > >                fprintf (dump_file, "Will attempt to totally scalarize ");
> > > --- 2314,2320 ----
> > >                <= max_total_scalarization_size)
> > >            && type_consists_of_records_p (TREE_TYPE (var)))
> > >          {
> > > !           completely_scalarize_var (var);
> > >            if (dump_file && (dump_flags & TDF_DETAILS))
> > >              {
> > >                fprintf (dump_file, "Will attempt to totally scalarize ");
> > > Index: src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > > ===================================================================
> > > *** /dev/null
> > > --- src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > > ***************
> > > *** 0 ****
> > > --- 1,25 ----
> > > + /* Verify that SRA total scalarization will not be confused by padding.  */
> > > +
> > > + /* { dg-do compile } */
> > > + /* { dg-options "-O1 -fdump-tree-release_ssa" } */
> > > +
> > > + struct S
> > > + {
> > > +   int i;
> > > +   unsigned short f1;
> > > +   char f2;
> > > +   unsigned short f3, f4;
> > > + };
> > > +
> > > +
> > > + int foo (struct S *p)
> > > + {
> > > +   struct S l;
> > > +
> > > +   l = *p;
> > > +   l.i++;
> > > +   *p = l;
> > > + }
> > > +
> > > + /* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */
> > > + /* { dg-final { cleanup-tree-dump "release_ssa" } } */
> > >
> > >
> 
