Hi,

On Tue, Jun 28, 2011 at 03:01:17PM +0200, Richard Guenther wrote:
> On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor <mjam...@suse.cz> wrote:
> > Hi,
> >
> > at the moment SRA can get confused by alignment padding and think that
> > it actually contains some data for which there is no planned
> > replacement and thus might leave some loads and stores in place
> > instead of removing them.  This is perhaps the biggest problem when we
> > attempt total scalarization of simple structures exactly in order to
> > get rid of these and of the variables altogether.
> >
> > I've pondered for quite a while how to best deal with them.  One
> > option was to make just the total scalarization stronger.  I have also
> > contemplated creating phantom accesses for padding I could detect
> > (i.e. in simple structures) which would be more general, but this
> > would complicate the parts of SRA which are already quite convoluted
> > and I was not really sure it was worth it.
> >
> > Eventually I decided for the total scalarization option.  This patch
> > changes it such that the flag is propagated down the access tree but
> > also, if it does not work out, is reset on the way up.  If the flag
> > survives, the access tree is considered "covered" by scalar
> > replacements and thus it is known not to contain unscalarized data.
> >
> > While changing function analyze_access_subtree I have simplified the
> > way we compute the hole flag and also fixed one comparison which we
> > currently have the wrong way round, but it fortunately does not matter
> > because if there is a hole, the covered_to will never add up to the
> > total size.  I'll probably post a separate patch against 4.6 just in
> > case someone attempts to read the source.
> >
> > Bootstrapped and tested on x86_64-linux, OK for trunk?
>
> So, what will it do for the testcase?
>
> The following is what I _think_ it should do:
>
> <bb 2>:
>   l = *p_1(D);
>   l$i_6 = p_1(D)->i;
>   D.2700_2 = l$i_6;
>   D.2701_3 = D.2700_2 + 1;
>   l$i_12 = D.2701_3;
>   *p_1(D) = l;
>   p_1(D)->i = l$i_12;
>
> and let FRE/DSE do their job (which they don't do, unfortunately).
> So does your patch then remove the load/store from/to l but keep
> the elementwise loads/stores (which are probably cleaned up by FRE)?
>
Well, that is what would happen if no total scalarization was going on.
Total scalarization is a poor man's aggregate copy propagation: it
splits small structures into individual fields whenever we can get rid
of the aggregate this way (i.e. if it is never used in anything but an
assignment).  I introduced it to fix PR 42585 - unfortunately, the
padding problem did not occur to me until this winter.

Currently, SRA performs very badly on the testcase, creating:

<bb 2>:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  l$f1_8 = p_1(D)->f1;
  l$f2_9 = p_1(D)->f2;
  l$f3_10 = p_1(D)->f3;
  l$f4_11 = p_1(D)->f4;
  D.1966_2 = l$i_6;
  D.1967_3 = D.1966_2 + 1;
  l$i_12 = D.1967_3;
  *p_1(D) = l;             <-- this should not be here
  p_1(D)->i = l$i_12;
  p_1(D)->f1 = l$f1_8;
  p_1(D)->f2 = l$f2_9;
  p_1(D)->f3 = l$f3_10;
  p_1(D)->f4 = l$f4_11;
  return;

Unfortunately, this basically survives all the way to the "optimized"
dump.  With the patch, the assignment *p_1(D) = l; is removed and
copyprop1 and cddce1 turn this into:

<bb 2>:
  l$i_6 = p_1(D)->i;
  D.1967_3 = l$i_6 + 1;
  p_1(D)->i = D.1967_3;
  return;

which is then the "optimized" gimple, already before IPA and at -O1.
For the record, without total scalarization, the "optimized" gimple
would be:

<bb 2>:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  D.1967_3 = l$i_6 + 1;
  *p_1(D) = l;
  p_1(D)->i = D.1967_3;
  return;

So at the moment FRE/DSE certainly do not help.  Eventually we should
do something like that, or real aggregate copy propagation, but until
then we probably need to live with the total scalarization thingy - as
I have learned in the PR mentioned above and a few others, there are
people who really want at least this functionality now - and it should
not perform this badly on structures with padding.

Martin

> Richard.
>
> >
> > Thanks,
> >
> > Martin
> >
> >
> > 2011-06-24  Martin Jambor  <mjam...@suse.cz>
> >
> > 	* tree-sra.c (struct access): Rename total_scalarization to
> > 	grp_total_scalarization.
> > 	(completely_scalarize_var): New function.
> > 	(sort_and_splice_var_accesses): Set grp_total_scalarization in the
> > 	representative access.
> > 	(analyze_access_subtree): Propagate total scalarization across the
> > 	tree, no holes in totally scalarized trees, simplify coverage
> > 	computation.
> > 	(analyze_all_variable_accesses): Call completely_scalarize_var
> > 	instead of completely_scalarize_record.
> >
> > 	* testsuite/gcc.dg/tree-ssa/sra-12.c: New test.
> >
> > Index: src/gcc/tree-sra.c
> > ===================================================================
> > *** src.orig/gcc/tree-sra.c
> > --- src/gcc/tree-sra.c
> > *************** struct access
> > *** 170,179 ****
> >     /* Is this particular access write access? */
> >     unsigned write : 1;
> >
> > -   /* Is this access an artificial one created to scalarize some record
> > -      entirely? */
> > -   unsigned total_scalarization : 1;
> > -
> >     /* Is this access an access to a non-addressable field? */
> >     unsigned non_addressable : 1;
> >
> > --- 170,175 ----
> > *************** struct access
> > *** 204,209 ****
> > --- 200,209 ----
> >        is not propagated in the access tree in any direction.  */
> >     unsigned grp_scalar_write : 1;
> >
> > +   /* Is this access an artificial one created to scalarize some record
> > +      entirely? */
> > +   unsigned grp_total_scalarization : 1;
> > +
> >     /* Other passes of the analysis use this bit to make function
> >        analyze_access_subtree create scalar replacements for this group if
> >        possible.  */
> > *************** dump_access (FILE *f, struct access *acc
> > *** 377,402 ****
> >     fprintf (f, ", type = ");
> >     print_generic_expr (f, access->type, 0);
> >     if (grp)
> > !     fprintf (f, ", total_scalarization = %d, grp_read = %d, grp_write = %d, "
> > !              "grp_assignment_read = %d, grp_assignment_write = %d, "
> > !              "grp_scalar_read = %d, grp_scalar_write = %d, "
> >                "grp_hint = %d, grp_covered = %d, "
> >                "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> >                "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> >                "grp_maybe_modified = %d, "
> >                "grp_not_necessarilly_dereferenced = %d\n",
> > !              access->total_scalarization, access->grp_read, access->grp_write,
> > !              access->grp_assignment_read, access->grp_assignment_write,
> > !              access->grp_scalar_read, access->grp_scalar_write,
> >                access->grp_hint, access->grp_covered,
> >                access->grp_unscalarizable_region, access->grp_unscalarized_data,
> >                access->grp_partial_lhs, access->grp_to_be_replaced,
> >                access->grp_maybe_modified,
> >                access->grp_not_necessarilly_dereferenced);
> >     else
> > !     fprintf (f, ", write = %d, total_scalarization = %d, "
> >                "grp_partial_lhs = %d\n",
> > !              access->write, access->total_scalarization,
> >                access->grp_partial_lhs);
> >  }
> >
> > --- 377,402 ----
> >     fprintf (f, ", type = ");
> >     print_generic_expr (f, access->type, 0);
> >     if (grp)
> > !     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "
> > !              "grp_assignment_write = %d, grp_scalar_read = %d, "
> > !              "grp_scalar_write = %d, grp_total_scalarization = %d, "
> >                "grp_hint = %d, grp_covered = %d, "
> >                "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
> >                "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> >                "grp_maybe_modified = %d, "
> >                "grp_not_necessarilly_dereferenced = %d\n",
> > !              access->grp_read, access->grp_write, access->grp_assignment_read,
> > !              access->grp_assignment_write, access->grp_scalar_read,
> > !              access->grp_scalar_write, access->grp_total_scalarization,
> >                access->grp_hint, access->grp_covered,
> >                access->grp_unscalarizable_region, access->grp_unscalarized_data,
> >                access->grp_partial_lhs, access->grp_to_be_replaced,
> >                access->grp_maybe_modified,
> >                access->grp_not_necessarilly_dereferenced);
> >     else
> > !     fprintf (f, ", write = %d, grp_total_scalarization = %d, "
> >                "grp_partial_lhs = %d\n",
> > !              access->write, access->grp_total_scalarization,
> >                access->grp_partial_lhs);
> >  }
> >
> > *************** completely_scalarize_record (tree base,
> > *** 924,930 ****
> >  	  access = create_access_1 (base, pos, size);
> >  	  access->expr = nref;
> >  	  access->type = ft;
> > ! 	  access->total_scalarization = 1;
> >  	  /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> >  	}
> >        else
> > --- 924,930 ----
> >  	  access = create_access_1 (base, pos, size);
> >  	  access->expr = nref;
> >  	  access->type = ft;
> > ! 	  access->grp_total_scalarization = 1;
> >  	  /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> >  	}
> >        else
> > *************** completely_scalarize_record (tree base,
> > *** 932,937 ****
> > --- 932,954 ----
> >      }
> >  }
> >
> > + /* Create total_scalarization accesses for all scalar type fields in VAR
> > +    and for VAR as a whole.  VAR must be of a RECORD_TYPE conforming to
> > +    type_consists_of_records_p.  */
> > +
> > + static void
> > + completely_scalarize_var (tree var)
> > + {
> > +   HOST_WIDE_INT size = tree_low_cst (DECL_SIZE (var), 1);
> > +   struct access *access;
> > +
> > +   access = create_access_1 (var, 0, size);
> > +   access->expr = var;
> > +   access->type = TREE_TYPE (var);
> > +   access->grp_total_scalarization = 1;
> > +
> > +   completely_scalarize_record (var, var, 0, var);
> > + }
> >
> >  /* Search the given tree for a declaration by skipping handled components
> >     and exclude it from the candidates.  */
> > *************** sort_and_splice_var_accesses (tree var)
> > *** 1713,1719 ****
> >        bool grp_assignment_read = access->grp_assignment_read;
> >        bool grp_assignment_write = access->grp_assignment_write;
> >        bool multiple_scalar_reads = false;
> > !      bool total_scalarization = access->total_scalarization;
> >        bool grp_partial_lhs = access->grp_partial_lhs;
> >        bool first_scalar = is_gimple_reg_type (access->type);
> >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > --- 1730,1736 ----
> >        bool grp_assignment_read = access->grp_assignment_read;
> >        bool grp_assignment_write = access->grp_assignment_write;
> >        bool multiple_scalar_reads = false;
> > !      bool total_scalarization = access->grp_total_scalarization;
> >        bool grp_partial_lhs = access->grp_partial_lhs;
> >        bool first_scalar = is_gimple_reg_type (access->type);
> >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > *************** sort_and_splice_var_accesses (tree var)
> > *** 1757,1763 ****
> >  	  grp_assignment_write |= ac2->grp_assignment_write;
> >  	  grp_partial_lhs |= ac2->grp_partial_lhs;
> >  	  unscalarizable_region |= ac2->grp_unscalarizable_region;
> > ! 	  total_scalarization |= ac2->total_scalarization;
> >  	  relink_to_new_repr (access, ac2);
> >
> >  	  /* If there are both aggregate-type and scalar-type accesses with
> > --- 1774,1780 ----
> >  	  grp_assignment_write |= ac2->grp_assignment_write;
> >  	  grp_partial_lhs |= ac2->grp_partial_lhs;
> >  	  unscalarizable_region |= ac2->grp_unscalarizable_region;
> > ! 	  total_scalarization |= ac2->grp_total_scalarization;
> >  	  relink_to_new_repr (access, ac2);
> >
> >  	  /* If there are both aggregate-type and scalar-type accesses with
> > *************** sort_and_splice_var_accesses (tree var)
> > *** 1778,1783 ****
> > --- 1795,1801 ----
> >        access->grp_assignment_read = grp_assignment_read;
> >        access->grp_assignment_write = grp_assignment_write;
> >        access->grp_hint = multiple_scalar_reads || total_scalarization;
> > +      access->grp_total_scalarization = total_scalarization;
> >        access->grp_partial_lhs = grp_partial_lhs;
> >        access->grp_unscalarizable_region = unscalarizable_region;
> >        if (access->first_link)
> > *************** analyze_access_subtree (struct access *r
> > *** 2023,2028 ****
> > --- 2041,2048 ----
> >        root->grp_write = 1;
> >        if (parent->grp_assignment_write)
> >  	root->grp_assignment_write = 1;
> > +      if (parent->grp_total_scalarization)
> > +	root->grp_total_scalarization = 1;
> >      }
> >
> >    if (root->grp_unscalarizable_region)
> > *************** analyze_access_subtree (struct access *r
> > *** 2033,2048 ****
> >
> >    for (child = root->first_child; child; child = child->next_sibling)
> >      {
> > !      if (!hole && child->offset < covered_to)
> > !	hole = true;
> > !      else
> > !	covered_to += child->size;
> > !
> >        sth_created |= analyze_access_subtree (child, root,
> >  					     allow_replacements && !scalar);
> >
> >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > !      hole |= !child->grp_covered;
> >      }
> >
> >    if (allow_replacements && scalar && !root->first_child
> > --- 2053,2068 ----
> >
> >    for (child = root->first_child; child; child = child->next_sibling)
> >      {
> > !      hole |= covered_to < child->offset;
> >        sth_created |= analyze_access_subtree (child, root,
> >  					     allow_replacements && !scalar);
> >
> >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > !      root->grp_total_scalarization &= child->grp_total_scalarization;
> > !      if (child->grp_covered)
> > !	covered_to += child->size;
> > !      else
> > !	hole = true;
> >      }
> >
> >    if (allow_replacements && scalar && !root->first_child
> > *************** analyze_access_subtree (struct access *r
> > *** 2063,2072 ****
> >        sth_created = true;
> >        hole = false;
> >      }
> > !  else if (covered_to < limit)
> > !    hole = true;
> >
> > !  if (sth_created && !hole)
> >      {
> >        root->grp_covered = 1;
> >        return true;
> > --- 2083,2098 ----
> >        sth_created = true;
> >        hole = false;
> >      }
> > !  else
> > !    {
> > !      if (covered_to < limit)
> > !	hole = true;
> > !      if (scalar)
> > !	root->grp_total_scalarization = 0;
> > !    }
> >
> > !  if (sth_created
> > !      && (!hole || root->grp_total_scalarization))
> >      {
> >        root->grp_covered = 1;
> >        return true;
> > *************** analyze_all_variable_accesses (void)
> > *** 2288,2294 ****
> >  	      <= max_total_scalarization_size)
> >  	  && type_consists_of_records_p (TREE_TYPE (var)))
> >  	{
> > ! 	  completely_scalarize_record (var, var, 0, var);
> >  	  if (dump_file && (dump_flags & TDF_DETAILS))
> >  	    {
> >  	      fprintf (dump_file, "Will attempt to totally scalarize ");
> > --- 2314,2320 ----
> >  	      <= max_total_scalarization_size)
> >  	  && type_consists_of_records_p (TREE_TYPE (var)))
> >  	{
> > ! 	  completely_scalarize_var (var);
> >  	  if (dump_file && (dump_flags & TDF_DETAILS))
> >  	    {
> >  	      fprintf (dump_file, "Will attempt to totally scalarize ");
> > Index: src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > ===================================================================
> > *** /dev/null
> > --- src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > ***************
> > *** 0 ****
> > --- 1,25 ----
> > + /* Verify that SRA total scalarization will not be confused by
> > +    padding.  */
> > +
> > + /* { dg-do compile } */
> > + /* { dg-options "-O1 -fdump-tree-release_ssa" } */
> > +
> > + struct S
> > + {
> > +   int i;
> > +   unsigned short f1;
> > +   char f2;
> > +   unsigned short f3, f4;
> > + };
> > +
> > +
> > + int foo (struct S *p)
> > + {
> > +   struct S l;
> > +
> > +   l = *p;
> > +   l.i++;
> > +   *p = l;
> > + }
> > +
> > + /* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa"} } */
> > + /* { dg-final { cleanup-tree-dump "release_ssa" } } */
> >