On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor <mjam...@suse.cz> wrote:
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor <mjam...@suse.cz> wrote:
>>> Hi Richi,
>>>
>>> On Fri, Nov 24 2017, Richard Biener wrote:
>>>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>>>
>>> ..
>>>
>>>>>>>> And yes, I've been worried about SRA as well here... it _does_
>>>>>>>> have some early outs when seeing VIEW_CONVERT_EXPR but
>>>>>>>> apparently not for the above.  Testcase that aborts with SRA but
>>>>>>>> not without:
>>>>>>>>
>>>>>>>> struct A { short s; long i; long j; };
>>>>>>>> struct A a, b;
>>>>>>>> void foo ()
>>>>>>>> {
>>>>>>>>   struct A c;
>>>>>>>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>>>>>>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>>>>>>>> }
>>>>>>>> int main()
>>>>>>>> {
>>>>>>>>   __builtin_memset (&b, 0, sizeof (struct A));
>>>>>>>>   b.s = 1;
>>>>>>>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>>>>>>>   foo ();
>>>>>>>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>>>>>>>   if (a.s != 1)
>>>>>>>>     __builtin_abort ();
>>>>>>>>   return 0;
>>>>>>>> }
>>>>>>>
>>>>>>> Thanks for the testcase, I agree that is a fairly big problem.  Do you
>>>>>>> think that the following (untested) patch is an appropriate way of
>>>>>>> fixing it and generally of extending gimple to capture that a statement
>>>>>>> is a bit-copy?
>>>>>>
>>>>>> I think the place to fix is the memcpy folding.  That is, we'd say that
>>>>>> aggregate assignments are not bit-copies but do element-wise assignments.
>>>>>> For memcpy folding we'd then need to use a type that doesn't contain
>>>>>> padding.  Which effectively means char[].
>>>>>>
>>>>>> Of course we need to stop SRA from decomposing that copy to
>>>>>> individual characters then ;)
>>>>>>
>>>>>> So iff we decide that all aggregate copies are element copies,
>>>>>> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
>>>>>> (currently we allow TYPE_CANONICAL compatibility and thus there
>>>>>> might be some mismatches), then we have to fix nothing but
>>>>>> the memcpy folding.
>>>>>>
>>>>>>> If so, I'll add the testcase, bootstrap it and formally propose it.
>>>>>>> Subsequently I will of course make sure that any element-wise copying
>>>>>>> patch would test the predicate.
>>>>>>
>>>>>> I don't think the alias-set should determine whether a copy is
>>>>>> bit-wise or not.
>>>>>
>>>>> Like the attached.  At least FAILs
>>>>>
>>>>> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
>>>>> "memcpy[^\n]*123456" 2 (found 0 times)
>>>>>
>>>>> not sure why we have this test.
>>>>
>>>> Hum.  And SRA still decomposes the copy to struct elements w/o padding
>>>> even though the access is done using char[].  So somehow it ignores
>>>> VIEW_CONVERT_EXPRs
>>>> (well, those implicitly present on MEM_REFs).
>>>
>>> Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
>>> out only if they are buried under a handled_component.  And it does not
>>> remove aggregate assignments containing them.
>>>
>>>>
>>>> Looks like this is because total scalarization is done on the decls
>>>> and not at all honoring how the variable is accessed?  The following
>>>> seems to fix that, otherwise untested.
>>>
>>>
>>>>
>>>> Index: gcc/tree-sra.c
>>>> ===================================================================
>>>> --- gcc/tree-sra.c      (revision 255137)
>>>> +++ gcc/tree-sra.c      (working copy)
>>>> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>>>>      {
>>>>        racc->grp_assignment_read = 1;
>>>>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>>>> -         && !is_gimple_reg_type (racc->type))
>>>> +         && !is_gimple_reg_type (racc->type)
>>>> +         && (TYPE_MAIN_VARIANT (racc->type)
>>>> +             == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base))))
>>>>         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>>>>        if (storage_order_barrier_p (lhs))
>>>>         racc->grp_unscalarizable_region = 1;
>>>
>>> I believe that the added condition is not what you want, this seems
>>> to trigger also for ordinary:
>>>
>>>   s1 = s2.field
>>>
>>> where racc->type is the type of the field but racc->base is s2 and its
>>> type is the type of the structure.
>>
>> Yes.  But do we want to totally scalarize s2 in this case?  We only
>> access parts of it.  We don't seem to have a testcase that fails
>> (well, full testing still in progress).
>
> If we start with
>
>   small_s1 = src;
>   dst = small_s1->even_smaller_struct_field;
>
> and small_s1 is otherwise unused, I think that we want to facilitate
> copy propagation with total scalarization too.
>
>>
>>> I also think you want to be setting a bit in
>>> cannot_scalarize_away_bitmap in order to guarantee that total
>>> scalarization will not happen for the given candidate.  Otherwise some
>>> other regular assignment might trigger it ...
>>
>> Yeah, figured that out myself.
>>
>>> except if we then also
>>> checked the statement for bit-copying types in sra_modify_assign (in the
>>> condition after the big comment), which I suppose is actually the
>>> correct thing to do.
>>
>> But modification is too late, no?
>
> No, at modification time we still decide how to deal with an
> aggregate assignment.
> That can either be pessimistic, storing all
> RHS replacements back to their original aggregate, leaving the original
> assignment in place and then loading all LHS replacements from their
> aggregate; or optimistic, trying to load LHS replacements either
> from RHS replacements or at least from the RHS, and storing RHS
> replacements directly to the LHS, hoping to eliminate the original load.  It
> is this optimistic approach, rather than total scalarization, that we need
> to disable.  In fact, just disabling total scalarization is not enough
> if SRA can come across accesses to components from elsewhere in the
> function body; your patch unfortunately still fails for:
>
> volatile short vs;
> volatile long vl;
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   struct A c;
>   __builtin_memcpy (&c, &b, sizeof (struct A));
>   __builtin_memcpy (&a, &c, sizeof (struct A));
>
>   vs = c.s;
>   vl = c.i;
>   vl = c.j;
> }
> int main()
> {
>   __builtin_memset (&b, 0, sizeof (struct A));
>   b.s = 1;
>   __builtin_memcpy ((char *)&b+2, &b, 2);
>   foo ();
>   __builtin_memcpy (&a, (char *)&a+2, 2);
>   if (a.s != 1)
>     __builtin_abort ();
>   return 0;
> }
>
>
>>
>>> Thanks a lot for the folding patch, I can take over the SRA bits if you
>>> want to.
>>
>> For reference below is the full patch.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Ok for the SRA parts?
>>
>
> My preferred SRA part would be:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index db490b20c3e..7a0e4d1ae26 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1302,6 +1302,17 @@ comes_initialized_p (tree base)
>    return TREE_CODE (base) == PARM_DECL || constant_decl_p (base);
>  }
>
> +/* Return true if REF is a MEM_REF which changes the type of the data it
> +   accesses.
> +   */
> +
> +static bool
> +type_changing_mem_ref_p (tree ref)
> +{
> +  return (TREE_CODE (ref) == MEM_REF
> +         && (TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> +             != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (ref, 0)))));
> +}
> +
>  /* Scan expressions occurring in STMT, create access structures for all accesses
>     to candidates for scalarization and remove those candidates which occur in
>     statements or expressions that prevent them from being split apart.  Return
> @@ -1338,7 +1349,8 @@ build_accesses_from_assign (gimple *stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -         && !is_gimple_reg_type (racc->type))
> +         && !is_gimple_reg_type (racc->type)
> +         && !type_changing_mem_ref_p (rhs))
>         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>        if (storage_order_barrier_p (lhs))
>         racc->grp_unscalarizable_region = 1;
> @@ -3589,6 +3601,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>
>    if (modify_this_stmt
>        || gimple_has_volatile_ops (stmt)
> +      || type_changing_mem_ref_p (lhs)
> +      || type_changing_mem_ref_p (rhs)
>        || contains_vce_or_bfcref_p (rhs)
>        || contains_vce_or_bfcref_p (lhs)
>        || stmt_ends_bb_p (stmt))
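[Editorial aside: for context, after the proposed memcpy folding the two copies in `foo` would become aggregate assignments through a padding-free `char[]` view of the objects. A rough sketch of the resulting GIMPLE, assuming `sizeof (struct A) == 24` on LP64; the exact dump syntax is illustrative, not taken from an actual compiler run:]

```
foo ()
{
  struct A c;

  MEM[(char[24] *)&c] = MEM[(char[24] *)&b];
  MEM[(char[24] *)&a] = MEM[(char[24] *)&c];
}
```

The `char[24]` MEM_REFs are exactly the type-changing references that `type_changing_mem_ref_p` above is meant to catch, so that SRA neither totally scalarizes `c` nor rewrites these statements element-wise.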
But a type-changing MEM_REF can appear at each ref base.  Maybe it's only
relevant for toplevel ones though, you should know.  So we really need to
handle it like we do VIEW_CONVERT_EXPR, and/or we need to handle
VIEW_CONVERT_EXPRs somewhere in the ref chain the same way.

So if you want to take over the SRA parts that would be nice.  I have to
dumb down the memcpy folding a bit as it causes more fallout than I want
to fix right now.

Richard.

> I kept the condition in build_accesses_from_assign in order not to do
> unnecessary total-scalarization work, but it is the checks in
> sra_modify_assign that actually ensure we keep the assignment intact.
>
> It still would not do what Jakub asked for, i.e. keep the old behavior
> if there is no padding.  But that should be done at the gimple-fold
> level too, I believe.
>
> Thanks,
>
> Martin