On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor <mjam...@suse.cz> wrote:
> On Fri, Nov 24 2017, Richard Biener wrote:
>> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor <mjam...@suse.cz> wrote:
>>> Hi Richi,
>>>
>>> On Fri, Nov 24 2017, Richard Biener wrote:
>>>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
>>>
>>> ..
>>>
>>>>>>>> And yes, I've been worried about SRA as well here...  it _does_
>>>>>>>> have some early outs when seeing VIEW_CONVERT_EXPR but
>>>>>>>> apparently not for the above.  Testcase that aborts with SRA but
>>>>>>>> not without:
>>>>>>>>
>>>>>>>> struct A { short s; long i; long j; };
>>>>>>>> struct A a, b;
>>>>>>>> void foo ()
>>>>>>>> {
>>>>>>>>   struct A c;
>>>>>>>>   __builtin_memcpy (&c, &b, sizeof (struct A));
>>>>>>>>   __builtin_memcpy (&a, &c, sizeof (struct A));
>>>>>>>> }
>>>>>>>> int main()
>>>>>>>> {
>>>>>>>>   __builtin_memset (&b, 0, sizeof (struct A));
>>>>>>>>   b.s = 1;
>>>>>>>>   __builtin_memcpy ((char *)&b+2, &b, 2);
>>>>>>>>   foo ();
>>>>>>>>   __builtin_memcpy (&a, (char *)&a+2, 2);
>>>>>>>>   if (a.s != 1)
>>>>>>>>     __builtin_abort ();
>>>>>>>>   return 0;
>>>>>>>> }
>>>>>>>
>>>>>>> Thanks for the testcase; I agree that is a fairly big problem.  Do you
>>>>>>> think that the following (untested) patch is an appropriate way of
>>>>>>> fixing it and generally of extending gimple to capture that a statement
>>>>>>> is a bit-copy?
>>>>>>
>>>>>> I think the place to fix is the memcpy folding.  That is, we'd say that
>>>>>> aggregate assignments are not bit-copies but element-wise assignments.
>>>>>> For memcpy folding we'd then need to use a type that doesn't contain
>>>>>> padding.  Which effectively means char[].
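>>>>>>
>>>>>> A rough sketch of what such a folded copy could look like in a dump
>>>>>> (illustrative only, not actual output; assumes sizeof (struct A) == 24
>>>>>> on LP64):
>>>>>>
>>>>>>   MEM <unsigned char[24]> [(char * {ref-all})&c]
>>>>>>     = MEM <unsigned char[24]> [(char * {ref-all})&b];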
>>>>>>
>>>>>> Of course we need to stop SRA from decomposing that copy to
>>>>>> individual characters then ;)
>>>>>>
>>>>>> So iff we decide that all aggregate copies are element copies,
>>>>>> maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
>>>>>> (currently we allow TYPE_CANONICAL compatibility and thus there
>>>>>> might be some mismatches), then we have to fix nothing but
>>>>>> the memcpy folding.
>>>>>>
>>>>>>> If so, I'll add the testcase, bootstrap it and formally propose it.
>>>>>>> Subsequently I will of course make sure that any element-wise copying
>>>>>>> patch would test the predicate.
>>>>>>
>>>>>> I don't think the alias-set should determine whether a copy is
>>>>>> bit-wise or not.
>>>>>
>>>>> Like the attached.  At least FAILs
>>>>>
>>>>> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1 "memcpy[^\n]*123456" 2 (found 0 times)
>>>>>
>>>>> not sure why we have this test.
>>>>
>>>> Hum.  And SRA still decomposes the copy to struct elements w/o padding
>>>> even though the access is done using char[].  So somehow it ignores
>>>> VIEW_CONVERT_EXPRs (well, those implicitly present on MEM_REFs).
>>>
>>> Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
>>> out only if they are buried under a handled_component.  And it does not
>>> remove aggregate assignments containing them.
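>>>
>>> (Roughly: a top-level d = VIEW_CONVERT_EXPR<struct B>(c); is still
>>> analyzed, whereas something like vs = VIEW_CONVERT_EXPR<struct B>(c).s,
>>> where the V_C_E sits underneath a COMPONENT_REF, makes it bail out.
>>> Made-up snippets, struct B hypothetical.)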
>>>
>>>>
>>>> Looks like this is because total scalarization is done on the decls,
>>>> not at all honoring how the variable is accessed?  The following seems
>>>> to fix that, otherwise untested.
>>>
>>>>
>>>> Index: gcc/tree-sra.c
>>>> ===================================================================
>>>> --- gcc/tree-sra.c      (revision 255137)
>>>> +++ gcc/tree-sra.c      (working copy)
>>>> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>>>>      {
>>>>        racc->grp_assignment_read = 1;
>>>>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
>>>> -         && !is_gimple_reg_type (racc->type))
>>>> +         && !is_gimple_reg_type (racc->type)
>>>> +         && (TYPE_MAIN_VARIANT (racc->type)
>>>> +             == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base))))
>>>>        bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>>>>        if (storage_order_barrier_p (lhs))
>>>>         racc->grp_unscalarizable_region = 1;
>>>
>>> I believe that the added condition is not what you want; it seems
>>> to trigger also for an ordinary:
>>>
>>>   s1 = s2.field
>>>
>>> where racc->type is the type of the field but racc->base is s2, whose
>>> type is the type of the structure.
>>
>> Yes.  But do we want to totally scalarize s2 in this case?  We only
>> access parts of it.  We don't seem to have a testcase that fails
>> (well, full testing still in progress).
>
> If we start with
>
> small_s1 = src;
> dst = small_s1.even_smaller_struct_field;
>
> and small_s1 is otherwise unused, I think that we want to facilitate
> copy propagation with total scalarization too.
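>
> A made-up C sketch of that situation (all names hypothetical):
>
> struct In { int x; };
> struct Mid { struct In in; int other; };
>
> int
> user (struct Mid src)
> {
>   struct Mid small_s1 = src;    /* otherwise unused */
>   struct In dst = small_s1.in;  /* total scalarization of small_s1 lets
>                                    this load come directly from src */
>   return dst.x;
> }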
>
>>
>>> I also think you want to be setting a bit in
>>> cannot_scalarize_away_bitmap in order to guarantee that total
>>> scalarization will not happen for the given candidate.  Otherwise some
>>> other regular assignment might trigger it ...
>>
>> Yeah, figured that out myself.
>>
>>> except if we then also
>>> checked the statement for bit-copying types in sra_modify_assign (in the
>>> condition after the big comment), which I suppose is actually the
>>> correct thing to do.
>>
>> But modification is too late, no?
>
> No, at modification time we still decide how to deal with an aggregate
> assignment.  We can do that either pessimistically, by storing all RHS
> replacements back into the original aggregate, leaving the original
> assignment in place and then loading all LHS replacements from their
> aggregate, or optimistically, by trying to load LHS replacements either
> from RHS replacements or at least from the RHS, and by storing RHS
> replacements directly to the LHS, hoping to eliminate the original load
> (see the sketch after the testcase below).  It is this optimistic
> approach, rather than total scalarization, that we need to disable.  In
> fact, just disabling total scalarization is not enough if SRA can come
> across accesses to components from elsewhere in the function body; your
> patch unfortunately still fails for:
>
> volatile short vs;
> volatile long vl;
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   struct A c;
>   __builtin_memcpy (&c, &b, sizeof (struct A));
>   __builtin_memcpy (&a, &c, sizeof (struct A));
>
>   vs = c.s;
>   vl = c.i;
>   vl = c.j;
> }
> int main()
> {
>   __builtin_memset (&b, 0, sizeof (struct A));
>   b.s = 1;
>   __builtin_memcpy ((char *)&b+2, &b, 2);
>   foo ();
>   __builtin_memcpy (&a, (char *)&a+2, 2);
>   if (a.s != 1)
>     __builtin_abort ();
>   return 0;
> }
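>
> (Rough sketch of the two strategies for the copy from c to a above,
> with c scalarized into hypothetical replacements c$s, c$i and c$j;
> not actual SRA output:
>
>   /* pessimistic: store back, keep the aggregate copy */
>   c.s = c$s; c.i = c$i; c.j = c$j;
>   a = c;
>
>   /* optimistic: bypass the aggregate copy entirely */
>   a.s = c$s; a.i = c$i; a.j = c$j;
>
> The optimistic form drops the padding bytes, which is exactly what must
> not happen for a bit-copy.)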
>
>
>>
>>> Thanks a lot for the folding patch; I can take over the SRA bits if
>>> you want to.
>>
>> For reference below is the full patch.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Ok for the SRA parts?
>>
>
> My preferred SRA part would be:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index db490b20c3e..7a0e4d1ae26 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1302,6 +1302,17 @@ comes_initialized_p (tree base)
>    return TREE_CODE (base) == PARM_DECL || constant_decl_p (base);
>  }
>
> +/* Return true if REF is a MEM_REF which changes the type of the data it
> +   accesses.  */
> +
> +static bool
> +type_changing_mem_ref_p (tree ref)
> +{
> +  return (TREE_CODE (ref) == MEM_REF
> +         && (TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> +             != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (ref, 0))))));
> +}
> +
>  /* Scan expressions occurring in STMT, create access structures for all accesses
>     to candidates for scalarization and remove those candidates which occur in
>     statements or expressions that prevent them from being split apart.  Return
> @@ -1338,7 +1349,8 @@ build_accesses_from_assign (gimple *stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -         && !is_gimple_reg_type (racc->type))
> +         && !is_gimple_reg_type (racc->type)
> +         && !type_changing_mem_ref_p (rhs))
>         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>        if (storage_order_barrier_p (lhs))
>         racc->grp_unscalarizable_region = 1;
> @@ -3589,6 +3601,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>
>    if (modify_this_stmt
>        || gimple_has_volatile_ops (stmt)
> +      || type_changing_mem_ref_p (lhs)
> +      || type_changing_mem_ref_p (rhs)
>        || contains_vce_or_bfcref_p (rhs)
>        || contains_vce_or_bfcref_p (lhs)
>        || stmt_ends_bb_p (stmt))

But a type-changing MEM_REF can appear at each ref base.  Maybe it's
only relevant for toplevel ones though, you should know.

So we really need to handle it like we do VIEW_CONVERT_EXPR, and/or we
need to handle VIEW_CONVERT_EXPRs somewhere in the ref chain the same
way.
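
(E.g. something like vs = MEM <struct B> [(struct B *)&c].s, where the
type-changing MEM_REF is the base of a COMPONENT_REF rather than the
toplevel rhs; a made-up example.)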

So if you want to take over the SRA parts that would be nice.

I have to dumb down the memcpy folding a bit, as it generates more
fallout than I want to fix right now.

Richard.

>
> I kept the condition in build_accesses_from_assign in order not to do
> unnecessary total-scalarization work, but it is the checks in
> sra_modify_assign that actually ensure we keep the assignment intact.
>
> It still would not do what Jakub asked for, i.e. keep the old behavior
> if there is no padding.  But that should be done at the gimple-fold
> level too, I believe.
>
> Thanks,
>
> Martin
