On Wed, Jun 11, 2025, 9:24 AM Andrew MacLeod <amacl...@redhat.com> wrote:

>
> On 6/11/25 11:02, Andrew MacLeod wrote:
> >
> > On 6/10/25 17:05, Richard Biener wrote:
> >>
> >>
> >>> On 10.06.2025 at 22:18, Andrew MacLeod <amacl...@redhat.com> wrote:
> >>>
> >>>
> >>> I had a question asked of me, and now I'm passing the buck.
> >>>
> >>>     extern void *memcpy(void *, const void *, unsigned int);
> >>>     extern int memcmp(const void *, const void *, unsigned int);
> >>>     typedef unsigned long bits32;
> >>>     typedef unsigned char byte;
> >>>
> >>>     static const byte orig[10] = {
> >>>        'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };
> >>>
> >>>     static byte test[10];
> >>>
> >>>     int
> >>>     verify (void)
> >>>     {
> >>>       return 0 == memcmp (test, orig, 10 * sizeof (orig[0]));
> >>>     }
> >>>
> >>>     int
> >>>     benchmark (void)
> >>>     {
> >>>       memcpy (test, orig, 10 * sizeof (orig[0]));
> >>>       return 0;
> >>>     }
> >>>
> >>>
> >>> The target is arm-none-eabi, and the code is compiled with -Os.
> >>>
> >>> After gimple lowering, the verify () routine remains the same, but
> >>> the memcpy in the benchmark () routine is transformed and becomes:
> >>>
> >>>
> >>>     ;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718,
> >>>     cgraph_uid=4, symbol_order=3)
> >>>
> >>>     int benchmark ()
> >>>     {
> >>>       int D.4726;
> >>>
> >>>       MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned char[10]> [(char * {ref-all})&orig];
> >>>       D.4726 = 0;
> >>>       goto <D.4727>;
> >>>       <D.4727>:
> >>>       return D.4726;
> >>>     }
> >>>
> >>>
> >>> It appears that forwprop then transforms the statement to:
> >>>   <bb 2> :
> >>>   MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL";
> >>>   return 0;
> >>>
> >>> And in the final output, there are now 2 copies of the original
> >>> character data:
> >>>
> >>> orig:
> >>>         .ascii  "J2OZF50FYL"
> >>>         .space  2
> >>> .LC0:
> >>>         .ascii  "J2OZF50FYL"
> >>>         .bss
> >>>
> >>>
> >>> and I presume that new string is a copy of the orig text that
> >>> forwprop has created for some reason.
> >>>
> >>> What's going on, and is there a way to disable this, either at the
> >>> lowering stage or in forwprop?  At -Os, they are not thrilled that
> >>> a bunch more redundant text is being generated in the object file.
> >>> This is a reduced testcase to demonstrate a much larger problem.
> >>>
> >> The hope is that the static var can be elided and that the read
> >> might be just a small part of it.  In this case the heuristics are
> >> misfiring, I guess.  You'd have to track down where exactly in
> >> folding we are replacing the RHS of an aggregate copy.  I can't
> >> recall off the top of my head.
> >>
> >> Richard
> >
> > Here's my traceback where the "magic" happens:
> >
> > #0  fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00,
> > poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0)
> > at /gcc/master/gcc/gcc/gimple-fold.cc:9955
> > #1  0x0000000001200074 in fold_const_aggregate_ref_1
> > (t=0x7fffe9f46de8, valueize=0x0) at
> > /gcc/master/gcc/gcc/gimple-fold.cc:10134
> > #2  0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8)
> > at /gcc/master/gcc/gcc/gimple-fold.cc:10213
> > #3  0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8)
> > at /gcc/master/gcc/gcc/gimple-fold.cc:325
> > #4  0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at
> > /gcc/master/gcc/gcc/gimple-fold.cc:473
> > #5  0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410,
> > inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>,
> > dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648
> >
> > ctor is a STRING_CST tree containing the string "J2OZF50FYL".
> >
> > The fold routine gets to:
> >
> >   /* We found the field with exact match.  */
> >   if (type
> >       && useless_type_conversion_p (type, TREE_TYPE (ctor))
> >       && known_eq (poly_offset, 0U))
> >     return canonicalize_constructor_val (unshare_expr (ctor), from_decl);
> >
> > I would hazard a guess that it is the "unshare_expr (ctor)" that is
> > causing the duplication of the string?  I presume we have a good
> > reason for doing this?  Perhaps that is a bad thing at -Os?  I don't
> > really remember all the unsharing details :-)
> >
> > From this point, the presumed duplication of the string is returned
> > and there are no other gates before fold_stmt_1 calls
> >   gimple_assign_set_rhs_from_tree (gsi, new_rhs);
> > with the newly copied and returned string.
> >
> >
> > I guess an alternate line of questioning is why on x86 we do not turn
> > the second function's call:
> >
> > memcpy (test, orig, 10 * sizeof (orig[0]));
> >
> > into
> >
> > MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned
> > char[10]> [(char * {ref-all})&orig];
> >
> > like arm-none-eabi does.  It seems that this lowering is triggering
> > the fold and string duplication.
> >
> > Andrew
> >
> The difference in lowering between my x86 build and the arm build is in
> gimple_fold_call:
>
>    if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>      {
>        if (gimple_fold_builtin (gsi))
>          changed = true;
>      }
>
> On x86, the gimple_call_builtin_p () check fails, whereas on arm it
> succeeds and we proceed to fold the builtin.
>
> The decl for memcpy on my x86 box has a built_in_class of NOT_BUILT_IN,
> whereas on the arm build it is set to BUILT_IN_NORMAL, which is what
> allows the fold to happen.
>
> Where is that determined?  I don't see anything obvious in the config
> directories or elsewhere that decides whether it is a builtin function
> or not.
>

Most likely this is due to the last (size) argument to memcpy.  For
x86_64-linux it would be unsigned long, so the unsigned int in the
testcase's prototype does not match there.
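
To illustrate, here is a minimal sketch, assuming a C11 compiler and the
usual type choices (size_t being unsigned int on arm-none-eabi and
unsigned long on x86_64-linux). The helper size_t_is_unsigned_int and
the buffer name test_buf are hypothetical names of mine, not part of the
testcase. As far as I understand it, declaring memcpy with size_t
instead of unsigned int should let the declaration agree with the
builtin on both targets:

```c
#include <assert.h>
#include <stddef.h>

/* unsigned int and unsigned long are distinct types even when they have
   the same width, so a prototype can only agree with one of them.  */
static_assert (_Generic ((unsigned int) 0, unsigned long: 0, default: 1),
               "unsigned int and unsigned long are distinct types");

/* Hypothetical helper: 1 if size_t is unsigned int on this target
   (what the testcase's prototype assumes), 0 otherwise.  */
int
size_t_is_unsigned_int (void)
{
  return _Generic ((size_t) 0, unsigned int: 1, default: 0);
}

/* Declaring memcpy with size_t agrees with the standard prototype
   regardless of which integer type size_t is.  */
extern void *memcpy (void *, const void *, size_t);

static const unsigned char orig[10] = {
  'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };

static unsigned char test_buf[10];

int
benchmark (void)
{
  memcpy (test_buf, orig, sizeof (orig));
  return 0;
}
```

Calling size_t_is_unsigned_int () on each target is a quick way to check
whether the original prototype could match there at all.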

Thanks,
Andrew



> Andrew
>
>
