On Wed, Jun 11, 2025 at 5:02 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
>
> On 6/10/25 17:05, Richard Biener wrote:
> >
> >
> >> Am 10.06.2025 um 22:18 schrieb Andrew MacLeod <amacl...@redhat.com>:
> >>
> >> 
> >>
> >> I had a question asked of me, and now I'm passing the buck.
> >>
> >>     extern void *memcpy(void *, const void *, unsigned int);
> >>     extern int memcmp(const void *, const void *, unsigned int);
> >>     typedef unsigned long bits32;
> >>     typedef unsigned char byte;
> >>
> >>     static const byte orig[10] = {
> >>        'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };
> >>
> >>     static byte test[10];
> >>
> >>     int
> >>     verify (void)
> >>     {
> >>       return 0 == memcmp (test, orig, 10 * sizeof (orig[0]));
> >>     }
> >>
> >>     int
> >>     benchmark (void)
> >>     {
> >>       memcpy (test, orig, 10 * sizeof (orig[0]));
> >>       return 0;
> >>     }
> >>
> >>
> >> Target is arm-none-eabi, and when compiled with -Os
> >>
> >> After the gimple lowering, the verify routine remains the same, but
> >> the benchmark () routine  is transformed from a memcpy and becomes:
> >>
> >>
> >>     ;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718,
> >>     cgraph_uid=4, symbol_order=3)
> >>
> >>     int benchmark ()
> >>     {
> >>       int D.4726;
> >>
> >>       MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM
> >>     <unsigned char[10]> [(char * {ref-all})&orig];
> >>       D.4726 = 0;
> >>       goto <D.4727>;
> >>       <D.4727>:
> >>       return D.4726;
> >>     }
> >>
> >>
> >> It appears that forwprop is then transforming the statement to
> >>   <bb 2> :
> >>   MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL";
> >>   return 0;
> >>
> >> And in the final output, there are now 2 copies of the original
> >> character data:
> >>
> >> orig:
> >>         .ascii  "J2OZF50FYL"
> >>         .space  2
> >> .LC0:
> >>         .ascii  "J2OZF50FYL"
> >>         .bss
> >>
> >>
> >> and I presume that new string is a copy of the orig text that
> >> forwprop has created for some reason.
> >>
> >> Whats going on, and is there a way to disable this?  Either at the
> >> lowering stage or in forwprop?   At -Os, they are not thrilled that a
> >> bunch more redundant text is being generated in the object file.
> >> This is a reduced testcase to demonstrate a much larger problem.
> >>
> > The hope is the static var can be elided and the read might be just a
> > small part.  In this case heuristics are misfiring I guess.  You’d
> > have to track down where exactly in folding we are replacing the RHS
> > of an aggregate copy.  I can’t recall off my head.
> >
> > Richard
>
> heres my traceback where the "magic" happens
>
> #0  fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00,
> poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0) at
> /gcc/master/gcc/gcc/gimple-fold.cc:9955
> #1  0x0000000001200074 in fold_const_aggregate_ref_1 (t=0x7fffe9f46de8,
> valueize=0x0) at /gcc/master/gcc/gcc/gimple-fold.cc:10134
> #2  0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8) at
> /gcc/master/gcc/gcc/gimple-fold.cc:10213
> #3  0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8) at
> /gcc/master/gcc/gcc/gimple-fold.cc:325
> #4  0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at
> /gcc/master/gcc/gcc/gimple-fold.cc:473
> #5  0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410,
> inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>,
> dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648
>
> ctor  is a STRING_CST tree and has  the string in it  : "J2OZF50FYL"
>
> The fold routine gets to  :
>
>    /* We found the field with exact match.  */
>    if (type
>        && useless_type_conversion_p (type, TREE_TYPE (ctor))
>        && known_eq (poly_offset, 0U))
>      return canonicalize_constructor_val (unshare_expr (ctor), from_decl);
>
> I would hazard a guess that it is the "unshare_expr (ctor)"  that is
> causing the duplication of the string?  I presume we have a good reason
> for doing this?   Perhaps that is a bad thing at -Os? I don't relally
> remember all the unsharing details :-)

unshare_expr doesn't unshare a STRING_CST.  But here I think it's
fold_stmt_1 that shouldn't replace the RHS with a STRING_CST.
We should really have the constant pool represented and have a
CONST_DECL for each STRING_CST, but we don't.  There might
be cases where a STRING_CST is cheaper (for example when
it's power-of-two size and less than word_mode as the copy can
then be a store from an immediate), but in general a STRING_CST
will end up as a new constant pool entry as you've seen.

Mind filing a bugreport?

Richard.

>
>  From this point, the presumed duplication of the string is returned and
> there are no other gates before fold_stmt_1 calls
>    gimple_assign_set_rhs_from_tree (gsi, new_rhs);
> with the newly copied and returned string.
>
>
> I guess an alternate line of questioning is why on x86 we do not turn
> the second functions call:
>
> memcpy (test, orig, 10 * sizeof (orig[0]));
>
> into
>
> MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned
> char[10]> [(char * {ref-all})&orig];
>
> like arm-none-eabi does.  It seems that this lowering is triggering the
> fold and string duplication.
>
> Andrew
>
>

Reply via email to