On 6/10/25 17:05, Richard Biener wrote:


Am 10.06.2025 um 22:18 schrieb Andrew MacLeod <amacl...@redhat.com>:



I had a question asked of me, and now I'm passing the buck.

    extern void *memcpy(void *, const void *, unsigned int);
    extern int memcmp(const void *, const void *, unsigned int);
    typedef unsigned long bits32;
    typedef unsigned char byte;

    static const byte orig[10] = {
       'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };

    static byte test[10];

    int
    verify (void)
    {
      return 0 == memcmp (test, orig, 10 * sizeof (orig[0]));
    }

    int
    benchmark (void)
    {
      memcpy (test, orig, 10 * sizeof (orig[0]));
      return 0;
    }


Target is arm-none-eabi, and when compiled with -Os

After the gimple lowering, the verify routine remains the same, but the benchmark () routine  is transformed from a memcpy and becomes:


    ;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718,
    cgraph_uid=4, symbol_order=3)

    int benchmark ()
    {
      int D.4726;

      MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM
    <unsigned char[10]> [(char * {ref-all})&orig];
      D.4726 = 0;
      goto <D.4727>;
      <D.4727>:
      return D.4726;
    }


It appears that forwprop is then transforming the statement to
  <bb 2> :
  MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL";
  return 0;

And in the final output, there are now 2 copies of the original character data:

orig:
        .ascii  "J2OZF50FYL"
        .space  2
.LC0:
        .ascii  "J2OZF50FYL"
        .bss


and I presume that new string is a copy of the orig text that forwprop has created for some reason.

Whats going on, and is there a way to disable this?  Either at the lowering stage or in forwprop?   At -Os, they are not thrilled that a bunch more redundant text is being generated in the object file.   This is a reduced testcase to demonstrate a much larger problem.

The hope is the static var can be elided and the read might be just a small part.  In this case heuristics are misfiring I guess.  You’d have to track down where exactly in folding we are replacing the RHS of an aggregate copy.  I can’t recall off my head.

Richard

heres my traceback where the "magic" happens

#0  fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00, poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0) at /gcc/master/gcc/gcc/gimple-fold.cc:9955 #1  0x0000000001200074 in fold_const_aggregate_ref_1 (t=0x7fffe9f46de8, valueize=0x0) at /gcc/master/gcc/gcc/gimple-fold.cc:10134 #2  0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8) at /gcc/master/gcc/gcc/gimple-fold.cc:10213 #3  0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8) at /gcc/master/gcc/gcc/gimple-fold.cc:325 #4  0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at /gcc/master/gcc/gcc/gimple-fold.cc:473 #5  0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410, inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>, dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648

ctor  is a STRING_CST tree and has  the string in it  : "J2OZF50FYL"

The fold routine gets to  :

  /* We found the field with exact match.  */
  if (type
      && useless_type_conversion_p (type, TREE_TYPE (ctor))
      && known_eq (poly_offset, 0U))
    return canonicalize_constructor_val (unshare_expr (ctor), from_decl);

I would hazard a guess that it is the "unshare_expr (ctor)"  that is causing the duplication of the string?  I presume we have a good reason for doing this?   Perhaps that is a bad thing at -Os? I don't relally remember all the unsharing details :-)

From this point, the presumed duplication of the string is returned and there are no other gates before fold_stmt_1 calls
  gimple_assign_set_rhs_from_tree (gsi, new_rhs);
with the newly copied and returned string.


I guess an alternate line of questioning is why on x86 we do not turn the second functions call:

memcpy (test, orig, 10 * sizeof (orig[0]));

into

MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned char[10]> [(char * {ref-all})&orig];

like arm-none-eabi does.  It seems that this lowering is triggering the fold and string duplication.

Andrew


Reply via email to