On Wed, Jun 11, 2025 at 5:02 PM Andrew MacLeod <amacl...@redhat.com> wrote: > > > On 6/10/25 17:05, Richard Biener wrote: > > > > > >> Am 10.06.2025 um 22:18 schrieb Andrew MacLeod <amacl...@redhat.com>: > >> > >> > >> > >> I had a question asked of me, and now I'm passing the buck. > >> > >> extern void *memcpy(void *, const void *, unsigned int); > >> extern int memcmp(const void *, const void *, unsigned int); > >> typedef unsigned long bits32; > >> typedef unsigned char byte; > >> > >> static const byte orig[10] = { > >> 'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' }; > >> > >> static byte test[10]; > >> > >> int > >> verify (void) > >> { > >> return 0 == memcmp (test, orig, 10 * sizeof (orig[0])); > >> } > >> > >> int > >> benchmark (void) > >> { > >> memcpy (test, orig, 10 * sizeof (orig[0])); > >> return 0; > >> } > >> > >> > >> Target is arm-none-eabi, and when compiled with -Os > >> > >> After the gimple lowering, the verify routine remains the same, but > >> the benchmark () routine is transformed from a memcpy and becomes: > >> > >> > >> ;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718, > >> cgraph_uid=4, symbol_order=3) > >> > >> int benchmark () > >> { > >> int D.4726; > >> > >> MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM > >> <unsigned char[10]> [(char * {ref-all})&orig]; > >> D.4726 = 0; > >> goto <D.4727>; > >> <D.4727>: > >> return D.4726; > >> } > >> > >> > >> It appears that forwprop is then transforming the statement to > >> <bb 2> : > >> MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL"; > >> return 0; > >> > >> And in the final output, there are now 2 copies of the original > >> character data: > >> > >> orig: > >> .ascii "J2OZF50FYL" > >> .space 2 > >> .LC0: > >> .ascii "J2OZF50FYL" > >> .bss > >> > >> > >> and I presume that new string is a copy of the orig text that > >> forwprop has created for some reason. > >> > >> Whats going on, and is there a way to disable this? Either at the > >> lowering stage or in forwprop? At -Os, they are not thrilled that a > >> bunch more redundant text is being generated in the object file. > >> This is a reduced testcase to demonstrate a much larger problem. > >> > > The hope is the static var can be elided and the read might be just a > > small part. In this case heuristics are misfiring I guess. You’d > > have to track down where exactly in folding we are replacing the RHS > > of an aggregate copy. I can’t recall off my head. > > > > Richard > > heres my traceback where the "magic" happens > > #0 fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00, > poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0) at > /gcc/master/gcc/gcc/gimple-fold.cc:9955 > #1 0x0000000001200074 in fold_const_aggregate_ref_1 (t=0x7fffe9f46de8, > valueize=0x0) at /gcc/master/gcc/gcc/gimple-fold.cc:10134 > #2 0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8) at > /gcc/master/gcc/gcc/gimple-fold.cc:10213 > #3 0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8) at > /gcc/master/gcc/gcc/gimple-fold.cc:325 > #4 0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at > /gcc/master/gcc/gcc/gimple-fold.cc:473 > #5 0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410, > inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>, > dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648 > > ctor is a STRING_CST tree and has the string in it : "J2OZF50FYL" > > The fold routine gets to : > > /* We found the field with exact match. */ > if (type > && useless_type_conversion_p (type, TREE_TYPE (ctor)) > && known_eq (poly_offset, 0U)) > return canonicalize_constructor_val (unshare_expr (ctor), from_decl); > > I would hazard a guess that it is the "unshare_expr (ctor)" that is > causing the duplication of the string? I presume we have a good reason > for doing this? Perhaps that is a bad thing at -Os? I don't relally > remember all the unsharing details :-)
unshare_expr doesn't unshare a STRING_CST. But here I think it's fold_stmt_1 that shouldn't replace the RHS with a STRING_CST. We should really have the constant pool represented and have a CONST_DECL for each STRING_CST, but we don't. There might be cases where a STRING_CST is cheaper (for example when it's power-of-two size and less than word_mode as the copy can then be a store from an immediate), but in general a STRING_CST will end up as a new constant pool entry as you've seen. Mind filing a bugreport? Richard. > > From this point, the presumed duplication of the string is returned and > there are no other gates before fold_stmt_1 calls > gimple_assign_set_rhs_from_tree (gsi, new_rhs); > with the newly copied and returned string. > > > I guess an alternate line of questioning is why on x86 we do not turn > the second functions call: > > memcpy (test, orig, 10 * sizeof (orig[0])); > > into > > MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned > char[10]> [(char * {ref-all})&orig]; > > like arm-none-eabi does. It seems that this lowering is triggering the > fold and string duplication. > > Andrew > >