https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529
--- Comment #18 from amker at gcc dot gnu.org ---
(In reply to Jim Wilson from comment #17)
> I still haven't been able to reproduce this, but I do see a problem.
>
> In the original bug report, the only difference is that the code uses x4 in
> the first part of the diff, and x24 in the second part of the diff, which
> seems unimportant. However, this value lives across a call to memcpy. x24
> is a safe register here because it is callee saved. x4 is not safe though,
> as it is an argument passing/return value register, which may be clobbered
> by a call. Whether it gets clobbered depends on the memcpy implementation
> that is linked with. If people are linking with different memcpy
> implementations, that might affect whether the bug is reproducible.
>
> Disassembling my testcase, I don't see the same code sequence though. I see
>   401530:  d2800802  mov  x2, #0x40        // #64
>   401534:  52800b01  mov  w1, #0x58        // #88
>   401538:  aa1303e0  mov  x0, x19
>   40153c:  940000d1  bl   401880 <memset>
>   401540:  9121c324  add  x4, x25, #0x870
>   401544:  91001663  add  x3, x19, #0x5
>
> which is OK, because the "add x3, x19, #0x5" instruction comes after the
> memset call.
>
> Maybe there is something subtly different about how I'm configuring or
> building the toolchain that results in the different LTO optimized code.

Hi Jim,
I think that's the problem; Wilco also noticed that the use of x4 is bogus.
It could be an RA (register allocation) bug triggered by this change, though.
I will double-check that later.

Thanks.
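
For readers following the register argument above, here is a minimal sketch (not from the thread) of why x24 may hold a value across a call while x4 may not. It assumes the standard AAPCS64 register classes: x0-x7 are argument/result registers and x19-x28 are callee-saved. The helper names are hypothetical and have nothing to do with GCC's internals.

```python
# Illustrative model only; helper names are hypothetical, not GCC internals.
# Register classes follow the AAPCS64 procedure call standard.

CALLEE_SAVED = {f"x{n}" for n in range(19, 29)}  # x19..x28: callee must preserve
ARG_RESULT = {f"x{n}" for n in range(0, 8)}      # x0..x7: argument/result registers


def preserved_across_call(reg: str) -> bool:
    """Return True if a callee must restore `reg` before returning."""
    return reg in CALLEE_SAVED


def unsafe_live_ranges(live_across_call):
    """Return the registers holding live values that a call may clobber."""
    return sorted(r for r in live_across_call if not preserved_across_call(r))


if __name__ == "__main__":
    print(unsafe_live_ranges(["x4"]))   # allocation that uses x4
    print(unsafe_live_ranges(["x24"]))  # allocation that uses x24
```

A value kept in x4 is flagged because the callee is free to overwrite it; whether a given memcpy or memset actually does so depends on its implementation, which is consistent with the bug reproducing only in some setups.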