https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #6 from Vineet Gupta <vgupta at synopsys dot com> ---
(In reply to Linus Torvalds from comment #4)
> (In reply to Andrew Pinski from comment #1)
> > The loop gets vectorized, I don't see the problem really.
>
> See
>
>   https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/372
>
> and in particular the comment
>
>   "In the first 8-byte copy, src and dst overlap"
>
> so apparently gcc has decided that they can't overlap, despite the two
> pointers being literally generated from the same base pointer.

Exactly:

> But I don't real arc assembly, so I'll have to take Vineet's word for it.

fwiw:
LDD.a  [base, off] is 8-byte load  with pre-incr : eff addr = base + offset
STD.ab [base, off] is 8-byte store with post-incr: eff addr = base

> Vineet, have you been able to generate a smaller test-case?

No I'm afraid not.
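
For anyone else who doesn't read ARC assembly, here is a rough C model of the two addressing modes described above. It is only a sketch: the helper names `ldd_a`, `std_ab` and `model_demo` are made up for illustration, and the write-back of the base register (base = base + offset) is my reading of "pre-incr" and "post-incr", not something taken from the generated code in this report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of LDD.a [base, off]: 8-byte load, pre-increment.
 * The base is updated first, so the effective address is base + off. */
static uint64_t ldd_a(const unsigned char **base, long off)
{
    uint64_t v;
    *base += off;                 /* assumed write-back: base = base + off */
    memcpy(&v, *base, sizeof v);  /* load from the updated base */
    return v;
}

/* Hypothetical model of STD.ab [base, off]: 8-byte store, post-increment.
 * The store uses the old base; the base is updated afterwards. */
static void std_ab(unsigned char **base, long off, uint64_t v)
{
    memcpy(*base, &v, sizeof v);  /* effective address = base */
    *base += off;                 /* assumed write-back: base = base + off */
}

/* Small self-check: load 8 bytes at buf+8, store them at buf+16. */
static int model_demo(void)
{
    unsigned char buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (unsigned char)i;

    const unsigned char *src = buf;
    unsigned char *dst = buf + 16;

    uint64_t v = ldd_a(&src, 8);  /* reads buf[8..15], src ends at buf+8 */
    std_ab(&dst, 8, v);           /* writes buf[16..23], dst ends at buf+24 */

    assert(src == buf + 8);
    assert(dst == buf + 24);
    assert(buf[16] == 8 && buf[23] == 15);
    return 0;
}
```

The point of the model is just that the load reads from the already-incremented address while the store writes to the not-yet-incremented one, which matters when working out whether the 8-byte accesses of src and dst overlap.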