https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945
--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jan Schultke from comment #8)
> From what I could read in the `char_traits::move` code that presumably gets
> called, this function explicitly tests for overlap between the memory
> regions, and dispatches to cheap functions if possible. The input size was 8
> MiB, so it is unlikely that the overhead from this overlap detection is
> contributing in any relevant capacity.

I think you're reading it wrong. The overlap detection in char_traits::move is
only for constant evaluation, because we can't use memmove. The overlap
detection that matters here is in _M_replace, long before we use
char_traits::move.

> Basically, due to this overlap testing, `assign` SHOULD be just as fast as
> other methods if there is no overlap (and in this case, there clearly is
> none). However, it is 14x slower, so something is off.
>
> Either I haven't followed the logic correctly, or there is a mistake in this
> dispatching logic which leads to much worse performance for .assign().

Or the optimizers don't optimize away all the checks in _M_replace and so we
don't unroll everything to a simple memmove, but do all the runtime checks
every time. Which is what I think is happening.
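
To make the kind of dispatch being discussed concrete, here is a minimal sketch of a runtime disjointness check followed by a choice between a plain copy and an overlap-safe copy. It is not the libstdc++ implementation of _M_replace: the names `is_disjunct` and `assign_chars` are hypothetical, and the real code has more branches than this. The point is only that when the compiler cannot prove the check away, it runs on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper (not a libstdc++ function): returns true if
// [src, src + n) lies entirely outside [buf, buf + cap).
// std::less provides a total order even for unrelated pointers.
inline bool is_disjunct(const char* buf, std::size_t cap,
                        const char* src, std::size_t n)
{
    std::less<const char*> lt;
    // Disjoint if src starts at or after the end of buf,
    // or src ends at or before the start of buf.
    return !lt(src, buf + cap) || !lt(buf, src + n);
}

// Hypothetical assign-like routine: copies n chars from src to the start
// of dst. Returns true if the fast non-overlapping path was taken.
// Precondition: n <= cap.
inline bool assign_chars(char* dst, std::size_t cap,
                         const char* src, std::size_t n)
{
    assert(n <= cap);
    if (is_disjunct(dst, cap, src, n)) {
        std::memcpy(dst, src, n);   // no overlap possible: plain copy
        return true;
    }
    std::memmove(dst, src, n);      // source aliases dst: overlap-safe copy
    return false;
}
```

Calling `assign_chars` with a source from a separate buffer takes the `memcpy` path, while a source pointing into `dst` itself takes the `memmove` path. In both cases the pointer comparisons are evaluated at run time unless the optimizer can fold them away.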