https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jan Schultke from comment #8)
> From what I could read in the `char_traits::move` code that presumably gets
> called, this function explicitly tests for overlap between the memory
> regions, and dispatches to cheap functions if possible. The input size was 8
> MiB, so it is unlikely that the overhead from this overlap detection is
> contributing in any relevant capacity.

I think you're reading it wrong. The overlap detection in char_traits::move is
only for constant evaluation, because we can't use memmove there.
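
To illustrate the shape of that dispatch, here is a minimal sketch. It is not
the libstdc++ source. The names sketch_move and sketch_overlap_ok are
hypothetical, and __builtin_is_constant_evaluated is the GCC/Clang builtin
that std::is_constant_evaluated is typically built on. The hand-written
overlap test is only reached during constant evaluation; at run time the call
goes straight to memmove.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a char_traits-style move; illustration only.
constexpr char* sketch_move(char* dst, const char* src, std::size_t n)
{
    if (n == 0 || dst == src)
        return dst;

    if (__builtin_is_constant_evaluated()) {
        // memmove is not usable in a constant expression, so detect
        // overlap by hand using only pointer equality comparisons.
        bool dst_in_src = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (src + i == dst) {
                dst_in_src = true;
                break;
            }
        }
        if (dst_in_src) {
            // dst starts inside the source range: copy backwards.
            for (std::size_t i = n; i > 0; --i)
                dst[i - 1] = src[i - 1];
        } else {
            // No harmful overlap: copy forwards.
            for (std::size_t i = 0; i < n; ++i)
                dst[i] = src[i];
        }
        return dst;
    }

    // Run time: no overlap test here, memmove handles overlap itself.
    std::memmove(dst, src, n);
    return dst;
}

// Exercises the constant-evaluation branch with overlapping ranges.
constexpr bool sketch_overlap_ok()
{
    char buf[] = "abcdef";
    sketch_move(buf + 2, buf, 4);
    return buf[0] == 'a' && buf[1] == 'b' && buf[2] == 'a'
        && buf[3] == 'b' && buf[4] == 'c' && buf[5] == 'd';
}
```

With this shape, `static_assert(sketch_overlap_ok(), "")` takes the manual
loop, while an ordinary runtime call never executes the overlap test.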

The overlap detection that matters here is in _M_replace, long before we use
char_traits::move.
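
As a rough illustration of what an up-front runtime overlap check looks like,
here is a sketch under stated assumptions. It is not the actual _M_replace
code. The helpers sketch_disjoint and sketch_overwrite are hypothetical and
only show the general idea: test whether the source lies outside the string's
own buffer, then pick memcpy or memmove.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical helper: true if [src, src + n) lies entirely outside
// [buf, buf + len). std::less gives a total order even for pointers
// into unrelated objects.
inline bool sketch_disjoint(const char* buf, std::size_t len,
                            const char* src, std::size_t n)
{
    std::less<const char*> lt;
    // Source ends at or before buf, or starts at or after buf + len.
    return !lt(buf, src + n) || !lt(src, buf + len);
}

// Hypothetical helper: overwrite the first n bytes of buf with src.
// Assumes n <= len. The check runs on every call unless the optimizer
// can prove the result at compile time.
inline void sketch_overwrite(char* buf, std::size_t len,
                             const char* src, std::size_t n)
{
    if (sketch_disjoint(buf, len, src, n))
        std::memcpy(buf, src, n);   // no overlap: plain copy
    else
        std::memmove(buf, src, n);  // source aliases the buffer
}
```

If such a check is not folded away by the optimizer, every call pays for it,
even when the caller's source clearly cannot alias the destination.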

> Basically, due to this overlap testing, `assign` SHOULD be just as fast as
> other methods if there is no overlap (and in this case, there clearly is
> none). However, it is 14x slower, so something is off.
> 
> Either I haven't followed the logic correctly, or there is a mistake in this
> dispatching logic which leads to much worse performance for .assign().

Or the optimizers don't optimize away all the checks in _M_replace, so we
don't reduce everything to a simple memmove but do all the runtime checks
every time, which is what I think is happening.
