https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101197
--- Comment #11 from cqwrteur <unlvsur at live dot com> ---
(In reply to Tamar Christina from comment #10)
> (In reply to cqwrteur from comment #9)
> > (In reply to Tamar Christina from comment #8)
> > > (In reply to Jakub Jelinek from comment #6)
> > > > Shouldn't that be a different PR with details? I mean, this PR is that we
> > > > should expand shorter memmove inline even if the regions do overlap.
> > >
> > > Sure, I'm still trying to create a minimal representative example (it's C++
> > > and templated) unless just pointing at the github is enough.
> > >
> > > To be clear though, just inlining memmove at all will cover most of the
> > > distance, it's just that you require less registers.
> >
> > inline things like memcpy and memmove will lead to serious binary bloat. The
> > compiler usually picks to emit call to libc's memcpy and memmove that is
> > usually highly optimized with assembly code.
>
> Yes your binary will grow, but on small memcopy and memmove. the calling
> overhead, not to mention the register allocation overhead you might get from
> having to spill your caller saves more than makes up for it.
>
> We already inline memcpy and memset. there's no reason not to do memmove,
> especially at -O3.

That is false. inline memcpy and memset only works when the size is constant.
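
For anyone following the thread, here is a minimal sketch of the cases being argued about. The function names are hypothetical and are not taken from the code discussed in this PR. Whether a given call is expanded inline or emitted as a libc call depends on the target, the GCC version and the options in use, so the comments describe what to look for rather than a guaranteed result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the size is a small compile-time constant.
   This is the case where inline expansion of memcpy is usually expected. */
void copy_fixed16(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}

/* Hypothetical helper: the size is only known at run time.
   This is the case the last comment refers to; a call to the
   library memcpy is the likely outcome here. */
void copy_runtime(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical helper: small constant size, but the regions may overlap,
   so memmove semantics are required. This is the case the PR is about. */
void move_fixed16(void *dst, const void *src)
{
    memmove(dst, src, 16);
}
```

Compiling this with something like `gcc -O2 -S` and reading the generated assembly shows which of the three functions contain a call and which were expanded into loads and stores on a particular target.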