https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101197
--- Comment #12 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #11)
> (In reply to Tamar Christina from comment #10)
> > (In reply to cqwrteur from comment #9)
> > > (In reply to Tamar Christina from comment #8)
> > > > (In reply to Jakub Jelinek from comment #6)
> > > > > Shouldn't that be a different PR with details? I mean, this PR is
> > > > > that we should expand shorter memmove inline even if the regions
> > > > > do overlap.
> > > >
> > > > Sure, I'm still trying to create a minimal representative example
> > > > (it's C++ and templated) unless just pointing at the github is enough.
> > > >
> > > > To be clear though, just inlining memmove at all will cover most of the
> > > > distance, it's just that you require less registers.
> > >
> > > inline things like memcpy and memmove will lead to serious binary bloat.
> > > The compiler usually picks to emit call to libc's memcpy and memmove
> > > that is usually highly optimized with assembly code.
> >
> > Yes your binary will grow, but on small memcopy and memmove. the calling
> > overhead, not to mention the register allocation overhead you might get from
> > having to spill your caller saves more than makes up for it.
> >
> > We already inline memcpy and memset. there's no reason not to do memmove,
> > especially at -O3.
>
> That is false. inline memcpy and memset only works when the size is constant.

It is more for type-punning reasons.
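
The cases being argued about in this thread (constant-size versus runtime-size copies, and memcpy versus memmove) can be sketched roughly as follows. This is a hypothetical illustration, not code from this PR or from the GitHub project mentioned above, and all function names are made up. Whether a given call is expanded inline depends on the GCC version, target, and optimization level, so it is worth checking the assembly (for example with `g++ -O2 -S`) rather than reading this as a statement of what the compiler emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Case 1: constant-size memcpy used for type punning.
// Assumption for this sketch: float is 32 bits wide.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // size is a compile-time constant
    return u;
}

// Case 2: small constant-size memmove over overlapping regions,
// the kind of call this PR asks to have expanded inline.
void shift_left_by_one(unsigned char (&buf)[8]) {
    std::memmove(buf, buf + 1, 7);  // source and destination overlap
}

// Case 3: size known only at run time; the thread suggests this is
// where a call to the libc routine is normally emitted.
void copy_runtime(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}
```

Comparing the generated code for the three functions at -O2 and -O3 is one way to see which of the claims above holds for a particular compiler build.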