https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101197
--- Comment #13 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to cqwrteur from comment #12)
> (In reply to cqwrteur from comment #11)
> > (In reply to Tamar Christina from comment #10)
> > > (In reply to cqwrteur from comment #9)
> > > > (In reply to Tamar Christina from comment #8)
> > > > > (In reply to Jakub Jelinek from comment #6)
> > > > > > Shouldn't that be a different PR with details? I mean, this PR is
> > > > > > that we should expand shorter memmove inline even if the regions
> > > > > > do overlap.
> > > > >
> > > > > Sure, I'm still trying to create a minimal representative example
> > > > > (it's C++ and templated) unless just pointing at the github is
> > > > > enough.
> > > > >
> > > > > To be clear though, just inlining memmove at all will cover most of
> > > > > the distance, it's just that you require less registers.
> > > >
> > > > inline things like memcpy and memmove will lead to serious binary
> > > > bloat. The compiler usually picks to emit call to libc's memcpy and
> > > > memmove that is usually highly optimized with assembly code.
> > >
> > > Yes your binary will grow, but on small memcopy and memmove. the calling
> > > overhead, not to mention the register allocation overhead you might get
> > > from having to spill your caller saves more than makes up for it.
> > >
> > > We already inline memcpy and memset. there's no reason not to do memmove,
> > > especially at -O3.
> >
> > That is false. inline memcpy and memset only works when the size is
> > constant.
> > more for type punning reason.
> but on small memcopy and memmove.

(In reply to cqwrteur from comment #11)
> (In reply to Tamar Christina from comment #10)
> > (In reply to cqwrteur from comment #9)
> > > (In reply to Tamar Christina from comment #8)
> > > > (In reply to Jakub Jelinek from comment #6)
> > > > > Shouldn't that be a different PR with details? I mean, this PR is
> > > > > that we should expand shorter memmove inline even if the regions do
> > > > > overlap.
> > > >
> > > > Sure, I'm still trying to create a minimal representative example (it's
> > > > C++ and templated) unless just pointing at the github is enough.
> > > >
> > > > To be clear though, just inlining memmove at all will cover most of the
> > > > distance, it's just that you require less registers.
> > >
> > > inline things like memcpy and memmove will lead to serious binary bloat.
> > > The compiler usually picks to emit call to libc's memcpy and memmove that
> > > is usually highly optimized with assembly code.
> >
> > Yes your binary will grow, but on small memcopy and memmove. the calling
> > overhead, not to mention the register allocation overhead you might get from
> > having to spill your caller saves more than makes up for it.
> >
> > We already inline memcpy and memset. there's no reason not to do memmove,
> > especially at -O3.
>
> That is false. inline memcpy and memset only works when the size is constant.

How do you think you know when the size is small?

> but on small memcopy and memmove.

By logic this means you know the size is constant.
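
To make the point about small, constant sizes concrete, here is a rough sketch in C, not taken from the code under discussion. The names `shift16`, `shift16_manual` and `shifts_agree` are hypothetical. The first function calls memmove with a compile-time constant size on overlapping regions. The second shows what an inline expansion of that call could amount to: load every source byte into temporaries first, then store, which is safe under overlap. Whether a given GCC version and target actually expands the memmove call inline is an assumption to check in the generated assembly, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: move 16 bytes forward by 4 inside one buffer.
   The size is a compile-time constant and the regions overlap. */
void shift16(unsigned char *buf)
{
    memmove(buf + 4, buf, 16);
}

/* Roughly what an inline expansion could look like: all loads happen
   before any store, so the overlap cannot corrupt the source bytes.
   The fixed-size memcpy calls stand in for two 8-byte register
   loads and stores. */
void shift16_manual(unsigned char *buf)
{
    uint64_t lo, hi;
    memcpy(&lo, buf, 8);       /* load bytes 0..7   */
    memcpy(&hi, buf + 8, 8);   /* load bytes 8..15  */
    memcpy(buf + 4, &lo, 8);   /* store to 4..11    */
    memcpy(buf + 12, &hi, 8);  /* store to 12..19   */
}

/* Returns 1 when both versions leave the buffer in the same state. */
int shifts_agree(void)
{
    unsigned char a[20], b[20];
    for (int i = 0; i < 20; i++)
        a[i] = b[i] = (unsigned char)i;

    shift16(a);
    shift16_manual(b);

    /* memmove behaves as if the data went through a temporary copy. */
    assert(a[4] == 0 && a[19] == 15);
    return memcmp(a, b, sizeof a) == 0;
}
```

Comparing the assembly for `shift16` and `shift16_manual` at -O2 or -O3 is one way to see whether the compiler in use emits a libc call for the first and straight-line loads and stores for the second.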