On May 24, 2023, Richard Biener <richard.guent...@gmail.com> wrote:

> gimple_fold_builtin_memory_op tries to expand the call to a single
> load plus a single store so we can handle overlaps by first loading
> everything to registers and then storing:

*nod*, that's why I figured we could afford to go back to allowing
DImode (with -m32) or TImode (with -m64) even without vector modes: we'd
just use a pair of registers, a single insn, even though not a single
hardware instruction.

> using DImode on i?86 without SSE means we eventually perform two
> loads and two stores which means we need two registers available.

*nod*.  But the alternative is to issue an out-of-line call to memmove,
which would clobber more than 2 registers.  ISTM that inlining such
calls is better, whether optimizing for speed or size.

> So I think if we want to expand this further at the GIMPLE level we
> should still honor MOVE_MAX but eventually emit multiple loads/stores
> honoring the MOVE_MAX_PIECES set of constraints there and avoid
> expanding to sequences where we cannot interleave the loads/stores
> (aka for the memmove case).

But... don't we already?  If I'm reading the code right, we'll already
issue gimple code to load the whole block into a temporary and then
store it, but current MOVE_MAX won't let us go past 4 bytes on SSE-less
x86.

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>
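
For illustration only, here is a rough C-level sketch of the
load-the-whole-block-then-store expansion discussed above, for an 8-byte
move.  It is not GCC code: the names move8_via_temp and
overlap_matches_memmove are made up for this example, and whether tmp
ends up in one 64-bit register or a pair of 32-bit ones is up to the
target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not GCC code): move 8 bytes between possibly
   overlapping regions by loading the whole block into a temporary
   and only then storing it. */
static void move8_via_temp(void *dst, const void *src)
{
    uint64_t tmp;
    memcpy(&tmp, src, sizeof tmp);  /* load: all 8 bytes are read first */
    memcpy(dst, &tmp, sizeof tmp);  /* store: dst may overlap src */
}

/* Hypothetical self-check: returns 1 if the helper agrees with memmove
   on an overlapping 8-byte move. */
static int overlap_matches_memmove(void)
{
    char a[] = "0123456789ABCDEF";
    char b[] = "0123456789ABCDEF";

    move8_via_temp(a + 2, a);  /* regions overlap by 6 bytes */
    memmove(b + 2, b, 8);      /* reference result */
    return memcmp(a, b, sizeof a) == 0;
}
```

Because every load completes before the first store, the overlap is
harmless.  That property is lost if the loads and stores are split into
pieces and interleaved.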