On May 24, 2023, Richard Biener <richard.guent...@gmail.com> wrote:

> gimple_fold_builtin_memory_op tries to expand the call to a single
> load plus a single store so we can handle overlaps by first loading
> everything to registers and then storing:

*nod*, that's why I figured we could afford to go back to allowing
DImode (with -m32) or TImode (with -m64) even without vector modes: we'd
just use a pair of registers, a single insn, even though not a single
hardware instruction.

> using DImode on i?86 without SSE means we eventually perform two
> loads and two stores which means we need two registers available.

*nod*.  But the alternative is to issue an out-of-line call to memmove,
which would clobber more than 2 registers.  ISTM that inlining such
calls is better, whether optimizing for speed or size.
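To make the register cost concrete, here is a rough C sketch of what an
inlined 8-byte memmove amounts to on a target that can only move 4 bytes
at a time.  The name move8_two_regs is made up for illustration; it is
not GCC code, just the shape of the expansion under that assumption:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC code: an 8-byte memmove performed as
   two 4-byte loads followed by two 4-byte stores.  */
void
move8_two_regs (void *dst, const void *src)
{
  uint32_t lo, hi;  /* stand-ins for the two registers we need */

  /* Both loads complete before either store, so it does not matter
     whether the source and destination ranges overlap.  */
  memcpy (&lo, src, 4);
  memcpy (&hi, (const char *) src + 4, 4);
  memcpy (dst, &lo, 4);
  memcpy ((char *) dst + 4, &hi, 4);
}
```

Only lo and hi are live across the sequence, whereas an out-of-line
call would have to assume every call-clobbered register is gone.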

> So I think if we want to expand this further at the GIMPLE level we
> should still honor MOVE_MAX but eventually emit multiple loads/stores
> honoring the MOVE_MAX_PIECES set of constraints there and avoid
> expanding to sequences where we cannot interleave the loads/stores
> (aka for the memmove case).

But...  don't we already?  If I'm reading the code right, we'll already
issue gimple code to load the whole block into a temporary and then
store it, but the current MOVE_MAX won't let us go past 4 bytes on
SSE-less x86.
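In C terms, the load-into-a-temporary-then-store form looks roughly like
the sketch below.  move8_via_temp is a hypothetical name, and the 8-byte
size is just the DImode case from this thread; this is not the actual
folder output:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC code: move 8 possibly overlapping
   bytes through a single 64-bit temporary.  */
void
move8_via_temp (void *dst, const void *src)
{
  uint64_t tmp;

  memcpy (&tmp, src, sizeof tmp);  /* one load of the whole block */
  memcpy (dst, &tmp, sizeof tmp);  /* one store of the whole block */
}
```

Since the entire block is read before any of it is written, this has
memmove semantics for the fixed size, and how many hardware moves it
takes is left to the expander.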

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>
