On Thu, May 25, 2023 at 12:01 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On May 24, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > gimple_fold_builtin_memory_op tries to expand the call to a single
> > load plus a single store so we can handle overlaps by first loading
> > everything to registers and then storing:
>
> *nod*, that's why I figured we could afford to go back to allowing
> DImode (with -m32) or TImode (with -m64) even without vector modes: we'd
> just use a pair of registers, a single insn, even though not a single
> hardware instruction.
>
> > using DImode on i?86 without SSE means we eventually perform two
> > loads and two stores which means we need two registers available.
>
> *nod*.  But the alternative is to issue an out-of-line call to memmove,
> which would clobber more than 2 registers.  ISTM that inlining such
> calls is better, whether optimizing for speed or size.
>
> > So I think if we want to expand this further at the GIMPLE level we
> > should still honor MOVE_MAX but eventually emit multiple loads/stores
> > honoring the MOVE_MAX_PIECES set of constraints there and avoid
> > expanding to sequences where we cannot interleave the loads/stores
> > (i.e. the memmove case).
>
> But...  don't we already?  If I'm reading the code right, we'll already
> issue gimple code to load the whole block into a temporary and then
> store it, but current MOVE_MAX won't let us go past 4 bytes on SSE-less
> x86.

I mean we could do what RTL expansion would do later, i.e. expand
by pieces: emit multiple loads/stores, but interleaved rather than
n loads followed by n stores.
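To make the trade-off concrete, here is a minimal C sketch of the two strategies at the source level. It is not GCC code. It assumes 4-byte pieces, matching the MOVE_MAX limit on SSE-less 32-bit x86 mentioned above. `copy_load_then_store`, `copy_interleaved`, `PIECE` and `MAX_PIECES` are hypothetical names. The first function keeps every piece live before storing any of them. The second is the interleaved by-pieces form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical piece size: 4 bytes, as with MOVE_MAX on SSE-less
   32-bit x86 in this thread.  */
#define PIECE 4
/* Hypothetical cap on how many pieces may be live in "registers".  */
#define MAX_PIECES 4

/* All loads first, then all stores: correct even when dst and src
   overlap (memmove semantics), but n / PIECE values are live at once.  */
void copy_load_then_store(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
  uint32_t regs[MAX_PIECES];
  assert(n % PIECE == 0 && n / PIECE <= MAX_PIECES);
  for (size_t i = 0; i < n / PIECE; i++)
    memcpy(&regs[i], src + i * PIECE, PIECE);
  for (size_t i = 0; i < n / PIECE; i++)
    memcpy(dst + i * PIECE, &regs[i], PIECE);
}

/* By pieces, each load immediately followed by its store: only one
   value is live at a time, but this forward copy can overwrite source
   bytes that have not been loaded yet when dst starts inside
   [src, src + n).  */
void copy_interleaved(unsigned char *dst, const unsigned char *src,
                      size_t n)
{
  assert(n % PIECE == 0);
  for (size_t i = 0; i < n; i += PIECE)
    {
      uint32_t piece;
      memcpy(&piece, src + i, PIECE);
      memcpy(dst + i, &piece, PIECE);
    }
}
```

Suppose you copy 8 bytes from `buf` to `buf + 2` in `"ABCDEFGHIJ"`. `copy_load_then_store` produces `"ABABCDEFGH"`, the same result as `memmove`. `copy_interleaved` produces `"ABABCDCDGH"`. The interleaved form is fine for non-overlapping copies (memcpy). For memmove, the all-loads-first form is needed unless the copy direction can be chosen safely.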

Richard.

>
> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>    Free Software Activist                       GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
