On Thu, May 25, 2023 at 12:01 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
> --text follows this line--
> On May 24, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > gimple_fold_builtin_memory_op tries to expand the call to a single
> > load plus a single store so we can handle overlaps by first loading
> > everything to registers and then storing:
>
> *nod*, that's why I figured we could afford to go back to allowing
> DImode (with -m32) or TImode (with -m64) even without vector modes: we'd
> just use a pair of registers, a single insn, even though not a single
> hardware instruction.
>
> > using DImode on i?86 without SSE means we eventually perform two
> > loads and two stores which means we need two registers available.
>
> *nod*.  But the alternative is to issue an out-of-line call to memmove,
> which would clobber more than 2 registers.  ISTM that inlining such
> calls is better, whether optimizing for speed or size.
>
> > So I think if we want to expand this further at the GIMPLE level we
> > should still honor MOVE_MAX but eventually emit multiple loads/stores
> > honoring the MOVE_MAX_PIECES set of constraints there and avoid
> > expanding to sequences where we cannot interleave the loads/stores
> > (aka for the memmove case).
>
> But... don't we already?  If I'm reading the code right, we'll already
> issue gimple code to load the whole block into a temporary and then
> store it, but current MOVE_MAX won't let us go past 4 bytes on SSE-less
> x86.

I mean we could do what RTL expansion would do later and expand
by pieces: emit multiple loads/stores, not as n loads followed by
n stores, but interleaved (load, store, load, store, ...).

Richard.

>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
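
To make the distinction concrete, here is a minimal C sketch of the two
orderings for an 8-byte block moved in 4-byte pieces. This is not GCC
code: the helper names are hypothetical, and the 4-byte piece size is
only an assumption standing in for MOVE_MAX on SSE-less x86. The first
helper loads everything and then stores (two values live at once, safe
for overlap); the second interleaves load/store pairs (one value live at
a time, but only valid when the regions do not overlap).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: n loads, then n stores.  Both pieces are held in
   temporaries before anything is written, so overlapping regions are
   handled (memmove semantics), at the cost of two live registers.  */
static void
move8_load_all_then_store (unsigned char *dst, const unsigned char *src)
{
  uint32_t lo, hi;

  assert (dst != NULL && src != NULL);
  memcpy (&lo, src, 4);       /* load piece 0 */
  memcpy (&hi, src + 4, 4);   /* load piece 1 */
  memcpy (dst, &lo, 4);       /* store piece 0 */
  memcpy (dst + 4, &hi, 4);   /* store piece 1 */
}

/* Hypothetical helper: interleaved by-pieces copy.  Only one temporary
   is live at a time, but the first store may clobber source bytes that
   the second load still needs, so this is only correct for
   non-overlapping regions (memcpy semantics).  */
static void
copy8_interleaved (unsigned char *dst, const unsigned char *src)
{
  uint32_t piece;

  assert (dst != NULL && src != NULL);
  memcpy (&piece, src, 4);      /* load piece 0 */
  memcpy (dst, &piece, 4);      /* store piece 0 */
  memcpy (&piece, src + 4, 4);  /* load piece 1 */
  memcpy (dst + 4, &piece, 4);  /* store piece 1 */
}
```

With a buffer holding "0123456789AB" and dst = buf + 2, src = buf, the
first helper gives the same bytes memmove would, while the second reads
already-overwritten bytes for its second piece. That is why the
interleaved form is only an option for the memcpy case.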