On Thu, May 25, 2023 at 3:25 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > On Thu, May 25, 2023 at 1:10 PM Alexandre Oliva <ol...@adacore.com> wrote:
> >>
> >> On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
> >>
> >> > I mean we could do what RTL expansion would do later and do
> >> > by-pieces, thus emit multiple loads/stores but not n loads and then
> >> > n stores but interleaved.
> >>
> >> That wouldn't help e.g. gcc.dg/memcpy-6.c's fold_move_8, because
> >> MOVE_MAX and MOVE_MAX_PIECES currently limit inline expansion to 4
> >> bytes on x86 without SSE, both in gimple and RTL, and interleaved loads
> >> and stores wouldn't help with memmove.  We can't fix that by changing
> >> code that uses MOVE_MAX and/or MOVE_MAX_PIECES, when these limits are
> >> set too low.
>
> > Btw, there was a short period where the MOVE_MAX limit was restricted,
> > but that had fallout and we've reverted since then.
>
> Erhm...  Are we even talking about the same issue?
>
> i386/i386.h reduced the 32-bit non-SSE MOVE_MAX from 16 to 4, which
> broke this test; I'm proposing to bounce it back up to 8, so that we get
> a little more memmove inlining, enough for tests that expect that much
> to pass.
>
> You may be focusing on the gimple-fold bit, because I mentioned it, but
> even the rtl expander is failing to expand the memmove because of that
> setting, as evidenced by the test's failure in the scan for memmove in
> the final dump.
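For reference, fold_move_8 presumably boils down to an overlapping
8-byte memmove along these lines (a sketch reconstructed from the
memmove (a+3, a, 8) call in the asm below, not the verbatim testsuite
source; the array name and size are guesses):

/* Roughly what gcc.dg/memcpy-6.c's fold_move_8 appears to exercise:
   an 8-byte move between overlapping regions of the same array.  */
char a[16];

void
fold_move_8 (void)
{
  __builtin_memmove (a + 3, a, 8);
}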
So indeed fold_move_8 expands to the following, even with
-minline-all-stringops:

fold_move_8:
.LFB5:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        subl    $8, %esp
        movl    $a+3, %eax
        subl    $4, %esp
        pushl   $8
        pushl   $a
        pushl   %eax
        call    memmove
        addl    $16, %esp
        nop

I do think it's still up to RTL expansion or the target to decide
whether it's worth spending two registers to handle the overlap, or
maybe to emit a compare & jump to do forward and backward variants.

Yes, increasing MOVE_MAX to 8 makes this expand at the GIMPLE level
already, which I believe is premature and difficult to undo.

> That MOVE_MAX change was a significant regression in codegen for 32-bit
> non-SSE x86, and I'm proposing to fix that.  Compensating for that
> regression elsewhere doesn't seem desirable to me: MOVE_MAX can be much
> higher even on other x86 variants, so the effects of such attempts may
> harm quite significantly more modern CPUs.
>
> Conversely, I don't expect the reduction of MOVE_MAX on SSE-less x86 a
> couple of years ago to have been measured for performance effects, given
> the little overall relevance of such CPUs, and the very visible and
> undesirable effects on codegen that change brought onto them.  And yet,
> I'm being very conservative in the proposed reversion, because
> benchmarking such targets in any meaningful way would be somewhat
> challenging for myself as well.
>
> So, could we please have this narrow fix of this limited regression at
> the spot where it was introduced accepted, rather than debating
> tangents?
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
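P.S. To spell out the "two registers" variant mentioned above: with
both pieces loaded before any store, an 8-byte overlapping move is
safe in either direction.  A minimal sketch in C of what such an
expansion amounts to (not GCC's actual by-pieces output; the
__builtin_memcpy calls stand in for the plain 4-byte loads and stores
the expander would emit, and the function name is illustrative):

void
move8 (char *dst, const char *src)
{
  unsigned int lo, hi;                  /* the two registers */

  __builtin_memcpy (&lo, src, 4);       /* load both pieces first...  */
  __builtin_memcpy (&hi, src + 4, 4);
  __builtin_memcpy (dst, &lo, 4);       /* ...then store: overlap-safe */
  __builtin_memcpy (dst + 4, &hi, 4);
}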