On Thu, May 25, 2023 at 3:25 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > On Thu, May 25, 2023 at 1:10 PM Alexandre Oliva <ol...@adacore.com> wrote:
> >>
> >> On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
> >>
> >> > I mean we could do what RTL expansion would do later and do
> >> > by-pieces, thus emit multiple loads/stores but not n loads and then
> >> > n stores but interleaved.
> >>
> >> That wouldn't help e.g. gcc.dg/memcpy-6.c's fold_move_8, because
> >> MOVE_MAX and MOVE_MAX_PIECES currently limit inline expansion to 4
> >> bytes on x86 without SSE, both in gimple and RTL, and interleaved loads
> >> and stores wouldn't help with memmove.  We can't fix that by changing
> >> code that uses MOVE_MAX and/or MOVE_MAX_PIECES, when these limits are
> >> set too low.
>
> > Btw, there was a short period where the MOVE_MAX limit was restricted,
> > but that had fallout and it has since been reverted.
>
> Erhm...  Are we even talking about the same issue?
>
> i386/i386.h reduced the 32-bit non-SSE MOVE_MAX from 16 to 4, which
> broke this test; I'm proposing to bounce it back up to 8, so that we get
> a little more memmove inlining, enough for tests that expect that much
> to pass.
>
> You may be focusing on the gimple-fold bit because I mentioned it, but
> even the RTL expander fails to expand the memmove because of that
> setting, as evidenced by the test failing its scan for memmove in the
> final dump.

So indeed, fold_move_8 expands to the following, even with -minline-all-stringops:

fold_move_8:
.LFB5:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        subl    $8, %esp
        movl    $a+3, %eax
        subl    $4, %esp
        pushl   $8
        pushl   $a
        pushl   %eax
        call    memmove
        addl    $16, %esp
        nop
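
For reference, judging from the arguments pushed above (size 8, source
a, destination a+3), the call being folded is equivalent to something
like the sketch below; the exact declarations in gcc.dg/memcpy-6.c may
differ:

        extern char a[];

        void fold_move_8 (void)
        {
          /* 8-byte move whose source and destination overlap in
             a[3..7].  */
          __builtin_memmove (a + 3, a, 8);
        }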

I do think it's still up to RTL expansion or the target to decide whether
it's worth spending two registers to handle the overlap, or whether to
emit a compare & jump to select between forward and backward variants.
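
For concreteness, here is a rough C-level sketch of those two strategies
(my own illustration of the idea, not what the expander actually emits;
the function names are made up).  By-pieces for memcpy would interleave
one load and one store per word through a single scratch register, but
for the overlapping memmove above either all loads must precede all
stores, or the copy direction must be chosen at run time:

        /* Variant 1: spend two registers so both 4-byte loads complete
           before either store; safe for any overlap.  */
        void move8_two_regs (char *dst, const char *src)
        {
          unsigned int lo, hi;
          __builtin_memcpy (&lo, src, 4);      /* load word 0 */
          __builtin_memcpy (&hi, src + 4, 4);  /* load word 1 */
          __builtin_memcpy (dst, &lo, 4);      /* store word 0 */
          __builtin_memcpy (dst + 4, &hi, 4);  /* store word 1 */
        }

        /* Variant 2: one scratch register plus a compare & jump that
           selects a forward or a backward word-by-word copy.  */
        void move8_cmp_jump (char *dst, const char *src)
        {
          unsigned int tmp;
          if (dst < src)
            {                                  /* forward copy */
              __builtin_memcpy (&tmp, src, 4);
              __builtin_memcpy (dst, &tmp, 4);
              __builtin_memcpy (&tmp, src + 4, 4);
              __builtin_memcpy (dst + 4, &tmp, 4);
            }
          else
            {                                  /* backward copy */
              __builtin_memcpy (&tmp, src + 4, 4);
              __builtin_memcpy (dst + 4, &tmp, 4);
              __builtin_memcpy (&tmp, src, 4);
              __builtin_memcpy (dst, &tmp, 4);
            }
        }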

Yes, increasing MOVE_MAX to 8 makes this expand at the GIMPLE level
already, which I believe is premature and difficult to undo.
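
To illustrate what that GIMPLE-level expansion amounts to (a C rendering
of my own, assuming the folder uses a single 8-byte access; this is not
the literal GIMPLE):

        extern char a[];

        void fold_move_8_folded (void)
        {
          unsigned long long tmp;             /* 8-byte scratch */
          __builtin_memcpy (&tmp, a, 8);      /* one 8-byte load */
          __builtin_memcpy (a + 3, &tmp, 8);  /* one 8-byte store */
        }

On 32-bit non-SSE x86 that single 64-bit load/store must later be split
back into register-sized pieces anyway.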

> That MOVE_MAX change was a significant regression in codegen for 32-bit
> non-SSE x86, and I'm proposing to fix that.  Compensating for that
> regression elsewhere doesn't seem desirable to me: MOVE_MAX can be much
> higher on other x86 variants, so such attempts could quite significantly
> harm far more modern CPUs.
>
> Conversely, I don't expect the reduction of MOVE_MAX on SSE-less x86 a
> couple of years ago to have been measured for performance effects, given
> the little overall relevance of such CPUs and the very visible and
> undesirable codegen effects that change brought on them.  And yet I'm
> being very conservative in the proposed reversion, because benchmarking
> such targets in any meaningful way would be somewhat challenging for me
> as well.
>
> So, could we please have this narrow fix for this limited regression
> accepted at the spot where the regression was introduced, rather than
> debating tangents?
>
> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>    Free Software Activist                       GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
