On Mon, Aug 5, 2024 at 3:23 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch refactors ashrv2di RTL expansion into a function so that it may
> be reused by a pre-reload splitter, such that DImode right shifts may be
> considered candidates during the Scalar-To-Vector (STV) pass.  Currently
> DImode arithmetic right shifts are not considered potential candidates
> during STV, so for the following testcase:
>
> long long m;
> typedef long long v2di __attribute__((vector_size (16)));
> void foo(v2di x) { m = x[0]>>63; }
>
> We currently see the following warning/error during STV2
> >  r101 use in insn 7 isn't convertible
>
> And end up generating scalar code with an interunit move:
>
> foo:    movq    %xmm0, %rax
>         sarq    $63, %rax
>         movq    %rax, m(%rip)
>         ret
>
> With this patch, we can reuse the RTL expansion logic and produce:
>
> foo:    psrad   $31, %xmm0
>         pshufd  $245, %xmm0, %xmm0
>         movq    %xmm0, m(%rip)
>         ret
>
> Or with the addition of -mavx2, the equivalent:
>
> foo:    vpxor   %xmm1, %xmm1, %xmm1
>         vpcmpgtq        %xmm0, %xmm1, %xmm0
>         vmovq   %xmm0, m(%rip)
>         ret
>
>
> The only design decision of note is the choice to continue lowering V2DI
> into vector sequences during RTL expansion, to enable combine to optimize
> things if possible.  Using just define_insn_and_split potentially misses
> optimizations, such as reusing the zero vector produced by vpxor above.
> It may be necessary to tweak STV's compute gain at some point, but this
> patch controls what's possible (rather than what's beneficial).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Looks like you didn't update the testcase g++.dg/other/sse2-pr85572-1.C .
Before the change GCC produced:
```
foo(long long __vector(2)):
        movdqa  xmm2, xmm0
        pxor    xmm1, xmm1
        psrlq   xmm2, 63
        psubq   xmm1, xmm2
        pxor    xmm0, xmm1
        psubq   xmm0, xmm1
        ret
```

But afterwards GCC produces (which is better and is now similar to
what llvm produces):
```
_Z3fooDv2_x:
        movdqa  %xmm0, %xmm1
        psrad   $31, %xmm1
        pshufd  $245, %xmm1, %xmm1
        pxor    %xmm1, %xmm0
        psubq   %xmm1, %xmm0
        ret
```

Thanks,
Andrew

>
> 2024-08-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_v2di_ashiftrt): New
>         function refactored from define_expand ashrv2di3.
>         * config/i386/i386-features.cc
> (general_scalar_to_vector_candidate_p)
>         <case ASHIFTRT>: Handle like other shifts and rotates.
>         * config/i386/i386-protos.h (ix86_expand_v2di_ashiftrt): Prototype.
>         * config/i386/sse.md (ashrv2di3): Call ix86_expand_v2di_ashiftrt.
>         (*ashrv2di3): New define_insn_and_split to enable creation by stv2
>         pass, and splitting during split1 reusing ix86_expand_v2di_ashiftrt.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse2-stv-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to