My sincere apologies for not noticing that g++.dg/other/sse2-pr85572-1.C
was FAILing with my recent ashrv2di patch.  I'm not sure how that happened.
Many thanks to Andrew Pinski for alerting me, and confirming that the
changes are harmless/beneficial.  The following tweak to the testsuite
has been committed as obvious.  Sorry again for the inconvenience.

Tested on x86_64-pc-linux-gnu with RUNTESTFLAGS="dg.exp=sse2-pr85572-1.C".


2024-08-07  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/testsuite/ChangeLog
        * g++.dg/other/sse2-pr85572-1.C: Update expected output after
        my recent patch for ashrv2di3.  Now with one less instruction.


> -----Original Message-----
> From: Andrew Pinski <pins...@gmail.com>
> Sent: 06 August 2024 22:17
> On Mon, Aug 5, 2024 at 3:23 AM Roger Sayle <ro...@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch refactors ashrv2di RTL expansion into a function so that it
> > may be reused by a pre-reload splitter, such that DImode right shifts
> > may be considered candidates during the Scalar-To-Vector (STV) pass.
> > Currently DImode arithmetic right shifts are not considered potential
> > candidates during STV, so for the following testcase:
> >
> > long long m;
> > typedef long long v2di __attribute__((vector_size (16))); void
> > foo(v2di x) { m = x[0]>>63; }
> >
> > We currently see the following warning/error during STV2
> > >  r101 use in insn 7 isn't convertible
> >
> > And end up generating scalar code with an interunit move:
> >
> > foo:    movq    %xmm0, %rax
> >         sarq    $63, %rax
> >         movq    %rax, m(%rip)
> >         ret
> >
> > With this patch, we can reuse the RTL expansion logic and produce:
> >
> > foo:    psrad   $31, %xmm0
> >         pshufd  $245, %xmm0, %xmm0
> >         movq    %xmm0, m(%rip)
> >         ret
> >
> > Or with the addition of -mavx2, the equivalent:
> >
> > foo:    vpxor   %xmm1, %xmm1, %xmm1
> >         vpcmpgtq        %xmm0, %xmm1, %xmm0
> >         vmovq   %xmm0, m(%rip)
> >         ret
> >
> >
> > The only design decision of note is the choice to continue lowering
> > V2DI into vector sequences during RTL expansion, to enable combine to
> > optimize things if possible.  Using just define_insn_and_split
> > potentially misses optimizations, such as reusing the zero vector produced 
> > by
> vpxor above.
> > It may be necessary to tweak STV's compute gain at some point, but
> > this patch controls what's possible (rather than what's beneficial).
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> Looks like you didn't update the testcase g++.dg/other/sse2-pr85572-1.C .
> Before the change GCC produced:
> ```
> foo(long long __vector(2)):
>         movdqa  xmm2, xmm0
>         pxor    xmm1, xmm1
>         psrlq   xmm2, 63
>         psubq   xmm1, xmm2
>         pxor    xmm0, xmm1
>         psubq   xmm0, xmm1
>         ret
> ```
> 
> But afterwards GCC produces (which is better and is now similar to what llvm
> produces):
> ```
> _Z3fooDv2_x:
>         movdqa  %xmm0, %xmm1
>         psrad   $31, %xmm1
>         pshufd  $245, %xmm1, %xmm1
>         pxor    %xmm1, %xmm0
>         psubq   %xmm1, %xmm0
>         ret
> ```
> 
> Thanks,
> Andrew
> 
> >
> > 2024-08-05  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386-expand.cc (ix86_expand_v2di_ashiftrt): New
> >         function refactored from define_expand ashrv2di3.
> >         * config/i386/i386-features.cc
> > (general_scalar_to_vector_candidate_p)
> >         <case ASHIFTRT>: Handle like other shifts and rotates.
> >         * config/i386/i386-protos.h (ix86_expand_v2di_ashiftrt): Prototype.
> >         * config/i386/sse.md (ashrv2di3): Call ix86_expand_v2di_ashiftrt.
> >         (*ashrv2di3): New define_insn_and_split to enable creation by stv2
> >         pass, and splitting during split1 reusing ix86_expand_v2di_ashiftrt.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/sse2-stv-2.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >

Reply via email to