My sincere apologies for not noticing that g++.dg/other/sse2-pr85572-1.C was FAILing with my recent ashrv2di patch. I'm not sure how that happened. Many thanks to Andrew Pinski for alerting me, and confirming that the changes are harmless/beneficial. The following tweak to the testsuite has been committed as obvious. Sorry again for the inconvenience.
Tested on x86_64-pc-linux-gnu with RUNTESTFLAGS="dg.exp=sse2-pr85572-1.C". 2024-08-07 Roger Sayle <ro...@nextmovesoftware.com> gcc/testsuite/ChangeLog * g++.dg/other/sse2-pr85572-1.C: Update expected output after my recent patch for ashrv2di3. Now with one less instruction. > -----Original Message----- > From: Andrew Pinski <pins...@gmail.com> > Sent: 06 August 2024 22:17 > On Mon, Aug 5, 2024 at 3:23 AM Roger Sayle <ro...@nextmovesoftware.com> > wrote: > > > > > > This patch refactors ashrv2di RTL expansion into a function so that it > > may be reused by a pre-reload splitter, such that DImode right shifts > > may be considered candidates during the Scalar-To-Vector (STV) pass. > > Currently DImode arithmetic right shifts are not considered potential > > candidates during STV, so for the following testcase: > > > > long long m; > > typedef long long v2di __attribute__((vector_size (16))); void > > foo(v2di x) { m = x[0]>>63; } > > > > We currently see the following warning/error during STV2 > > > r101 use in insn 7 isn't convertible > > > > And end up generating scalar code with an interunit move: > > > > foo: movq %xmm0, %rax > > sarq $63, %rax > > movq %rax, m(%rip) > > ret > > > > With this patch, we can reuse the RTL expansion logic and produce: > > > > foo: psrad $31, %xmm0 > > pshufd $245, %xmm0, %xmm0 > > movq %xmm0, m(%rip) > > ret > > > > Or with the addition of -mavx2, the equivalent: > > > > foo: vpxor %xmm1, %xmm1, %xmm1 > > vpcmpgtq %xmm0, %xmm1, %xmm0 > > vmovq %xmm0, m(%rip) > > ret > > > > > > The only design decision of note is the choice to continue lowering > > V2DI into vector sequences during RTL expansion, to enable combine to > > optimize things if possible. Using just define_insn_and_split > > potentially misses optimizations, such as reusing the zero vector produced > > by > vpxor above. > > It may be necessary to tweak STV's compute gain at some point, but > > this patch controls what's possible (rather than what's beneficial). > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > Looks like you didn't update the testcase g++.dg/other/sse2-pr85572-1.C . > Before the change GCC produced: > ``` > foo(long long __vector(2)): > movdqa xmm2, xmm0 > pxor xmm1, xmm1 > psrlq xmm2, 63 > psubq xmm1, xmm2 > pxor xmm0, xmm1 > psubq xmm0, xmm1 > ret > ``` > > But afterwards GCC produces (which is better and is now similar to what llvm > produces): > ``` > _Z3fooDv2_x: > movdqa %xmm0, %xmm1 > psrad $31, %xmm1 > pshufd $245, %xmm1, %xmm1 > pxor %xmm1, %xmm0 > psubq %xmm1, %xmm0 > ret > ``` > > Thanks, > Andrew > > > > > 2024-08-05 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > * config/i386/i386-expand.cc (ix86_expand_v2di_ashiftrt): New > > function refactored from define_expand ashrv2di3. > > * config/i386/i386-features.cc > > (general_scalar_to_vector_candidate_p) > > <case ASHIFTRT>: Handle like other shifts and rotates. > > * config/i386/i386-protos.h (ix86_expand_v2di_ashiftrt): Prototype. > > * config/i386/sse.md (ashrv2di3): Call ix86_expand_v2di_ashiftrt. > > (*ashrv2di3): New define_insn_and_split to enable creation by stv2 > > pass, and splitting during split1 reusing ix86_expand_v2di_ashiftrt. > > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/sse2-stv-2.c: New test case. > > > > > > Thanks in advance, > > Roger > > -- > >