On Mon, Aug 21, 2023 at 8:59 PM Richard Biener <rguent...@suse.de> wrote: > > On Mon, 21 Aug 2023, Hongtao Liu wrote: > > > On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > > > changed code generation again after the recent sinking improvements. > > > We now have > > > > > > vxorps %xmm0, %xmm0, %xmm0 > > > vsqrtsd d2(%rip), %xmm0, %xmm0 > > > > > > and an unnecessary xor again in one case, the other vsqrtsd has > > > a register source and a properly zeroing load: > > > > > > vmovsd d3(%rip), %xmm0 > > > testl %esi, %esi > > > jg .L11 > > > .L3: > > > vsqrtsd %xmm0, %xmm0, %xmm0 > > > > > > the following patch XFAILs the scan. I'm not sure what's at > > > fault here, there are no loops in the CFG, but somehow > > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) > > > doesn't. I guess I don't really understand what > > > remove_partial_avx_dependency is supposed to do so can't > > > really assess whether the pxor is necessary or not. > > There's a false dependency on xmm0 when the source operand in the > > pattern is memory, the pattern only takes xmm0 as dest, but the output > > instruction takes xmm0 also as input(the second source operand), > > that's why we need an pxor here. > > OK, so XFAIL is wrong, we should instead scan for one xorps then > (like it was in the past). > > > When the source operand in the pattern is register_operand, we can > > reuse the register_operand for the second source operand. The > > instructions here are not very obvious, the more representative one > > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. > > > > > > OK? > > Can we add -fno-XXX to disable the optimization to make the assembly > > more stable? > > Not sure, we could feed GIMPLE IR to RTL expansion instead of > feeding a complex testcase through the pipeline, but I'm not sure > what we were originally supposed to test (the PR trail is a bit > large). > > > Or current codegen should be optimal(for the sinking), then Ok for the > > patch. > > So like the following (I've just adjusted the comments to reflect the > pxor is necessary). > > OK? OK. > > Richard. > > From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001 > From: Richard Biener <rguent...@suse.de> > Date: Mon, 21 Aug 2023 14:09:48 +0200 > Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c > To: gcc-patches@gcc.gnu.org > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > changed code generation again after the recent sinking improvements. > We now have > > vxorps %xmm0, %xmm0, %xmm0 > vsqrtsd d2(%rip), %xmm0, %xmm0 > > and a necessary xor again in one case, the other vsqrtsd has > a register source and a properly zeroing load: > > vmovsd d3(%rip), %xmm0 > testl %esi, %esi > jg .L11 > .L3: > vsqrtsd %xmm0, %xmm0, %xmm0 > > the following patch adjusts the scan. > > * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest. > --- > gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c > b/gcc/testsuite/gcc.target/i386/pr87007-5.c > index a6cdf11522e..8f2dc947f6c 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > @@ -1,6 +1,8 @@ > /* { dg-do compile } */ > /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse > -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" > } */ > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to > avoid partial dependence. */ > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > + are sunk out of the loop and the loop is elided. One vsqrtsd with > + memory operand needs a xor to avoid partial dependence. */ > > #include<math.h> > > @@ -17,4 +19,4 @@ foo (int n, int k) > > /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ > /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > -- > 2.35.3 >
-- BR, Hongtao