pr87007-5.c

Hongtao Liu via Gcc-patches Mon, 21 Aug 2023 05:41:04 -0700

On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches
<[email protected]> wrote:
>
> The following fixes the gcc.target/i386/pr87007-5.c testcase which
> changed code generation again after the recent sinking improvements.
> We now have
>
>         vxorps  %xmm0, %xmm0, %xmm0
>         vsqrtsd d2(%rip), %xmm0, %xmm0
>
> and an unnecessary xor again in one case, the other vsqrtsd has
> a register source and a properly zeroing load:
>
>         vmovsd  d3(%rip), %xmm0
>         testl   %esi, %esi
>         jg      .L11
> .L3:
>         vsqrtsd %xmm0, %xmm0, %xmm0
>
> the following patch XFAILs the scan.  I'm not sure what's at
> fault here, there are no loops in the CFG, but somehow
> r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> doesn't.  I guess I don't really understand what
> remove_partial_avx_dependency is supposed to do so can't
> really assess whether the pxor is necessary or not.
There's a false dependency on xmm0 when the source operand in the
pattern is memory, the pattern only takes xmm0 as dest, but the output
instruction takes xmm0 also as input(the second source operand),
that's why we need an pxor here.
When the source operand in the pattern is register_operand, we can
reuse the register_operand for the second source operand. The
instructions here are not very obvious, the more representative one
should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0.
>
> OK?
Can we add -fno-XXX to disable the optimization to make the assembly
more stable?
Or current codegen should be optimal(for the sinking), then Ok for the patch.


>
>         * gcc.target/i386/pr87007-5.c: Update comment, XFAIL
>         subtest.
> ---
>  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
> b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index a6cdf11522e..5902616d1f1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" 
> } */
> -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to 
> avoid partial dependence.  */
> +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> +   memory operand will need a xor to avoid partial dependence.  */
>
>  #include<math.h>
>
> @@ -17,4 +19,4 @@ foo (int n, int k)
>
>  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
>  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> --
> 2.35.3



-- 
BR,
Hongtao

Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

Reply via email to