On Thu, Apr 20, 2023 at 8:46 AM liuhongt <hongtao....@intel.com> wrote:
>
> 1547  /* If this insn loads a parameter from its stack slot, then it
> 1548     represents a savings, rather than a cost, if the parameter is
> 1549     stored in memory.  Record this fact.
> 1550
> 1551     Similarly if we're loading other constants from memory (constant
> 1552     pool, TOC references, small data areas, etc) and this is the only
> 1553     assignment to the destination pseudo.
>
> At that time, preferred regclass is unknown, and GENERAL_REGS is used to
> record memory move cost, but it's not accurate especially for large vector
> modes, i.e. 512-bit vector in x86 which would most probably allocate with
> SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate
> the cost of this load and make RA propagate the memeory operand into many
> consume instructions which causes worse performance.
>
> Fortunately, NO_REGS is used to record the best scenario, so the patch uses
> NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> and aarch64-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/108707
>         * ira-costs.cc (scan_one_insn): Use NO_REGS instead of
>         GENERAL_REGS when preferred reg_class is not known.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr108707.c: New test.
> ---
>  gcc/ira-costs.cc                         |  5 ++++-
>  gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c
>
> diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
> index c0fdef807dd..d2a801ab9b0 100644
> --- a/gcc/ira-costs.cc
> +++ b/gcc/ira-costs.cc
> @@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn)
>        && (! ira_use_lra_p || ! pic_offset_table_rtx
>           || ! contains_symbol_ref_p (XEXP (note, 0))))
>      {
> -      enum reg_class cl = GENERAL_REGS;
> +      /* Costs for NO_REGS are used in cost calculation on the
> +        1st pass when the preferred register classes are not
> +        known yet.  In this case we take the best scenario.  */
> +      enum reg_class cl = NO_REGS;
>        rtx reg = SET_DEST (set);
>        int num = COST_INDEX (REGNO (reg));
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c 
> b/gcc/testsuite/gcc.target/i386/pr108707.c
> new file mode 100644
> index 00000000000..bc1a476f551
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr108707.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */
> +/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */
> +
> +#include<immintrin.h>
> +
> +void
> +foo (__m512 pv, __m512 a, __m512 b, __m512 c,
> +     __m512* pdest, __m512* p1)
> +{
> +  __m512 t = *p1;
> +    pdest[0] = _mm512_fmadd_ps (t, pv, a);
> +    pdest[1] = _mm512_fmadd_ps (t, pv, b);
> +    pdest[2] = _mm512_fmadd_ps (t, pv, c);
> +}
> --
> 2.39.1.388.g2fc9e9ca3c
>


-- 
BR,
Hongtao

Reply via email to