On Thu, Apr 20, 2023 at 8:46 AM liuhongt <hongtao....@intel.com> wrote: > > 1547 /* If this insn loads a parameter from its stack slot, then it > 1548 represents a savings, rather than a cost, if the parameter is > 1549 stored in memory. Record this fact. > 1550 > 1551 Similarly if we're loading other constants from memory (constant > 1552 pool, TOC references, small data areas, etc) and this is the only > 1553 assignment to the destination pseudo. > > At that time, preferred regclass is unknown, and GENERAL_REGS is used to > record memory move cost, but it's not accurate especially for large vector > modes, i.e. 512-bit vector in x86 which would most probably allocate with > SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate > the cost of this load and make RA propagate the memeory operand into many > consume instructions which causes worse performance. > > Fortunately, NO_REGS is used to record the best scenario, so the patch uses > NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > and aarch64-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > PR rtl-optimization/108707 > * ira-costs.cc (scan_one_insn): Use NO_REGS instead of > GENERAL_REGS when preferred reg_class is not known. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr108707.c: New test. > --- > gcc/ira-costs.cc | 5 ++++- > gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++ > 2 files changed, 20 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c > > diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc > index c0fdef807dd..d2a801ab9b0 100644 > --- a/gcc/ira-costs.cc > +++ b/gcc/ira-costs.cc > @@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn) > && (! ira_use_lra_p || ! pic_offset_table_rtx > || ! contains_symbol_ref_p (XEXP (note, 0)))) > { > - enum reg_class cl = GENERAL_REGS; > + /* Costs for NO_REGS are used in cost calculation on the > + 1st pass when the preferred register classes are not > + known yet. In this case we take the best scenario. */ > + enum reg_class cl = NO_REGS; > rtx reg = SET_DEST (set); > int num = COST_INDEX (REGNO (reg)); > > diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c > b/gcc/testsuite/gcc.target/i386/pr108707.c > new file mode 100644 > index 00000000000..bc1a476f551 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr108707.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */ > +/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */ > + > +#include<immintrin.h> > + > +void > +foo (__m512 pv, __m512 a, __m512 b, __m512 c, > + __m512* pdest, __m512* p1) > +{ > + __m512 t = *p1; > + pdest[0] = _mm512_fmadd_ps (t, pv, a); > + pdest[1] = _mm512_fmadd_ps (t, pv, b); > + pdest[2] = _mm512_fmadd_ps (t, pv, c); > +} > -- > 2.39.1.388.g2fc9e9ca3c >
-- BR, Hongtao