https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498

            Bug ID: 91498
           Summary: [10 Regression] STV change in r274481 causes 300.twolf
                    regression on Haswell
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Split out from PR91154 comment#25

Biggest changes when benchmarking -mno-stv (base) against -mstv (peak):

   7.28%         37789  twolf_peak.none  twolf_peak.none   [.] ucxx2 
   4.21%         25709  twolf_base.none  twolf_base.none   [.] ucxx2        
   3.72%         22584  twolf_base.none  twolf_base.none   [.] new_dbox
   2.48%         22364  twolf_peak.none  twolf_peak.none   [.] new_dbox
   1.49%          8270  twolf_base.none  twolf_base.none   [.] sub_penal
   1.12%          7576  twolf_peak.none  twolf_peak.none   [.] sub_penal
   1.36%          9314  twolf_peak.none  twolf_peak.none   [.] old_assgnto_new2
   1.11%          5257  twolf_base.none  twolf_base.none   [.] old_assgnto_new2

and in ucxx2 I see

  0.17 │       mov    %eax,(%rsp)
  3.55 │       vpmins (%rsp),%xmm0,%xmm1   
       │       test   %eax,%eax
  0.22 │       vmovd  %xmm1,%r8d              
  0.80 │       cmovs  %esi,%r8d

...

Testcase:

extern int numBins;
extern int binOffst;
extern int binWidth;
extern int Trybin;
void foo (int);

void bar (int aleft, int axcenter)
{
  int a1LoBin = (((Trybin=((axcenter + aleft)-binOffst)/binWidth)<0)
                 ? 0 : ((Trybin>numBins) ? numBins : Trybin));
  foo (a1LoBin);
}

where combine eliminates the reg-reg copies STV adds to split live-ranges
between GPR and SSE uses (currently one plain move and one set via
vec_merge/duplicate).
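
For reference, the a1LoBin expression in the testcase is a sign test
wrapped around a signed minimum, which appears to correspond to the
vpmins/cmovs pair in the ucxx2 profile above.  A minimal sketch, assuming
the division result is already available in trybin; smin_int, clamp_bin
and clamp_bin_ref are hypothetical names used only for illustration:

```c
/* Hypothetical helper, not part of 300.twolf or GCC: signed minimum.  */
static int smin_int (int a, int b)
{
  return a < b ? a : b;
}

/* Value of a1LoBin for a given Trybin and numBins: the inner conditional
   is a signed min, the outer one a sign test selecting 0.  */
int clamp_bin (int trybin, int num_bins)
{
  return trybin < 0 ? 0 : smin_int (trybin, num_bins);
}

/* The same expression spelled as in the testcase, for comparison.  */
int clamp_bin_ref (int trybin, int num_bins)
{
  return (trybin < 0) ? 0 : ((trybin > num_bins) ? num_bins : trybin);
}
```

Only the inner conditional is a min; the outer test cannot be folded into
a max with 0 because the two forms differ when numBins is negative.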

Making STV of SI/DImode chains always happen after combine (in STV2) fixes
the testcase above but regresses gcc.target/i386/minmax-6.c, which ran into
a very similar issue and was reduced from an observed SPEC CPU 2006
regression.

In the end the issue is that as soon as the RA decides it needs to spill
for a dual-use pseudo it ends up going through the stack because, as HJ
notices, the minimum cost of moves between SSE and integer units is 8:

  /* Moves between SSE and integer units are expensive.  */
  if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))

    /* ??? By keeping returned value relatively high, we limit the number
       of moves between integer and SSE registers for all targets.
       Additionally, high value prevents problem with x86_modes_tieable_p(),
       where integer modes in SSE registers are not tieable
       because of missing QImode and HImode moves to, from or between
       MMX/SSE registers.  */
    return MAX (8, SSE_CLASS_P (class1)
                ? ix86_cost->hard_register.sse_to_integer
                : ix86_cost->hard_register.integer_to_sse);
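
The effect of that MAX can be seen in isolation with a small standalone
sketch.  struct xfer_costs and cross_unit_move_cost are hypothetical
stand-ins for the cost table and the quoted return statement, and any
numbers fed to it are made up rather than taken from a real tuning table:

```c
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the two cost-table fields used above.  */
struct xfer_costs
{
  int sse_to_integer;
  int integer_to_sse;
};

/* Mirrors the quoted return statement for the cross-unit case: the
   table value only takes effect when it exceeds the floor of 8.  */
int cross_unit_move_cost (const struct xfer_costs *c, int from_sse)
{
  return MAX (8, from_sse ? c->sse_to_integer : c->integer_to_sse);
}
```

With table entries below 8 the function returns 8 in both directions, so
lowering the per-CPU costs alone does not make a direct SSE<->GPR move
look any cheaper.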
