https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498
            Bug ID: 91498
           Summary: [10 Regression] STV change in r274481 causes 300.twolf
                    regression on Haswell
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Split out from PR91154 comment#25

Biggest changes when benchmarking -mno-stv (base) against -mstv (peak):

   7.28%  37789  twolf_peak.none  twolf_peak.none  [.] ucxx2
   4.21%  25709  twolf_base.none  twolf_base.none  [.] ucxx2
   3.72%  22584  twolf_base.none  twolf_base.none  [.] new_dbox
   2.48%  22364  twolf_peak.none  twolf_peak.none  [.] new_dbox
   1.49%   8270  twolf_base.none  twolf_base.none  [.] sub_penal
   1.12%   7576  twolf_peak.none  twolf_peak.none  [.] sub_penal
   1.36%   9314  twolf_peak.none  twolf_peak.none  [.] old_assgnto_new2
   1.11%   5257  twolf_base.none  twolf_base.none  [.] old_assgnto_new2

and in ucxx2 I see

  0.17 │ mov %eax,(%rsp)
  3.55 │ vpmins (%rsp),%xmm0,%xmm1
       │ test %eax,%eax
  0.22 │ vmovd %xmm1,%r8d
  0.80 │ cmovs %esi,%r8d
  ...

Testcase:

extern int numBins;
extern int binOffst;
extern int binWidth;
extern int Trybin;
void foo (int);

void bar (int aleft, int axcenter)
{
  int a1LoBin = (((Trybin=((axcenter + aleft)-binOffst)/binWidth)<0)
                 ? 0 : ((Trybin>numBins) ? numBins : Trybin));
  foo (a1LoBin);
}

where combine eliminates the reg-reg copies STV adds to split live-ranges
between GPR and SSE uses (currently one plain move and one set via
vec_merge/duplicate).

Making STV of SI/DImode chains always happen after combine (in STV2) fixes
the testcase above but regresses gcc.target/i386/minmax-6.c, which ran into
a very similar issue and was reduced from an observed SPEC CPU 2006
regression.

In the end the issue is that as soon as the RA decides it needs to spill
for a dual-use pseudo it ends up going through the stack because, as HJ
notices, the minimum cost of moves between SSE and integer units is 8:

  /* Moves between SSE and integer units are expensive.  */
  if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))

    /* ??? By keeping returned value relatively high, we limit the number
       of moves between integer and SSE registers for all targets.
       Additionally, high value prevents problem with x86_modes_tieable_p(),
       where integer modes in SSE registers are not tieable
       because of missing QImode and HImode moves to, from or between
       MMX/SSE registers.  */
    return MAX (8, SSE_CLASS_P (class1)
                ? ix86_cost->hard_register.sse_to_integer
                : ix86_cost->hard_register.integer_to_sse);
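
To make the effect of that MAX concrete, here is a minimal standalone sketch
of the floor. It is not the GCC code: struct move_costs and
inter_unit_move_cost are hypothetical stand-ins for the two hard_register
fields and the return statement quoted above, and any cost values fed to it
are made up for illustration rather than taken from a real tuning table.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the two hard_register cost fields that the
   quoted snippet reads; not the real GCC structure.  */
struct move_costs
{
  int sse_to_integer;
  int integer_to_sse;
};

/* Illustrative model of the quoted return statement only.
   class1_is_sse / class2_is_sse play the role of SSE_CLASS_P (class1)
   and SSE_CLASS_P (class2).  Returns -1 for moves that stay within one
   unit, since this sketch does not model those.  */
int
inter_unit_move_cost (const struct move_costs *costs,
                      int class1_is_sse, int class2_is_sse)
{
  assert (costs != 0);

  if (class1_is_sse == class2_is_sse)
    return -1;

  /* The table entry only matters when it exceeds the floor of 8.  */
  return MAX (8, class1_is_sse
              ? costs->sse_to_integer
              : costs->integer_to_sse);
}
```

With made-up table entries of 2 in both directions the function still
returns 8, so under this model lowering the per-CPU entries below 8 cannot
change the cost reported for a GPR<->SSE move; only entries above 8 are
visible to the caller.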