https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90952

            Bug ID: 90952
           Summary: Costs of moves are used for costs of RTL expressions
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: hubicka at ucw dot cz, skpgkp1 at gmail dot com,
                    ubizjak at gmail dot com
  Target Milestone: ---
            Target: i386,x86-64

This patch:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html

includes:

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index e943d13..8409a5f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
   {4, 4, 4},                            /* cost of loading integer registers
                                            in QImode, HImode and SImode.
                                            Relative to reg-reg move (2).  */
-  {6, 6, 6},                            /* cost of storing integer registers */
+  {6, 6, 3},                            /* cost of storing integer registers */
   2,                                    /* cost of reg,reg fld/fst */
   {6, 6, 8},                            /* cost of loading fp registers
                                            in SFmode, DFmode and XFmode */

It lowered the cost for SImode store and made it cheaper than an
SSE<->integer register move.  It caused a regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878

Since the cost for SImode store is also used to compute the cost of the
scalar_store RTL expression in ix86_builtin_vectorization_cost, it changed
loop costs in

void
foo (long p2, long *diag, long d, long i)
{
  long k;
  k = p2 < 3 ? p2 + p2 : p2 + 3;
  while (i < k)
    diag[i++] = d;
}

As a result, the loop is unrolled 4 times with -O3 -march=skylake,
instead of 3.
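
The coupling described above can be illustrated with a small standalone
model.  This is only a sketch, not GCC's code: struct toy_costs, the
toy_* functions and the two tables are hypothetical names made up for
illustration.  The "* 4" scaling matches COSTS_N_INSNS in rtl.h; the
"/ 2" rescale (because move costs are relative to a reg-reg move of 2)
is an assumption about how the vectorizer hook converts the table entry.

```c
#include <assert.h>

/* Same scaling as GCC's COSTS_N_INSNS macro.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical, trimmed-down stand-in for struct processor_costs.  */
struct toy_costs
{
  int int_load[3];   /* QImode, HImode, SImode loads; reg-reg move is 2.  */
  int int_store[3];  /* QImode, HImode, SImode stores.  */
};

/* Consumer 1: move cost of storing an integer register to memory.  */
static int
toy_int_store_move_cost (const struct toy_costs *c, int index)
{
  assert (index >= 0 && index < 3);
  return c->int_store[index];
}

/* Consumer 2: vectorizer-style cost of a scalar store.  It reads the
   same SImode table entry and rescales it (assumed formula).  */
static int
toy_scalar_store_cost (const struct toy_costs *c)
{
  return TOY_COSTS_N_INSNS (toy_int_store_move_cost (c, 2)) / 2;
}

/* The integer load/store rows before and after the patch.  */
static const struct toy_costs toy_before = { {4, 4, 4}, {6, 6, 6} };
static const struct toy_costs toy_after  = { {4, 4, 4}, {6, 6, 3} };
```

In this model, changing the SImode store entry from 6 to 3 also halves
the derived scalar_store cost (12 to 6), even though the change was
only meant to tune the move cost.  A separate table entry for the
expression cost would keep the two from moving together.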