https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #1)
> If we make integer register store more expensive, this testcase will
> regress:
>
> [hjl@gnu-cfl-1 unroll]$ cat x.i
> void
> foo (long p2, long *diag, long d, long i)
> {
>   long k;
>   k = p2 < 3 ? p2 + p2 : p2 + 3;
>   while (i < k)
>     diag[i++] = d;
> }
> [hjl@gnu-cfl-1 unroll]$ make
> /export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/ -O3
> -march=skylake -S x.i
...
> since higher integer register store cost will reduce loop unroll count.

ix86_builtin_vectorization_cost has

      case scalar_load:
        /* load/store costs are relative to register move which is 2. Recompute
           it to COSTS_N_INSNS so everything have same base.  */
        return COSTS_N_INSNS (fp ? ix86_cost->sse_load[0]
                              : ix86_cost->int_load [2]) / 2;

      case scalar_store:
        return COSTS_N_INSNS (fp ? ix86_cost->sse_store[0]
                              : ix86_cost->int_store [2]) / 2;

sse_load[0], int_load [2], sse_store[0], int_store [2] impact

1. Loop runtime profitability threshold.
2. Selection of memory vs register operands.

Should we add a separate set of costs to processor_costs for vectorizer?
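
To make the coupling concrete, here is a minimal standalone sketch of the scalar_load/scalar_store arithmetic quoted above. It is not GCC code. struct toy_costs, the TOY_* enumerators and toy_vectorization_cost are hypothetical names, the array sizes are assumptions, and only the COSTS_N_INSNS definition is taken from GCC's rtl.h.

```c
/* Same definition as in GCC's rtl.h: one instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down stand-in for struct processor_costs.  Only the
   four fields named in the comment are modeled; array sizes are assumed.  */
struct toy_costs
{
  int int_load[3];
  int int_store[3];
  int sse_load[5];
  int sse_store[5];
};

/* Hypothetical enumerators mirroring scalar_load and scalar_store.  */
enum toy_kind { TOY_SCALAR_LOAD, TOY_SCALAR_STORE };

/* Mirrors the two quoted cases: the table entry is relative to a register
   move of cost 2, so it is scaled by COSTS_N_INSNS and divided by 2.  */
static int
toy_vectorization_cost (const struct toy_costs *c, enum toy_kind kind, int fp)
{
  switch (kind)
    {
    case TOY_SCALAR_LOAD:
      return COSTS_N_INSNS (fp ? c->sse_load[0] : c->int_load[2]) / 2;

    case TOY_SCALAR_STORE:
      return COSTS_N_INSNS (fp ? c->sse_store[0] : c->int_store[2]) / 2;
    }
  return 0;
}
```

With made-up table values, int_store[2] = 3 gives a scalar store cost of 6, and raising it to 6 gives 12. Any tuning of int_store[2] for register allocation therefore changes the vectorizer's scalar cost in the same step, which is the coupling the question above is about.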