https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #1)
> If we make integer register store more expensive, this testcase will
> regress:
>
> [hjl@gnu-cfl-1 unroll]$ cat x.i
> void
> foo (long p2, long *diag, long d, long i)
> {
> long k;
> k = p2 < 3 ? p2 + p2 : p2 + 3;
> while (i < k)
> diag[i++] = d;
> }
> [hjl@gnu-cfl-1 unroll]$ make
> /export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/ -O3
> -march=skylake -S x.i
...
> since higher integer register store cost will reduce loop unroll count.
ix86_builtin_vectorization_cost has:

      case scalar_load:
        /* load/store costs are relative to register move which is 2. Recompute
           it to COSTS_N_INSNS so everything have same base.  */
        return COSTS_N_INSNS (fp ? ix86_cost->sse_load[0]
                              : ix86_cost->int_load [2]) / 2;

      case scalar_store:
        return COSTS_N_INSNS (fp ? ix86_cost->sse_store[0]
                              : ix86_cost->int_store [2]) / 2;

sse_load[0], int_load [2], sse_store[0] and int_store [2] impact both:

1. Loop runtime profitability threshold.
2. Selection of memory vs register operands.
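
To make the scaling concrete, here is a minimal sketch in C of the arithmetic in the scalar_load/scalar_store cases. COSTS_N_INSNS is defined the same way as in GCC's rtl.h; struct toy_costs, the toy_* functions and every cost value are hypothetical stand-ins for illustration, not the real processor_costs or any real tuning table.

```c
/* Same definition as in GCC's rtl.h: one instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down stand-in for processor_costs.  Only the four
   entries discussed here are modelled.  */
struct toy_costs
{
  int int_load[3];
  int int_store[3];
  int sse_load[1];
  int sse_store[1];
};

/* Mirrors the scalar_load case: the table entry is relative to a
   register move of 2, so divide by 2 after scaling.  */
static int
toy_scalar_load_cost (const struct toy_costs *c, int fp)
{
  return COSTS_N_INSNS (fp ? c->sse_load[0] : c->int_load[2]) / 2;
}

/* Mirrors the scalar_store case in the same way.  */
static int
toy_scalar_store_cost (const struct toy_costs *c, int fp)
{
  return COSTS_N_INSNS (fp ? c->sse_store[0] : c->int_store[2]) / 2;
}
```

With a made-up int_store[2] of 6, toy_scalar_store_cost (&c, 0) is 12; raising int_store[2] to 8 moves it to 16. Since the same table entry is read by every consumer, the change cannot be limited to just one of the two uses listed above.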
Should we add a separate set of costs to processor_costs for the vectorizer?