https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #1)
> If we make integer register store more expensive, this testcase will
> regress:
> 
> [hjl@gnu-cfl-1 unroll]$ cat x.i
> void
> foo (long p2, long *diag, long d, long i)
> {
>   long k;
>   k = p2 < 3 ? p2 + p2 : p2 + 3;
>   while (i < k)
>     diag[i++] = d;
> }
> [hjl@gnu-cfl-1 unroll]$ make
> /export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/tools-build/gcc-wip-debug/build-x86_64-linux/gcc/ -O3
> -march=skylake  -S x.i
...
> since higher integer register store cost will reduce loop unroll count.

ix86_builtin_vectorization_cost has

      case scalar_load:
        /* load/store costs are relative to register move which is 2. Recompute
           it to COSTS_N_INSNS so everything have same base.  */
        return COSTS_N_INSNS (fp ? ix86_cost->sse_load[0]
                              : ix86_cost->int_load [2]) / 2;

      case scalar_store:
        return COSTS_N_INSNS (fp ? ix86_cost->sse_store[0]
                              : ix86_cost->int_store [2]) / 2;
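
To make the arithmetic concrete, here is a minimal standalone sketch of the
scalar_store case. COSTS_N_INSNS is defined as in GCC's rtl.h; the function
name and its plain int parameter are stand-ins for illustration only, with the
parameter playing the role of ix86_cost->int_store [2].

```c
/* Same definition as in GCC's rtl.h: one insn is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Stand-in for the integer scalar_store case.  INT_STORE_COST plays the
   role of ix86_cost->int_store [2], which is relative to a register
   move cost of 2, hence the division.  */
static int
scalar_store_cost (int int_store_cost)
{
  return COSTS_N_INSNS (int_store_cost) / 2;
}
```

With made-up table values, scalar_store_cost (4) is 8 and
scalar_store_cost (6) is 12, so any increase in int_store [2] is doubled in
the cost the vectorizer sees.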

sse_load[0], int_load [2], sse_store[0] and int_store [2] impact both:

1. Loop runtime profitability threshold.
2. Selection of memory vs register operands.

Should we add a separate set of costs to processor_costs for the vectorizer?
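
A rough sketch of what such a split might look like. Every name here is
hypothetical and none of it exists in GCC; it only illustrates keeping the
vectorizer's scalar store cost apart from the one used for operand selection.

```c
/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cost table, not GCC's processor_costs.  */
struct sketch_costs
{
  int int_store;        /* Would drive memory vs register operand selection.  */
  int vec_scalar_store; /* Would be read only by the vectorizer cost hook.  */
};

/* Hypothetical vectorizer hook: reads only the vectorizer-specific field,
   so changes to int_store do not affect its result.  */
static int
sketch_scalar_store_cost (const struct sketch_costs *c)
{
  return COSTS_N_INSNS (c->vec_scalar_store) / 2;
}
```

Under this split, int_store could be made more expensive without changing
the value returned for scalar_store.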
