Hi!

On Mon, Nov 04, 2019 at 03:16:06PM +0800, Kewen.Lin wrote:
> To align with rs6000_insn_cost costing more for load type insns,

(Which itself has history in rs6000_rtx_costs).

> this patch is to make load insns cost more in vectorization cost
> function.  Considering that the result of load usually is used
> somehow later (true-dep) but store won't, we keep the store as
> before.

The latency of load insns is about twice that of "simple" instructions;
2 vs. 1 on older cores, and 4 (or so) vs. 2 on newer cores.

> The SPEC2017 performance evaluation on Power8 shows 525.x264_r
> +9.56%, 511.povray_r +2.08%, 527.cam4_r 1.16% gains, no 
> significant degradation, SPECINT geomean +0.88%, SPECFP geomean
> +0.26%.

Nice :-)

> The SPEC2017 performance evaluation on Power9 shows no significant
> improvement or degradation, SPECINT geomean +0.04%, SPECFP geomean
> +0.04%.
> 
> The SPEC2006 performance evaluation on Power8 shows 454.calculix
> +4.41% gain but 416.gamess -1.19% and 453.povray -3.83% degradation.
> I looked into the two degradation bmks, the degradation were NOT
> due to hotspot changes by vectorization, were all side effects.
> SPECINT geomean +0.10%, SPECFP geomean no changed considering
> the degradation.

Also nice.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4763,15 +4763,22 @@ rs6000_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
>    switch (type_of_cost)
>      {
>        case scalar_stmt:
> -      case scalar_load:
>        case scalar_store:
>        case vector_stmt:
> -      case vector_load:
>        case vector_store:
>        case vec_to_scalar:
>        case scalar_to_vec:
>        case cond_branch_not_taken:
>          return 1;
> +      case scalar_load:
> +      case vector_load:
> +     /* Like rs6000_insn_cost, make load insns cost a bit more. FIXME: the

(two spaces after full stop).

> +        benefits were observed on Power8 and up, we can unify it if similar
> +        profits are measured on Power6 and Power7.  */
> +     if (TARGET_P8_VECTOR)
> +       return 2;
> +     else
> +       return 1;

Hrm, but you showed benchmark improvements for p9 as well?

What happens if you enable this for everything as well?

> -        return 2;
> +     /* Like rs6000_insn_cost, make load insns cost a bit more. FIXME: the
> +        benefits were observed on Power8 and up, we can unify it if similar
> +        profits are measured on Power6 and Power7.  */
> +     if (TARGET_P8_VECTOR)
> +       return 4;
> +     else
> +       return 2;

And this.


Segher

Reply via email to