Hi!

On Mon, Nov 04, 2019 at 03:16:06PM +0800, Kewen.Lin wrote:
> To align with rs6000_insn_cost costing more for load type insns,

(Which itself has history in rs6000_rtx_costs).

> this patch is to make load insns cost more in vectorization cost
> function. Considering that the result of load usually is used
> somehow later (true-dep) but store won't, we keep the store as
> before.

The latency of load insns is about twice that of "simple" instructions;
2 vs. 1 on older cores, and 4 (or so) vs. 2 on newer cores.

> The SPEC2017 performance evaluation on Power8 shows 525.x264_r
> +9.56%, 511.povray_r +2.08%, 527.cam4_r 1.16% gains, no
> significant degradation, SPECINT geomean +0.88%, SPECFP geomean
> +0.26%.

Nice :-)

> The SPEC2017 performance evaluation on Power9 shows no significant
> improvement or degradation, SPECINT geomean +0.04%, SPECFP geomean
> +0.04%.
>
> The SPEC2006 performance evaluation on Power8 shows 454.calculix
> +4.41% gain but 416.gamess -1.19% and 453.povray -3.83% degradation.
> I looked into the two degradation bmks, the degradation were NOT
> due to hotspot changes by vectorization, were all side effects.
> SPECINT geomean +0.10%, SPECFP geomean no changed considering
> the degradation.

Also nice.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4763,15 +4763,22 @@ rs6000_builtin_vectorization_cost (enum
> vect_cost_for_stmt type_of_cost,
>      switch (type_of_cost)
>        {
>        case scalar_stmt:
> -      case scalar_load:
>        case scalar_store:
>        case vector_stmt:
> -      case vector_load:
>        case vector_store:
>        case vec_to_scalar:
>        case scalar_to_vec:
>        case cond_branch_not_taken:
>          return 1;
> +      case scalar_load:
> +      case vector_load:
> +        /* Like rs6000_insn_cost, make load insns cost a bit more. FIXME: the

(two spaces after full stop).

> +           benefits were observed on Power8 and up, we can unify it if similar
> +           profits are measured on Power6 and Power7. */
> +        if (TARGET_P8_VECTOR)
> +          return 2;
> +        else
> +          return 1;

Hrm, but you showed benchmark improvements for p9 as well? What happens
if you enable this for everything as well?
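
For illustration only, the difference between the hunk as posted and the
unconditional variant asked about could be modeled roughly as below. This is
a standalone sketch, not the rs6000.c code: the enum is a cut-down stand-in
for GCC's vect_cost_for_stmt, and the names stmt_kind_sketch, cost_as_posted,
cost_unconditional and the p8_vector parameter (standing in for
TARGET_P8_VECTOR) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for GCC's vect_cost_for_stmt.  */
enum stmt_kind_sketch
{
  SK_SCALAR_STMT,
  SK_SCALAR_LOAD,
  SK_SCALAR_STORE,
  SK_VECTOR_STMT,
  SK_VECTOR_LOAD,
  SK_VECTOR_STORE
};

/* Models the hunk as posted: loads cost 2 only when p8_vector is set
   (p8_vector is a stand-in for TARGET_P8_VECTOR); everything else
   modeled here costs 1.  */
int
cost_as_posted (enum stmt_kind_sketch kind, bool p8_vector)
{
  switch (kind)
    {
    case SK_SCALAR_LOAD:
    case SK_VECTOR_LOAD:
      return p8_vector ? 2 : 1;
    default:
      return 1;
    }
}

/* Models the unconditional variant: loads always cost 2, matching the
   "about twice a simple instruction" latency ratio on every core.  */
int
cost_unconditional (enum stmt_kind_sketch kind)
{
  switch (kind)
    {
    case SK_SCALAR_LOAD:
    case SK_VECTOR_LOAD:
      return 2;
    default:
      return 1;
    }
}
```

The two models only differ for loads when p8_vector is false, so that is
the configuration (Power6, Power7) where benchmark numbers would be needed.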

> -        return 2;
> +        /* Like rs6000_insn_cost, make load insns cost a bit more. FIXME: the
> +           benefits were observed on Power8 and up, we can unify it if similar
> +           profits are measured on Power6 and Power7. */
> +        if (TARGET_P8_VECTOR)
> +          return 4;
> +        else
> +          return 2;

And this.


Segher