On Wed, 23 Apr 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Wednesday, April 23, 2025 9:37 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Sandiford
> > <richard.sandif...@arm.com>
> > Subject: Re: [PATCH]middle-end: Add new "max" vector cost model
> >
> > On Wed, 23 Apr 2025, Richard Biener wrote:
> >
> > > On Wed, 23 Apr 2025, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch proposes a new vector cost model called "max". The cost model is an
> > > > intersection between two of our existing cost models. Like `unlimited` it
> > > > disables the costing vs scalar and assumes all vectorization to be profitable.
> > > >
> > > > But unlike unlimited it does not fully disable the vector cost model. That
> > > > means that we still perform comparisons between vector modes.
> > > >
> > > > As an example, the following:
> > > >
> > > > void
> > > > foo (char *restrict a, int *restrict b, int *restrict c,
> > > >      int *restrict d, int stride)
> > > > {
> > > >   if (stride <= 1)
> > > >     return;
> > > >
> > > >   for (int i = 0; i < 3; i++)
> > > >     {
> > > >       int res = c[i];
> > > >       int t = b[i * stride];
> > > >       if (a[i] != 0)
> > > >         res = t * d[i];
> > > >       c[i] = res;
> > > >     }
> > > > }
> > > >
> > > > compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
> > > > vectorize as it assumes scalar would be faster, and with
> > > > -fvect-cost-model=unlimited it picks a vector type that's so big that the large
> > > > sequence generated is working on mostly inactive lanes:
> > > >
> > > >         ...
> > > >         and     p3.b, p3/z, p4.b, p4.b
> > > >         whilelo p0.s, wzr, w7
> > > >         ld1w    z23.s, p3/z, [x3, #3, mul vl]
> > > >         ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
> > > >         add     x0, x5, x0
> > > >         punpklo p6.h, p6.b
> > > >         ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
> > > >         and     p6.b, p6/z, p0.b, p0.b
> > > >         punpklo p4.h, p7.b
> > > >         ld1w    z24.s, p6/z, [x3, #2, mul vl]
> > > >         and     p4.b, p4/z, p2.b, p2.b
> > > >         uqdecw  w6
> > > >         ld1w    z26.s, p4/z, [x3]
> > > >         whilelo p1.s, wzr, w6
> > > >         mul     z27.s, p5/m, z27.s, z23.s
> > > >         ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
> > > >         punpkhi p7.h, p7.b
> > > >         mul     z24.s, p5/m, z24.s, z28.s
> > > >         and     p7.b, p7/z, p1.b, p1.b
> > > >         mul     z26.s, p5/m, z26.s, z30.s
> > > >         ld1w    z25.s, p7/z, [x3, #1, mul vl]
> > > >         st1w    z27.s, p3, [x2, #3, mul vl]
> > > >         mul     z25.s, p5/m, z25.s, z29.s
> > > >         st1w    z24.s, p6, [x2, #2, mul vl]
> > > >         st1w    z25.s, p7, [x2, #1, mul vl]
> > > >         st1w    z26.s, p4, [x2]
> > > >         ...
> > > >
> > > > With -fvect-cost-model=max you get more reasonable code:
> > > >
> > > > foo:
> > > >         cmp     w4, 1
> > > >         ble     .L1
> > > >         ptrue   p7.s, vl3
> > > >         index   z0.s, #0, w4
> > > >         ld1b    z29.s, p7/z, [x0]
> > > >         ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
> > > >         ptrue   p6.b, all
> > > >         cmpne   p7.b, p7/z, z29.b, #0
> > > >         ld1w    z31.s, p7/z, [x3]
> > > >         mul     z31.s, p6/m, z31.s, z30.s
> > > >         st1w    z31.s, p7, [x2]
> > > > .L1:
> > > >         ret
> > > >
> > > > This model has been useful internally for performance exploration and cost-model
> > > > validation. It allows us to force realistic vectorization overriding the cost
> > > > model to be able to tell whether it's correct wrt to profitability.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Hmm. I don't like another cost model. Instead how about changing
> > > 'unlimited' to still iterate through vector sizes? Cost modeling
> > > is really about vector vs. scalar, not vector vs. vector which is
> > > completely under target control. Targets should provide a way
> > > to limit iteration, like aarch64 has with the aarch64-autovec-preference
> > > --param or x86 has with -mprefer-vector-width.
> > >
> > > Of course changing 'unlimited' might result in somewhat of a testsuite
> > > churn, but still the fix there would be to inject a proper -mXYZ
> > > or --param to get the old behavior back (or even consider cycling
> > > through the different aarch64-autovec-preference settings for the
> > > testsuite).
> >
> > Note this will completely remove the ability to reject never profitable
> > vectorizations, so I'm not sure that this is what you'd want in practice.
> > You want to fix cost modeling instead.
> >
> > So why does it consider the scalar code to be faster with =dynamic
> > and why do you think that's not possible to fix? Don't we have
> > per-loop #pragma control to force vectorization here (but maybe that
> > has the 'unlimited' cost modeling issue)?
> >
>
> The addition wasn't for the GCC testsuite usage specifically. This is about
> testing real world code wrt to our cost models. In these instances it's not
> feasible to sprinkle pragmas over every loop in every program.

Sure, but still cost modeling should be fixed then - with using unlimited
(or max) you'd still have to sprinkle novector on the loops that will be
slower otherwise.

> We also use this during uarch design validation, as e.g. it gives someone
> working on a CPU the ability to generate vector code for design purposes
> regardless of what the compiler thinks is profitable on current designs.

For the latter I believe the target should provide ways to force a specific
mode with =unlimited then, otherwise you can't reliably get all variants
anyway but would depend on costing to pick the correct one out of a set
of enabled modes.

Richard.

> Thanks,
> Tamar
>
> > Richard.
> >
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * common.opt (vect-cost-model, simd-cost-model): Add max cost model.
> > > >         * doc/invoke.texi: Document it.
> > > >         * flag-types.h (enum vect_cost_model): Add VECT_COST_MODEL_MAX.
> > > >         * tree-vect-data-refs.cc (vect_peeling_hash_insert,
> > > >         vect_peeling_hash_choose_best_peeling,
> > > >         vect_enhance_data_refs_alignment): Use it.
> > > >         * tree-vect-loop.cc (vect_analyze_loop_costing,
> > > >         vect_estimate_min_profitable_iters): Likewise.
> > > >
> > > > ---
> > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > index 88d987e6ab14d9f8df7aa686efffc43418dbb42d..bd5e2e951f9388b12206d9addc736e336cd0e4ee 100644
> > > > --- a/gcc/common.opt
> > > > +++ b/gcc/common.opt
> > > > @@ -3442,11 +3442,11 @@ Enable basic block vectorization (SLP) on trees.
> > > >
> > > >  fvect-cost-model=
> > > >  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
> > > > --fvect-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the cost model for vectorization.
> > > > +-fvect-cost-model=[unlimited|max|dynamic|cheap|very-cheap] Specifies the cost model for vectorization.
> > > >
> > > >  fsimd-cost-model=
> > > >  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
> > > > --fsimd-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the vectorization cost model for code marked with a simd directive.
> > > > +-fsimd-cost-model=[unlimited|max|dynamic|cheap|very-cheap] Specifies the vectorization cost model for code marked with a simd directive.
> > > >
> > > >  Enum
> > > >  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
> > > > @@ -3454,6 +3454,9 @@ Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer
> > > >  EnumValue
> > > >  Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> > > >
> > > > +EnumValue
> > > > +Enum(vect_cost_model) String(max) Value(VECT_COST_MODEL_MAX)
> > > > +
> > > >  EnumValue
> > > >  Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> > > >
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index 14a78fd236f64185fc129f18b52b20692d49305c..e7b242c9134ff17022c92f81c8b24762cfd59c6c 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -14449,9 +14449,11 @@ With the @samp{unlimited} model the vectorized code-path is assumed
> > > >  to be profitable while with the @samp{dynamic} model a runtime check
> > > >  guards the vectorized code-path to enable it only for iteration
> > > >  counts that will likely execute faster than when executing the original
> > > > -scalar loop. The @samp{cheap} model disables vectorization of
> > > > -loops where doing so would be cost prohibitive for example due to
> > > > -required runtime checks for data dependence or alignment but otherwise
> > > > +scalar loop. The @samp{max} model similarly to the @samp{unlimited} model
> > > > +assumes all vector code is profitable over scalar within loops but does not
> > > > +disable the vector to vector costing. The @samp{cheap} model disables
> > > > +vectorization of loops where doing so would be cost prohibitive for example due
> > > > +to required runtime checks for data dependence or alignment but otherwise
> > > >  is equal to the @samp{dynamic} model. The @samp{very-cheap} model disables
> > > >  vectorization of loops when any runtime check for data dependence or alignment
> > > >  is required, it also disables vectorization of epilogue loops but otherwise is
> > > > diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> > > > index db573768c23d9f6809ae115e71370960314f16ce..1c941c295a2e608eae58c3e3fb0eba1284f731ca 100644
> > > > --- a/gcc/flag-types.h
> > > > +++ b/gcc/flag-types.h
> > > > @@ -277,9 +277,10 @@ enum scalar_storage_order_kind {
> > > >  /* Vectorizer cost-model. Except for DEFAULT, the values are ordered from
> > > >     the most conservative to the least conservative. */
> > > >  enum vect_cost_model {
> > > > -  VECT_COST_MODEL_VERY_CHEAP = -3,
> > > > -  VECT_COST_MODEL_CHEAP = -2,
> > > > -  VECT_COST_MODEL_DYNAMIC = -1,
> > > > +  VECT_COST_MODEL_VERY_CHEAP = -4,
> > > > +  VECT_COST_MODEL_CHEAP = -3,
> > > > +  VECT_COST_MODEL_DYNAMIC = -2,
> > > > +  VECT_COST_MODEL_MAX = -1,
> > > >    VECT_COST_MODEL_UNLIMITED = 0,
> > > >    VECT_COST_MODEL_DEFAULT = 1
> > > >  };
> > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > > index c9395e33fcdfc7deedd979c764daae93b15abace..5c56956c2edcb76210c36b60526f031011c8e0c7 100644
> > > > --- a/gcc/tree-vect-data-refs.cc
> > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > @@ -1847,7 +1847,9 @@ vect_peeling_hash_insert (hash_table<peel_info_hasher> *peeling_htab,
> > > >    /* If this DR is not supported with unknown misalignment then bias
> > > >       this slot when the cost model is disabled. */
> > > >    if (!supportable_if_not_aligned
> > > > -      && unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> > > > +      && (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> > > > +          || loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> > > > +               == VECT_COST_MODEL_MAX))
> > > >      slot->count += VECT_MAX_COST;
> > > >  }
> > > >
> > > > @@ -2002,7 +2004,8 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
> > > >    res.peel_info.dr_info = NULL;
> > > >    res.vinfo = loop_vinfo;
> > > >
> > > > -  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> > > > +  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> > > > +      && loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo)) != VECT_COST_MODEL_MAX)
> > > >      {
> > > >        res.inside_cost = INT_MAX;
> > > >        res.outside_cost = INT_MAX;
> > > > @@ -2348,7 +2351,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> > > >               We do this automatically for cost model, since we calculate
> > > >               cost for every peeling option. */
> > > >            poly_uint64 nscalars = npeel_tmp;
> > > > -          if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> > > > +          if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> > > > +              || loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo)) == VECT_COST_MODEL_MAX)
> > > >              {
> > > >                poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > >                unsigned group_size = 1;
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index 958b829fa8d1ad267fbde3be915719f3a51e6a38..5f3adc257f6581850f901c7747771f5931df942a 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -2407,7 +2407,8 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
> > > >                                        &min_profitable_estimate,
> > > >                                        suggested_unroll_factor);
> > > >
> > > > -  if (min_profitable_iters < 0)
> > > > +  if (min_profitable_iters < 0
> > > > +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX)
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > @@ -2430,7 +2431,8 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
> > > >    LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> > > >
> > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > -      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
> > > > +      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th
> > > > +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX)
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > @@ -2490,6 +2492,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
> > > >        estimated_niter = likely_max_stmt_executions_int (loop);
> > > >      }
> > > >    if (estimated_niter != -1
> > > > +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX
> > > >        && ((unsigned HOST_WIDE_INT) estimated_niter
> > > >            < MAX (th, (unsigned) min_profitable_estimate)))
> > > >      {
> > > > @@ -4638,7 +4641,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> > > >    vector_costs *target_cost_data = loop_vinfo->vector_costs;
> > > >
> > > >    /* Cost model disabled. */
> > > > -  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> > > > +  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> > > >
> > > >
> > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
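
For readers following along: the thread mentions per-loop #pragma control
and sprinkling novector on loops without spelling either out. The sketch
below is not part of the patch; it only illustrates what those two per-loop
knobs might look like in C. The function names scale_if and copy_scalar and
the run-time trip count n are made up for illustration. It assumes that
`#pragma omp simd` is only honoured with -fopenmp or -fopenmp-simd (where
-fsimd-cost-model applies, which the common.opt hunk above initialises to
unlimited) and that `#pragma GCC novector` needs GCC 14 or later. Without
those, the pragmas are ignored and the code still compiles as plain scalar C.

```c
/* Hypothetical variant of the foo loop from the thread, with a run-time
   trip count `n` (an assumption made here for illustration).  */
void
scale_if (const char *restrict a, const int *restrict b, int *restrict c,
          const int *restrict d, int stride, int n)
{
  /* Request vectorization of this one loop.  Only honoured with
     -fopenmp or -fopenmp-simd; ignored otherwise.  */
#pragma omp simd
  for (int i = 0; i < n; i++)
    {
      int res = c[i];
      int t = b[i * stride];
      if (a[i] != 0)
        res = t * d[i];
      c[i] = res;
    }
}

/* Hypothetical loop that should stay scalar even when the cost model
   would otherwise assume vectorization is profitable.  */
void
copy_scalar (int *restrict dst, const int *restrict src, int n)
{
  /* Ask the vectorizer to skip this loop (GCC 14 and later).  */
#pragma GCC novector
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Compiling with something like `gcc -O3 -fopenmp-simd -fopt-info-vec` should
report which loops were vectorized, which makes it easy to see the effect of
each pragma. This also shows the scaling problem raised in the thread: every
loop has to be annotated by hand.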