On Wed, 3 Sep 2025, Richard Sandiford wrote:

> Tamar Christina <[email protected]> writes:
> >> -----Original Message-----
> >> From: Richard Biener <[email protected]>
> >> Sent: Tuesday, September 2, 2025 1:44 PM
> >> To: Tamar Christina <[email protected]>
> >> Cc: [email protected]; nd <[email protected]>
> >> Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the
> >> cost model has
> >> overriden it
> >>
> >> On Tue, 2 Sep 2025, Tamar Christina wrote:
> >>
> >> > > What was it that made you propose this change?
> >> >
> >> > When we have a loop of say int and a pragma unroll 4
> >> >
> >> > If the vectorizer picks V4SI as the mode, the requested unroll ended up
> >> > exactly matching the VF. As such the requested unroll is 1 and we don't
> >> > clear the pragma.
> >> >
> >> > So it did honor the requested unroll factor. However since we didn't set
> >> > the unroll amount back and left it at 4 the rtl unroller won't use the
> >> > rtl cost model at all and just unroll the vector loop 4 times.
> >>
> >> Ah, OK.
> >>
> >> > This change isn't to bypass the rtl cost model, it's to allow it to be
> >> > used rather than overriding it after vectorization.
> >>
> >> OK, fine. But still, consider
> >>
> >> #pragma unroll 4
> >> for (int i = 0; i < 64; ++i)
> >>   {
> >>     a[4*i+0] = i;
> >>     a[4*i+1] = i;
> >>     a[4*i+2] = i;
> >>     a[4*i+3] = i;
> >>   }
> >>
> >> so VF == 1, suggested_unroll_factor == 4. If we don't up VF to 4
> >> should we still claim we did any unrolling? If the target suggested
> >> an unroll factor of two, should we instead change ->unroll to 2?
> >> Should the user unroll factor override the vector target one?
> >>
> >
> > I think the target unroll factor should always win out, primarily because
> > of throughput based costing. The loop above on a 4 VX system should
> > by the vectorizer already be using VF = 4, suggested_unroll_factor == 4.
> >
> > We also don't ever force unrolling for predicated SVE because for
> > predicated SVE we have to balance predicate throughput limitations
> > of any given CPU. Having the user unroll factor be able to override
> > the cost model one will almost certainly lead to worse performance
> > in this case.
>
> FWIW, cause and effect are kind-of the other way around: we request an
> unroll factor for SVE in the normal way, but doing so disables predication,
> thanks to:
>
>   /* For partial-vector-usage=1, try to push the handling of partial
>      vectors to the epilogue, with the main loop continuing to operate
>      on full vectors.
>
>      If we are unrolling we also do not want to use partial vectors.  This
>      is to avoid the overhead of generating multiple masks and also to
>      avoid having to execute entire iterations of FALSE masked instructions
>      when dealing with one or less full iterations.
>
>      ??? We could then end up failing to use partial vectors if we
>      decide to peel iterations into a prologue, and if the main loop
>      then ends up processing fewer than VF iterations.  */
>   if ((param_vect_partial_vector_usage == 1
>        || loop_vinfo->suggested_unroll_factor > 1)
>       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>       && !vect_known_niters_smaller_than_vf (loop_vinfo))
>     LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>   else
>     LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;

Context doesn't show, but I guess this honors
LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P.

Also, I wonder: when we don't use partial vectors, does that mean we
are using fixed-length vectors? Unless -msve-vector-bits is specified,
does this mean using NEON width? At least I don't remember seeing
non-len-based code that queries the actual vector length at runtime.

> In other words, the choice of unroll factor is an input to the
> predication decision, rather than the predication decision being an
> input to the choice of unroll factor.

In principle this makes sense.

Richard.
