On Mon, Mar 31, 2025 at 11:34 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> >> Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if
> >> AVL < (2 * VLMAX)" rule...
> >
> > +1, spec has some description about this but I am not sure if I really get
> > the point.
> >
> > From Spec:
> >
> > "For example, this permits an implementation to set vl = ceil(AVL / 2)
> > for VLMAX < AVL < 2*VLMAX in order to evenly distribute work over the
> > last two iterations of a stripmine loop. Requirement 2 ensures that the
> > first stripmine iteration of reduction loops uses the largest vector
> > length of all iterations, even in the case of AVL < 2*VLMAX. This allows
> > software to avoid needing to explicitly calculate a running maximum of
> > vector lengths observed during a stripmined loop. Requirement 2 also
> > allows an implementation to set vl to VLMAX for VLMAX < AVL < 2*VLMAX"
>
> Yeah, that's very unfortunate.
>
> The rule is something like
>
> if AVL >= 2 * VLMAX
>   vl = vsetvl = min (AVL, VLMAX)
>
> if VLMAX < AVL < 2 * VLMAX
>   vl = vsetvl = "whatever" ;)

Note it's not quite "whatever" -- there is a constraint that vl be
monotonically nonincreasing, which in some cases is the only important
property. No denying this is an annoyance, though.

> if AVL <= VLMAX
>   vl = vsetvl = min (AVL, VLMAX)
>
> The idea of load balancing is alright I guess but it really complicates
> matters in the compiler.
>
> FWIW my plan for GCC 16 is to define a SELECT_VL_SANE (or any better name I
> can come up with) that doesn't have this behavior and always only performs a
> minimum instead. This will allow us to perform scalar evolution on vsetvl
> rather than giving up as we do right now. Microarchitectures where vsetvl
> always behaves like a minimum would then enable the corresponding
> expander/insn and others would fall back to the current behavior.
>
> --
> Regards
> Robin
>
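
To make the monotonicity point concrete, here is a rough Python model of a stripmine loop under two vl policies. It is a sketch only: the names vl_min, vl_balanced, stripmine and is_nonincreasing are made up for illustration, and vl_balanced is just one choice the quoted spec text permits for VLMAX < AVL < 2*VLMAX, not what any particular implementation does.

```python
def vl_min(avl: int, vlmax: int) -> int:
    """Hypothetical policy: vl = min(AVL, VLMAX) for every AVL."""
    return min(avl, vlmax)


def vl_balanced(avl: int, vlmax: int) -> int:
    """Hypothetical policy: vl = ceil(AVL / 2) when VLMAX < AVL < 2 * VLMAX,
    otherwise min(AVL, VLMAX)."""
    if vlmax < avl < 2 * vlmax:
        return (avl + 1) // 2  # ceil(AVL / 2) for non-negative integers
    return min(avl, vlmax)


def stripmine(avl: int, vlmax: int, policy) -> list[int]:
    """Return the vl chosen on each iteration of a stripmine loop."""
    vls = []
    while avl > 0:
        vl = policy(avl, vlmax)
        vls.append(vl)
        avl -= vl  # elements still to be processed
    return vls


def is_nonincreasing(seq: list[int]) -> bool:
    """True if no element is larger than the one before it."""
    return all(a >= b for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    print(stripmine(10, 4, vl_min))       # [4, 4, 2]
    print(stripmine(10, 4, vl_balanced))  # [4, 3, 3]
    # Both policies keep vl nonincreasing, so the first iteration
    # always sees the largest vl.
    for n in range(40):
        assert is_nonincreasing(stripmine(n, 4, vl_min))
        assert is_nonincreasing(stripmine(n, 4, vl_balanced))
    # Only the minimum policy guarantees vl == VLMAX on every
    # iteration except the last one.
    for n in range(1, 40):
        assert all(v == 4 for v in stripmine(n, 4, vl_min)[:-1])
```

Under the minimum policy, every iteration but the last uses exactly VLMAX, so the trip count is ceil(AVL / VLMAX). With the balanced choice the second-to-last iteration can differ (3 rather than 4 in the example above). That is presumably the kind of irregularity that gets in the way of analyzing the loop.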