> On 6 Aug 2024, at 12:44, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>> On 5 Aug 2024, at 18:00, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>
>>> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>>
>>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>>> 1) the predicate is svptrue and
>>>>>> 2) dividend and divisor are equal.
>>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>>> integers. Corresponding test cases were added to the existing test
>>>>>> suites.
>>>>>>
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>>> regression.
>>>>>> OK for mainline?
>>>>>>
>>>>>> Please also advise whether it makes sense to implement the same
>>>>>> optimization for float types and if so, under which conditions?
>>>>>
>>>>> I think we should instead use const_binop to try to fold the division
>>>>> whenever the predicate is all-true, or if the function uses _x
>>>>> predication. (As a follow-on, we could handle _z and _m too, using
>>>>> VEC_COND_EXPR.)
>>>>>
>>>>
>>>> From what I can see const_binop only works on constant arguments.
>>>
>>> Yeah, it only produces a result for constant arguments. I see now
>>> that that isn't the case that the patch is interested in, sorry.
>>>
>>>> Is fold_binary a better interface to use? I think it’d hook into the
>>>> match.pd machinery for divisions at some point.
>>>
>>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>>> (I realise we'd be using it only to test whether the result is constant,
>>> but even so.)
>>>
>>> Have you (plural) come across a case where svdiv is used with equal
>>> non-constant arguments? If it's just being done on first principles
>>> then how about starting with const_binop instead? If possible, it'd be
>>> good to structure it so that we can reuse the code for svadd, svmul,
>>> svsub, etc.
>>
>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>> We are interested in having more powerful folding of SVE intrinsics
>> generally and we’d like some advice on how best to approach this.
>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE
>> codes where possible when they are _x-predicated or have a ptrue predicate.
>> Hopefully that would allow us to get all the match.pd and fold-const.cc
>> optimizations “for free”.
>> Would that be a reasonable direction rather than adding custom folding
>> code to individual intrinsics such as svdiv?
>> We’d need to ensure that the midend knows how to expand such GIMPLE codes
>> with VLA types and that the required folding rules exist in match.pd
>> (though maybe they work already for VLA types?)
>
> Expansion shouldn't be a problem, since we already rely on that for
> autovectorisation.
>
> But I think this comes back to what we discussed earlier, in the context
> of whether we should replace divisions by constants with multi-instruction
> alternatives. My comment there was:
>
> I'm a bit uneasy about going that far. I suppose it comes down to a
> question about what intrinsics are for.
> Are they for describing an algorithm, or for hand-optimising a specific
> implementation of the algorithm? IMO it's the latter.
>
> If people want to write out a calculation in natural arithmetic, it
> would be better to write the algorithm in scalar code and let the
> vectoriser handle it. That gives the opportunity for many more
> optimisations than just this one.
>
> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented. I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
>
> So part of the work of using intrinsics is to figure out what the best
> sequence is. And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much. If the
> compiler makes a change, it must be very confident that it is a win for
> the function as a whole.
>
> Replacing one division with one shift is fine, as an aid to the
> programmer. It removes the need for (say) templated functions to check
> for that case manually. Constant folding is fine too, for similar
> reasons. In these cases, there's not really a cost/benefit choice to be
> made between different expansions. One choice is objectively better in
> all realistic situations.
>
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't. IMO we should leave the choice to the programmer
> in those cases. If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv").
>
> If we just replace svmul and svdiv with MULT_EXPR and *DIV_EXPR,
> we'd be discarding the user's instruction choices and imposing our own.
>
> FWIW, Tejas is looking at adding support for C/C++ operators on VLA
> vectors (__ARM_FEATURE_SVE_VECTOR_OPERATORS). That would then give the
> user the choice of writing the arithmetic "naturally" or using
> intrinsics. The former is better for users who want the compiler to
> choose the instructions, while the latter is better for users who want
> to control the implementation themselves.
>
Thanks for the detailed explanation (again). So it seems that for
intrinsics we’re comfortable going as far as folding carefully-chosen
individual combinations, such as when all operands are known constants
or degenerate cases like svdiv (x, x), but not treating them as generic
GIMPLE operations in all cases?

Kyrill

> Thanks,
> Richard
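
For concreteness, here is a minimal sketch of the source-level pattern the
patch under discussion targets: an svdiv whose governing predicate is an
all-true svptrue and whose dividend and divisor are the same value. The
function name and element type are illustrative only and are not taken from
the patch or its tests.

#include <arm_sve.h>

/* Illustrative only: under the proposed fold, the svdiv below would be
   replaced at gimple level by a constant vector of 1's, because the
   predicate is an all-true svptrue and both operands are the same value.  */
svint32_t
div_self (svint32_t x)
{
  return svdiv_s32_x (svptrue_b32 (), x, x);
}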
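
The contrast Richard draws between hand-optimised intrinsics and "natural"
arithmetic can be sketched as below. The operator form assumes the
__ARM_FEATURE_SVE_VECTOR_OPERATORS support mentioned in the thread; since
that work was still in progress at the time, this shows the intended usage
rather than something guaranteed to compile with any given toolchain.

#include <arm_sve.h>

/* Hand-optimised form: the programmer names the exact operation via the
   intrinsic and expects the compiler to largely respect that choice.  */
svint32_t
mul_intrinsic (svint32_t x, svint32_t y)
{
  return svmul_s32_x (svptrue_b32 (), x, y);
}

/* "Natural" form, assuming C operators on VLA vectors
   (__ARM_FEATURE_SVE_VECTOR_OPERATORS) are available: the compiler is
   free to choose the instruction sequence.  */
svint32_t
mul_operator (svint32_t x, svint32_t y)
{
  return x * y;
}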