> On 6 Aug 2024, at 4:14 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>> On 5 Aug 2024, at 18:00, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>
>>> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>>
>>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>>> 1) the predicate is svptrue and
>>>>>> 2) dividend and divisor are equal.
>>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>>> integers. Corresponding test cases were added to the existing test
>>>>>> suites.
>>>>>>
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no
>>>>>> regression.
>>>>>> OK for mainline?
>>>>>>
>>>>>> Please also advise whether it makes sense to implement the same
>>>>>> optimization for float types and if so, under which conditions?
>>>>>
>>>>> I think we should instead use const_binop to try to fold the division
>>>>> whenever the predicate is all-true, or if the function uses _x
>>>>> predication.
>>>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>>>>>
>>>>
>>>> From what I can see const_binop only works on constant arguments.
>>>
>>> Yeah, it only produces a result for constant arguments. I see now
>>> that that isn't the case that the patch is interested in, sorry.
>>>
>>>> Is fold_binary a better interface to use? I think it’d hook into the
>>>> match.pd machinery for divisions at some point.
>>>
>>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>>> (I realise we'd be using it only to test whether the result is constant,
>>> but even so.)
>>>
>>> Have you (plural) come across a case where svdiv is used with equal
>>> non-constant arguments? If it's just being done on first principles
>>> then how about starting with const_binop instead? If possible, it'd be
>>> good to structure it so that we can reuse the code for svadd, svmul,
>>> svsub, etc.
>>
>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>> We are interested in having more powerful folding of SVE intrinsics
>> generally and we’d like some advice on how best to approach this.
>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes
>> where possible when they are _x-predicated or have a ptrue predicate.
>> Hopefully that would allow us to get all the match.pd and fold-const.cc
>> optimizations “for free”.
>> Would that be a reasonable direction rather than adding custom folding code
>> to individual intrinsics such as svdiv?
>> We’d need to ensure that the midend knows how to expand such GIMPLE codes
>> with VLA types and that the required folding rules exist in match.pd (though
>> maybe they work already for VLA types?)
>
> Expansion shouldn't be a problem, since we already rely on that for
> autovectorisation.
>
> But I think this comes back to what we discussed earlier, in the context
> of whether we should replace divisions by constants with multi-instruction
> alternatives. My comment there was:
>
>   If people want to write out a calculation in natural arithmetic, it
>   would be better to write the algorithm in scalar code and let the
>   vectoriser handle it. That gives the opportunity for many more
>   optimisations than just this one.
>

It’s been a while, and apologies if I’m coming in a bit late on this and thinking has possibly moved on.

I’ve always viewed ACLE as an extension to the language and thus fair game for compilers to optimise. For folks who really, really need that instruction there’s also inline asm :)

The approach for implementing the ACLE intrinsics for both AArch32 and AArch64 used to be:

1. Express the intrinsics with GNU C / C++ (see implementations in arm_neon.h) if feasible and the semantics match up.
2. Fall back to gimple folding / representation if the semantics match up.
3. Fall back to RTL unspecs if no representation is feasible.

In the case of SVE VLA intrinsics there is no feasible GNU C representation, but if a gimple representation is possible, shouldn’t we use that? With Advanced SIMD the user sees the behaviour as per 1 above (see the implementation of the basic arithmetic operations for Neon in GNU C).

Is there any reason that SVE needs to be different in its treatment in the backend? I could be missing something here...

regards
Ramana

> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented. I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
>
> So part of the work of using intrinsics is to figure out what the best
> sequence is. And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much. If the
> compiler makes a change, it must be very confident that it is a win for
> the function as a whole.
>
> Replacing one division with one shift is fine, as an aid to the programmer.
> It removes the need for (say) templated functions to check for that case
> manually. Constant folding is fine too, for similar reasons. In these
> cases, there's not really a cost/benefit choice to be made between
> different expansions. One choice is objectively better in all
> realistic situations.
>
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't. IMO we should leave the choice to the programmer
> in those cases. If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv").
>
> If we just replace svmul and svdiv with MULT_EXPR and *DIV_EXPR,
> we'd be discarding the user's instruction choices and imposing our own.
>
> FWIW, Tejas is looking at adding support for C/C++ operators on VLA
> vectors (__ARM_FEATURE_SVE_VECTOR_OPERATORS). That would then give
> the user the choice of writing the arithmetic "naturally" or using
> intrinsics. The former is better for users who want the compiler to choose
> the instructions, while the latter is better for users who want to control
> the implementation themselves.
>
> Thanks,
> Richard