> On 6 Aug 2024, at 4:14 PM, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>> On 5 Aug 2024, at 18:00, Richard Sandiford <richard.sandif...@arm.com> 
>>> wrote:
>>> 
>>> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford <richard.sandif...@arm.com> 
>>>>> wrote:
>>>>> 
>>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>>>>> 1) the predicate is svptrue and
>>>>>> 2) dividend and divisor are equal.
>>>>>> This is implemented in the gimple_folder for signed and unsigned
>>>>>> integers. Corresponding test cases were added to the existing test
>>>>>> suites.
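
For concreteness, the fold being described applies to a call like the one
below (an illustrative sketch using arm_sve.h; the function name fold_example
is made up):

  #include <arm_sve.h>

  /* With an all-true predicate and equal dividend and divisor, the patch
     would fold this call into an all-ones vector, i.e. what
     svdup_s32 (1) returns.  */
  svint32_t
  fold_example (svint32_t x)
  {
    return svdiv_s32_x (svptrue_b32 (), x, x);
  }
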
>>>>>> 
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>>>> regression.
>>>>>> OK for mainline?
>>>>>> 
>>>>>> Please also advise whether it makes sense to implement the same 
>>>>>> optimization
>>>>>> for float types and if so, under which conditions?
>>>>> 
>>>>> I think we should instead use const_binop to try to fold the division
>>>>> whenever the predicate is all-true, or if the function uses _x 
>>>>> predication.
>>>>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
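
For readers less familiar with the predication suffixes being discussed: with
_x the inactive elements have unspecified values, whereas _z zeroes them and
_m takes them from the first operand, which is why the _z and _m forms would
need a VEC_COND_EXPR-style select wrapped around any folded result.  A rough
sketch (the function names are just for illustration):

  #include <arm_sve.h>

  /* _x: inactive elements are don't-care, so a fold only has to get
     the active elements right.  */
  svint32_t
  div_x (svbool_t pg, svint32_t a, svint32_t b)
  {
    return svdiv_s32_x (pg, a, b);
  }

  /* _z: inactive elements are zero, so a fold would need a select
     between the folded value and zero.  */
  svint32_t
  div_z (svbool_t pg, svint32_t a, svint32_t b)
  {
    return svdiv_s32_z (pg, a, b);
  }
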
>>>>> 
>>>> 
>>>> From what I can see const_binop only works on constant arguments.
>>> 
>>> Yeah, it only produces a result for constant arguments.  I see now
>>> that that isn't the case that the patch is interested in, sorry.
>>> 
>> Is fold_binary a better interface to use? I think it’d hook into the 
>>>> match.pd machinery for divisions at some point.
>>> 
>>> We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
>>> (I realise we'd be using it only to test whether the result is constant,
>>> but even so.)
>>> 
>>> Have you (plural) come across a case where svdiv is used with equal
>>> non-constant arguments?  If it's just being done on first principles
>>> then how about starting with const_binop instead?  If possible, it'd be
>>> good to structure it so that we can reuse the code for svadd, svmul,
>>> svsub, etc.
>> 
>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>> We are interested in having more powerful folding of SVE intrinsics 
>> generally and we’d like some advice on how best to approach this.
>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE codes 
>> where possible when they are _x-predicated or have a ptrue predicate. 
>> Hopefully that would allow us to get all the match.pd and fold-const.cc
>> optimizations “for free”.
>> Would that be a reasonable direction rather than adding custom folding code 
>> to individual intrinsics such as svdiv?
>> We’d need to ensure that the midend knows how to expand such GIMPLE codes 
>> with VLA types and that the required folding rules exist in match.pd (though 
>> maybe they work already for VLA types?)
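
To make the suggestion concrete: a call whose predicate covers every lane
behaves lane-wise like the corresponding C operator on the vector, so lowering
it to the standard tree code would let match.pd see through it.  A minimal
sketch of the kind of simplification that would then come for free (the
function name is made up):

  #include <arm_sve.h>

  /* If svdiv_x with an all-true predicate were represented as a vector
     TRUNC_DIV_EXPR in gimple, an existing rule such as x / 1 -> x would
     apply without any intrinsic-specific folding code.  */
  svint32_t
  div_by_one (svint32_t a)
  {
    return svdiv_n_s32_x (svptrue_b32 (), a, 1);
  }
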
> 
> Expansion shouldn't be a problem, since we already rely on that for
> autovectorisation.
> 
> But I think this comes back to what we discussed earlier, in the context
> of whether we should replace divisions by constants with multi-instruction
> alternatives.  My comment there was:
> 
>  If people want to write out a calculation in natural arithmetic, it
>  would be better to write the algorithm in scalar code and let the
>  vectoriser handle it.  That gives the opportunity for many more
>  optimisations than just this one.
> 

It’s been a while, and apologies if I’m coming in a bit late to this; possibly
the thinking has moved on. I’ve always viewed ACLE as an extension to the
language and thus fair game for compilers to optimise. For folks who really,
really need a particular instruction there’s also inline asm :)


The approach for implementing the ACLE intrinsics for both AArch32 and AArch64
used to be:

1. Express the intrinsics in GNU C / C++ (see the implementations in
   arm_neon.h) where that is feasible and the semantics match up; a sketch of
   this style follows the list.
2. Fall back to a gimple folding / representation where the semantics match up.
3. Fall back to RTL unspecs where no such representation is feasible.
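
As a sketch of style 1, paraphrasing (not quoting) the kind of definition
arm_neon.h uses for basic arithmetic:

  #include <arm_neon.h>

  /* The real header defines the likes of vaddq_s32 in roughly this shape;
     my_vaddq_s32 here is just an illustrative stand-in.  The midend then
     sees an ordinary vector addition it already knows how to optimise.  */
  __extension__ extern __inline int32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
  my_vaddq_s32 (int32x4_t __a, int32x4_t __b)
  {
    return __a + __b;
  }
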

In the case of the SVE VLA intrinsics a GNU C implementation isn’t feasible,
but where a gimple representation is possible, shouldn’t we go for that?

With Advanced SIMD, the behaviour the user sees is as per 1 above (see the
implementation of the basic arithmetic operations for Neon in GNU C). Is there
any reason that SVE needs to be treated differently in the backend?


I could be missing something here...


regards
Ramana

>  Intrinsics are about giving programmers direct, architecture-level
>  control over how something is implemented.  I've seen Arm's library
>  teams go to great lengths to work out which out of a choice of
>  instruction sequences is the best one, even though the sequences in
>  question would look functionally equivalent to a smart-enough compiler.
> 
>  So part of the work of using intrinsics is to figure out what the best
>  sequence is.  And IMO, part of the contract is that the compiler
>  shouldn't interfere with the programmer's choices too much.  If the
>  compiler makes a change, it must be very confident that it is a win for
>  the function as a whole.
> 
>  Replacing one division with one shift is fine, as an aid to the programmer.
>  It removes the need for (say) templated functions to check for that case
>  manually.  Constant folding is fine too, for similar reasons.  In these
>  cases, there's not really a cost/benefit choice to be made between
>  different expansions.  One choice is objectively better in all
>  realistic situations.
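
As an illustration of the uncontroversial case (a sketch; the function name is
made up, and the exact instruction chosen is of course up to the compiler):
for an unsigned divide by a power of two the division and the logical shift
agree on every active element, so rewriting one as the other costs the
programmer nothing.

  #include <arm_sve.h>

  svuint32_t
  div_by_four (svbool_t pg, svuint32_t x)
  {
    /* For unsigned elements this is element-for-element equivalent to
       svlsr_n_u32_x (pg, x, 2), so folding the division to a shift is
       a pure win.  */
    return svdiv_n_u32_x (pg, x, 4);
  }
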
> 
>  But when it comes to general constants, there are many different choices
>  that could be made when deciding which constants should be open-coded
>  and which shouldn't.  IMO we should leave the choice to the programmer
>  in those cases.  If the compiler gets it wrong, there will be no way
>  for the programmer to force the compiler's hand ("no, when I say svdiv,
>  I really do mean svdiv").
> 
> If we just replace svmul and svdiv with MULT_EXPR and *DIV_EXPR,
> we'd be discarding the user's instruction choices and imposing our own.
> 
> FWIW, Tejas is looking at adding support for C/C++ operators on VLA
> vectors (__ARM_FEATURE_SVE_VECTOR_OPERATORS).  That would then give
> the user the choice of writing the arithmetic "naturally" or using
> intrinsics.  The former is better for users who want the compiler to choose
> the instructions, while the latter is better for users who want to control
> the implementation themselves.
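
For reference, the two styles would look something like this (a sketch; the
function names are made up and the operator form depends on the compiler
providing __ARM_FEATURE_SVE_VECTOR_OPERATORS):

  #include <arm_sve.h>

  /* Intrinsic form: the programmer picks the operation explicitly.  */
  svint32_t
  mul_intrinsic (svint32_t a, svint32_t b)
  {
    return svmul_s32_x (svptrue_b32 (), a, b);
  }

  #ifdef __ARM_FEATURE_SVE_VECTOR_OPERATORS
  /* Operator form: the compiler is free to choose the instructions.  */
  svint32_t
  mul_operator (svint32_t a, svint32_t b)
  {
    return a * b;
  }
  #endif
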
> 
> Thanks,
> Richard

