On 17 Jul 2024, at 09:29, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch series is part of an ongoing effort to replace the SVE intrinsic 
>> svdiv
>> by lower-strength instructions for division by constant. To that end, we
>> implemented svdiv_impl::fold to perform the following transformation in 
>> gimple:
>> - Division where all divisors are the same power of 2 --> svasrd
> 
> Sounds good.
> 
>> - Division where all divisors are powers of 2 --> svasr
> 
> I don't think this is correct for negative dividends (which is why
> ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
> but -1 as computed by svasr (round towards -Inf).
You’re right, I dropped the second patch.
> 
>> We chose svdiv_impl::fold as location for the implementation to have the
>> transform applied as early as possible, such that other (existing or future)
>> gimple optimizations can be applied on the result.
>> Currently, the transform to is only applied for signed integers, because
>> there do not exist an unsigned svasrd and svasr. The transform has not (yet)
>> been implemented for svdivr.
> 
> FWIW, using svlsr for unsigned divisions should be OK.
Thanks for pointing that out, I adapted the patch to transform unsigned 
division by a power of 2 to svlsr.
> 
>> Please also comment/advise on the following:
>> In a next patch, we would like to replace SVE division by constants (other
>> than powers of 2) by multiply and shifts, similar as for scalar division.
>> This is planned to be implemented in the gimple_folder as well. Thoughts?
> 
> I'm a bit uneasy about going that far.  I suppose it comes down to a
> question about what intrinsics are for.  Are they for describing an
> algorithm, or for hand-optimising a specific implementation of the
> algorithm?  IMO it's the latter.
> 
> If people want to write out a calculation in natural arithmetic, it
> would be better to write the algorithm in scalar code and let the
> vectoriser handle it.  That gives the opportunity for many more
> optimisations than just this one.
> 
> Intrinsics are about giving programmers direct, architecture-level
> control over how something is implemented.  I've seen Arm's library
> teams go to great lengths to work out which out of a choice of
> instruction sequences is the best one, even though the sequences in
> question would look functionally equivalent to a smart-enough compiler.
> 
> So part of the work of using intrinsics is to figure out what the best
> sequence is.  And IMO, part of the contract is that the compiler
> shouldn't interfere with the programmer's choices too much.  If the
> compiler makes a change, it must very confident that it is a win for
> the function as a whole.
> 
> Replacing one division with one shift is fine, as an aid to the programmer.
> It removes the need for (say) templated functions to check for that case
> manually.  Constant folding is fine too, for similar reasons.  In these
> cases, there's not really a cost/benefit choice to be made between
> different expansions.  One choice is objectively better in all
> realistic situations.
> 
> But when it comes to general constants, there are many different choices
> that could be made when deciding which constants should be open-coded
> and which shouldn't.  IMO we should leave the choice to the programmer
> in those cases.  If the compiler gets it wrong, there will be no way
> for the programmer to force the compiler's hand ("no, when I say svdiv,
> I really do mean svdiv”).
Makes sense, then we will not pursue this further and only leave the current 
optimization.
Best, Jennifer
> 
> Thanks,
> Richard

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to