Jennifer Schmitz <jschm...@nvidia.com> writes:
> This patch series is part of an ongoing effort to replace the SVE intrinsic
> svdiv by lower-strength instructions for division by constant. To that end,
> we implemented svdiv_impl::fold to perform the following transformation in
> gimple:
> - Division where all divisors are the same power of 2 --> svasrd

Sounds good.

> - Division where all divisors are powers of 2 --> svasr

I don't think this is correct for negative dividends (which is why
ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
but -1 as computed by svasr (round towards -Inf).
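
To make the rounding difference concrete, here is a minimal scalar sketch
in C.  It only models the per-lane arithmetic and is not the SVE intrinsics
themselves.  The function names are hypothetical, and it assumes that >> on
a negative signed value is an arithmetic shift (true for GCC, but
implementation-defined in ISO C) and that shift is in the range 1..30.

```c
#include <stdint.h>

/* Truncating division by 2^shift: rounds towards zero, which is what
   svdiv computes per lane.  */
static int32_t div_trunc(int32_t x, unsigned shift)
{
    return x / (int32_t)(1 << shift);
}

/* Plain arithmetic shift right, modelling svasr: rounds towards -Inf.
   Assumes >> on a negative signed value is an arithmetic shift.  */
static int32_t shift_floor(int32_t x, unsigned shift)
{
    return x >> shift;
}

/* ASRD-style shift: add (2^shift - 1) to negative dividends before
   shifting, so that the result rounds towards zero like a division.  */
static int32_t shift_round_to_zero(int32_t x, unsigned shift)
{
    int32_t bias = (x < 0) ? (int32_t)((1u << shift) - 1) : 0;
    return (x + bias) >> shift;
}
```

With x = -1 and shift = 2, div_trunc and shift_round_to_zero both give 0,
while shift_floor gives -1.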

> We chose svdiv_impl::fold as location for the implementation to have the
> transform applied as early as possible, such that other (existing or future)
> gimple optimizations can be applied on the result.
> Currently, the transform is only applied for signed integers, because
> there is no unsigned svasrd or svasr. The transform has not (yet)
> been implemented for svdivr.

FWIW, using svlsr for unsigned divisions should be OK.
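
The unsigned case has no rounding problem, because there are no negative
dividends: truncation and floor coincide.  A scalar sketch of the per-lane
arithmetic (hypothetical function names, shift assumed to be in the
range 0..31):

```c
#include <stdint.h>

/* Unsigned division by 2^shift.  */
static uint32_t udiv_pow2(uint32_t x, unsigned shift)
{
    return x / (1u << shift);
}

/* Logical shift right, modelling what svlsr does per lane.  */
static uint32_t lsr(uint32_t x, unsigned shift)
{
    return x >> shift;
}
```

The two functions agree for every input in that range.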

> Please also comment/advise on the following:
> In a next patch, we would like to replace SVE division by constants (other
> than powers of 2) by multiply and shifts, similar as for scalar division.
> This is planned to be implemented in the gimple_folder as well. Thoughts?

I'm a bit uneasy about going that far.  I suppose it comes down to a
question about what intrinsics are for.  Are they for describing an
algorithm, or for hand-optimising a specific implementation of the
algorithm?  IMO it's the latter.

If people want to write out a calculation in natural arithmetic, it
would be better to write the algorithm in scalar code and let the
vectoriser handle it.  That gives the opportunity for many more
optimisations than just this one.
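
As a rough sketch of what I mean by scalar code (the function name and the
divisor 7 are just for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Division by a constant written in natural arithmetic.  The compiler is
   free to choose how to implement the division and, with SVE enabled and
   vectorisation turned on, whether and how to vectorise the loop.  */
void div_by_7(int32_t *restrict out, const int32_t *restrict in, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / 7;
}
```

Here the choice between a real division and a multiply-and-shift sequence
belongs to the compiler, which is the opposite of the intrinsics case.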

Intrinsics are about giving programmers direct, architecture-level
control over how something is implemented.  I've seen Arm's library
teams go to great lengths to work out which out of a choice of
instruction sequences is the best one, even though the sequences in
question would look functionally equivalent to a smart-enough compiler.

So part of the work of using intrinsics is to figure out what the best
sequence is.  And IMO, part of the contract is that the compiler
shouldn't interfere with the programmer's choices too much.  If the
compiler makes a change, it must be very confident that it is a win for
the function as a whole.

Replacing one division with one shift is fine, as an aid to the programmer.
It removes the need for (say) templated functions to check for that case
manually.  Constant folding is fine too, for similar reasons.  In these
cases, there's not really a cost/benefit choice to be made between
different expansions.  One choice is objectively better in all
realistic situations.

But when it comes to general constants, there are many different choices
that could be made when deciding which constants should be open-coded
and which shouldn't.  IMO we should leave the choice to the programmer
in those cases.  If the compiler gets it wrong, there will be no way
for the programmer to force the compiler's hand ("no, when I say svdiv,
I really do mean svdiv").

Thanks,
Richard
