Jennifer Schmitz <jschm...@nvidia.com> writes: > This patch series is part of an ongoing effort to replace the SVE intrinsic > svdiv > by lower-strength instructions for division by constant. To that end, we > implemented svdiv_impl::fold to perform the following transformation in > gimple: > - Division where all divisors are the same power of 2 --> svasrd
Sounds good. > - Division where all divisors are powers of 2 --> svasr I don't think this is correct for negative dividends (which is why ASRD exists). E.g. -1 / 4 is 0 as computed by svdiv (round towards zero), but -1 as computed by svasr (round towards -Inf). > We chose svdiv_impl::fold as location for the implementation to have the > transform applied as early as possible, such that other (existing or future) > gimple optimizations can be applied on the result. > Currently, the transform to is only applied for signed integers, because > there do not exist an unsigned svasrd and svasr. The transform has not (yet) > been implemented for svdivr. FWIW, using svlsr for unsigned divisions should be OK. > Please also comment/advise on the following: > In a next patch, we would like to replace SVE division by constants (other > than powers of 2) by multiply and shifts, similar as for scalar division. > This is planned to be implemented in the gimple_folder as well. Thoughts? I'm a bit uneasy about going that far. I suppose it comes down to a question about what intrinsics are for. Are they for describing an algorithm, or for hand-optimising a specific implementation of the algorithm? IMO it's the latter. If people want to write out a calculation in natural arithmetic, it would be better to write the algorithm in scalar code and let the vectoriser handle it. That gives the opportunity for many more optimisations than just this one. Intrinsics are about giving programmers direct, architecture-level control over how something is implemented. I've seen Arm's library teams go to great lengths to work out which out of a choice of instruction sequences is the best one, even though the sequences in question would look functionally equivalent to a smart-enough compiler. So part of the work of using intrinsics is to figure out what the best sequence is. And IMO, part of the contract is that the compiler shouldn't interfere with the programmer's choices too much. If the compiler makes a change, it must very confident that it is a win for the function as a whole. Replacing one division with one shift is fine, as an aid to the programmer. It removes the need for (say) templated functions to check for that case manually. Constant folding is fine too, for similar reasons. In these cases, there's not really a cost/benefit choice to be made between different expansions. One choice is objectively better in all realistic situations. But when it comes to general constants, there are many different choices that could be made when deciding which constants should be open-coded and which shouldn't. IMO we should leave the choice to the programmer in those cases. If the compiler gets it wrong, there will be no way for the programmer to force the compiler's hand ("no, when I say svdiv, I really do mean svdiv"). Thanks, Richard