On 17 Jul 2024, at 09:29, Richard Sandiford <richard.sandif...@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Jennifer Schmitz <jschm...@nvidia.com> writes: >> This patch series is part of an ongoing effort to replace the SVE intrinsic >> svdiv >> by lower-strength instructions for division by constant. To that end, we >> implemented svdiv_impl::fold to perform the following transformation in >> gimple: >> - Division where all divisors are the same power of 2 --> svasrd > > Sounds good. > >> - Division where all divisors are powers of 2 --> svasr > > I don't think this is correct for negative dividends (which is why > ASRD exists). E.g. -1 / 4 is 0 as computed by svdiv (round towards zero), > but -1 as computed by svasr (round towards -Inf). You’re right, I dropped the second patch. > >> We chose svdiv_impl::fold as location for the implementation to have the >> transform applied as early as possible, such that other (existing or future) >> gimple optimizations can be applied on the result. >> Currently, the transform to is only applied for signed integers, because >> there do not exist an unsigned svasrd and svasr. The transform has not (yet) >> been implemented for svdivr. > > FWIW, using svlsr for unsigned divisions should be OK. Thanks for pointing that out, I adapted the patch to transform unsigned division by a power of 2 to svlsr. > >> Please also comment/advise on the following: >> In a next patch, we would like to replace SVE division by constants (other >> than powers of 2) by multiply and shifts, similar as for scalar division. >> This is planned to be implemented in the gimple_folder as well. Thoughts? > > I'm a bit uneasy about going that far. I suppose it comes down to a > question about what intrinsics are for. Are they for describing an > algorithm, or for hand-optimising a specific implementation of the > algorithm? IMO it's the latter. > > If people want to write out a calculation in natural arithmetic, it > would be better to write the algorithm in scalar code and let the > vectoriser handle it. That gives the opportunity for many more > optimisations than just this one. > > Intrinsics are about giving programmers direct, architecture-level > control over how something is implemented. I've seen Arm's library > teams go to great lengths to work out which out of a choice of > instruction sequences is the best one, even though the sequences in > question would look functionally equivalent to a smart-enough compiler. > > So part of the work of using intrinsics is to figure out what the best > sequence is. And IMO, part of the contract is that the compiler > shouldn't interfere with the programmer's choices too much. If the > compiler makes a change, it must very confident that it is a win for > the function as a whole. > > Replacing one division with one shift is fine, as an aid to the programmer. > It removes the need for (say) templated functions to check for that case > manually. Constant folding is fine too, for similar reasons. In these > cases, there's not really a cost/benefit choice to be made between > different expansions. One choice is objectively better in all > realistic situations. > > But when it comes to general constants, there are many different choices > that could be made when deciding which constants should be open-coded > and which shouldn't. IMO we should leave the choice to the programmer > in those cases. If the compiler gets it wrong, there will be no way > for the programmer to force the compiler's hand ("no, when I say svdiv, > I really do mean svdiv”). Makes sense, then we will not pursue this further and only leave the current optimization. Best, Jennifer > > Thanks, > Richard
smime.p7s
Description: S/MIME cryptographic signature