https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266
--- Comment #13 from H. Peter Anvin <hpa at zytor dot com> ---
On October 22, 2024 5:49:41 PM PDT, "pinskia at gcc dot gnu.org" <gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266
>
>--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>(In reply to H. Peter Anvin from comment #6)
>> And THAT is exactly the point: *the two aren't equivalent.* Only the
>> programmer knows when this instruction is usable, and for performance
>> reasons, you *really, really* want to be able to use it when you as the
>> programmer know, a priori, that you can.
>
>Actually the compiler could know based on the ranges. And it could technically
>optimize something like you gave for div2 into the instruction.
>
>Like say:
>```
>typedef unsigned _BitInt(64) uint64_t;
>uint64_t div2(uint64_t hi, uint64_t lo, uint64_t divisor)
>{
>  unsigned _BitInt(128) dividend = ((unsigned _BitInt(128))hi << 64) | lo;
>  unsigned _BitInt(128) qq = dividend / divisor;
>  if (qq >> 64)
>    __builtin_unreachable();
>  return qq;
>}
>```
>Could be optimized to using the 128/64->64 instruction since you say the upper
>bits are 0; otherwise it is undefined.
>
>Note a trap here could be how it is undefined.
>

If the compiler actually can figure it out, that's great, but you would want the trapping behavior of a proper divide overflow. As I showed, it currently doesn't do anything like that.
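
For illustration only, here is a minimal sketch of the behavior being asked for: a 128/64->64 unsigned divide that traps when the quotient does not fit, rather than being undefined. The helper name `div_128_by_64` is hypothetical and not something proposed in this bug. The x86-64 path assumes GCC-style extended asm, and the fallback path assumes the compiler provides `unsigned __int128` and `__builtin_trap`.

```c
#include <stdint.h>

/* Hypothetical helper (not an existing GCC builtin): divide the 128-bit
 * value hi:lo by divisor and return the 64-bit quotient. The intent is
 * that a zero divisor or a quotient that does not fit in 64 bits traps. */
static inline uint64_t div_128_by_64(uint64_t hi, uint64_t lo, uint64_t divisor)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t quot, rem;
    /* DIV r/m64 divides RDX:RAX by the operand, leaving the quotient in RAX
     * and the remainder in RDX. The CPU raises #DE on a zero divisor or on
     * quotient overflow. volatile keeps the possibly trapping instruction
     * from being dropped when the result is unused. */
    __asm__ volatile ("divq %[d]"
                      : "=a" (quot), "=d" (rem)
                      : "0" (lo), "1" (hi), [d] "rm" (divisor)
                      : "cc");
    (void)rem;
    return quot;
#else
    /* Portable fallback (assumes unsigned __int128 is available): the
     * quotient fits in 64 bits exactly when hi < divisor, which also
     * rules out divisor == 0. */
    if (hi >= divisor)
        __builtin_trap();
    unsigned __int128 dividend = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(dividend / divisor);
#endif
}
```

With this sketch, `div_128_by_64(1, 0, 2)` yields 2^63, while `div_128_by_64(1, 0, 1)` would fault instead of silently returning a truncated value. That is the difference from the `__builtin_unreachable()` formulation quoted above.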