[Bug c/117266] RFE: builtins for N*N -> 2N multiplication and 2N/N -> N div/mod

hpa at zytor dot com via Gcc-bugs Tue, 22 Oct 2024 17:56:54 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266


--- Comment #13 from H. Peter Anvin <hpa at zytor dot com> ---
On October 22, 2024 5:49:41 PM PDT, "pinskia at gcc dot gnu.org"
<gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117266
>
>--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>(In reply to H. Peter Anvin from comment #6)
>> And THAT is exactly the point: *the two aren't equivalent.* Only the 
>> programmer knows when this instruction is usable, and for performance 
>> reasons, you *really, really* want to be able to use it when you as the 
>> programmer know, a priori, that you can.
>
>Actually the compiler could know based on the ranges. And it could technically
>optimize something like you gave for div2 into the instruction.
>
>Like say:
>```
>typedef unsigned _BitInt(64)  uint64_t;
>uint64_t div2(uint64_t hi, uint64_t lo, uint64_t divisor)
>{
>     unsigned _BitInt(128) dividend = ((unsigned _BitInt(128))hi << 64) | lo;
>     unsigned _BitInt(128)  qq = dividend / divisor;
>     if (qq >> 64)
>        __builtin_unreachable();
>     return qq;
>}
>```
>This could be optimized to use the 128/64->64 instruction, since you say the
>upper bits are 0; otherwise it is undefined.
>
>Note that a trap here could be one way the undefined behavior shows up.
>

If the compiler actually can figure it out, that's great, but you would want
the trapping behavior of a proper divide overflow.

As I showed, it currently doesn't do anything like that.
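
To make the desired semantics concrete, here is a minimal sketch in GNU C of
the behavior I mean. The name div_2n_by_n and its signature are hypothetical,
for illustration only; this is not an existing GCC builtin. It assumes a
64-bit target with unsigned __int128, and it uses __builtin_trap() to stand in
for the hardware divide-overflow trap, which it only loosely approximates (the
signal actually raised may differ from a real divide error).

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC builtin: unsigned 128/64 -> 64 division.
   Traps if the quotient would not fit in 64 bits, instead of leaving the
   behavior undefined. */
static inline uint64_t div_2n_by_n(uint64_t hi, uint64_t lo,
                                   uint64_t divisor, uint64_t *rem)
{
    /* The quotient fits in 64 bits exactly when hi < divisor.
       This test also catches divisor == 0. */
    if (hi >= divisor)
        __builtin_trap();

    unsigned __int128 dividend = ((unsigned __int128)hi << 64) | lo;

    *rem = (uint64_t)(dividend % divisor);
    return (uint64_t)(dividend / divisor);
}
```

The point of the explicit check is that overflow traps rather than becoming
undefined behavior; whether the compiler then lowers the division itself to a
single 128/64->64 instruction is a separate question.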
