https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99434

            Bug ID: 99434
           Summary: std::bit_cast generates more instructions than
                    __builtin_bit_cast and memcpy with -march=native
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

https://godbolt.org/z/5KWM8Y
#include <bit>
#include <cstdint>

struct u64x2_t
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    std::uint64_t high,low;
#else
    std::uint64_t low,high;
#endif
};
u64x2_t umul5(std::uint64_t a,std::uint64_t b) noexcept
{
    return std::bit_cast<u64x2_t>(static_cast<__uint128_t>(a)*b);
}

u64x2_t umul_builtin(std::uint64_t a,std::uint64_t b) noexcept
{
    return __builtin_bit_cast(u64x2_t,static_cast<__uint128_t>(a)*b);
}

assembly:
umul5(unsigned long, unsigned long):
        movq    %rdi, %rdx
        mulx    %rsi, %rdx, %rcx
        movq    %rdx, %rax
        movq    %rcx, %rdx
        ret
umul_builtin(unsigned long, unsigned long):
        movq    %rdi, %rdx
        mulx    %rsi, %rax, %rdx
        ret

There is another issue:

std::uint64_t umul128(std::uint64_t a,std::uint64_t b,std::uint64_t& high) noexcept
{
    __uint128_t res{static_cast<__uint128_t>(a)*b};
    high=static_cast<std::uint64_t>(res>>64);
    return static_cast<std::uint64_t>(res);
}
I cannot use this shift-based version, since GCC generates more instructions
for it than for a version that uses memcpy to pun the types.
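
For reference, a memcpy-based variant of the kind mentioned above could be
sketched as follows. This is an illustration only: the name umul_memcpy is
hypothetical and the code is not taken from the godbolt link. It assumes the
GCC/Clang __uint128_t extension and the same u64x2_t layout as in the report.

```cpp
#include <cstdint>
#include <cstring>

struct u64x2_t
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    std::uint64_t high,low;
#else
    std::uint64_t low,high;
#endif
};

// Hypothetical helper (not from the report): puns the 128-bit product
// into u64x2_t by copying its object representation with std::memcpy.
u64x2_t umul_memcpy(std::uint64_t a,std::uint64_t b) noexcept
{
    __uint128_t res{static_cast<__uint128_t>(a)*b};
    static_assert(sizeof(u64x2_t)==sizeof(res),"sizes must match for the copy");
    u64x2_t ret;
    std::memcpy(&ret,&res,sizeof(ret));
    return ret;
}
```

The field order in u64x2_t follows the target byte order, so low and high
receive the matching halves of the product on either endianness.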

Clang does not have this issue and handles all of these cases correctly.
