https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99434
            Bug ID: 99434
           Summary: std::bit_cast generates more instructions than
                    __builtin_bit_cast and memcpy with -march=native
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

https://godbolt.org/z/5KWM8Y

struct u64x2_t
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    std::uint64_t high,low;
#else
    std::uint64_t low,high;
#endif
};

u64x2_t umul5(std::uint64_t a,std::uint64_t b) noexcept
{
    return std::bit_cast<u64x2_t>(static_cast<__uint128_t>(a)*b);
}

u64x2_t umul_builtin(std::uint64_t a,std::uint64_t b) noexcept
{
    return __builtin_bit_cast(u64x2_t,static_cast<__uint128_t>(a)*b);
}

assembly:

umul5(unsigned long, unsigned long):
        movq    %rdi, %rdx
        mulx    %rsi, %rdx, %rcx
        movq    %rdx, %rax
        movq    %rcx, %rdx
        ret
umul_builtin(unsigned long, unsigned long):
        movq    %rdi, %rdx
        mulx    %rsi, %rax, %rdx
        ret

There is another issue:

std::uint64_t umul128(std::uint64_t a,std::uint64_t b,std::uint64_t& high) noexcept
{
    __uint128_t res{static_cast<__uint128_t>(a)*b};
    high=static_cast<std::uint64_t>(res>>64);
    return static_cast<std::uint64_t>(res);
}

I cannot do this since this generates more instructions than using memcpy to
pun types.

clang does not have this issue and all cases are dealt with correctly.
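
The report compares against a memcpy-based variant but does not include its source. For reference, a minimal sketch of what such a variant might look like is given below. The name umul_memcpy is hypothetical and not taken from the report or the godbolt link, and the code assumes a GCC or Clang target that provides __uint128_t. No claim is made here about the assembly it produces.

```cpp
#include <cstdint>
#include <cstring>

struct u64x2_t
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    std::uint64_t high,low;
#else
    std::uint64_t low,high;
#endif
};

// Hypothetical name; the memcpy variant is mentioned but not shown in the report.
u64x2_t umul_memcpy(std::uint64_t a,std::uint64_t b) noexcept
{
    // Full 128-bit product (GCC/Clang extension type).
    __uint128_t res{static_cast<__uint128_t>(a)*b};
    static_assert(sizeof(u64x2_t)==sizeof(res),"sizes must match for punning");
    // Copy the object representation of the product into the two-word struct.
    u64x2_t out;
    std::memcpy(&out,&res,sizeof(out));
    return out;
}
```

The field order in u64x2_t follows the target byte order, so low and high receive the matching halves of the product on both little-endian and big-endian targets.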