https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804
--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
So, when things like

uint64_t swap64(uint64_t x)
{
    uint64_t a = __builtin_bswap32(x);
    x >>= 32;
    a <<= 32;
    return __builtin_bswap32(x) | a;
}

have similar problems with useless movs, is that caused by the same poorly optimized register allocation at function boundaries?
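For anyone wanting to reproduce this, here is a minimal self-contained sketch, assuming a compiler that provides the GCC-style `__builtin_bswap32` and `__builtin_bswap64` builtins. The names `swap64_builtin` and `check_swap64` are hypothetical helpers added only for comparison and are not part of the original report. Compiling with something like `gcc -O2 -S` should make it possible to compare the code generated for the two functions; no particular assembly output is claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the comment: byte-swap each 32-bit half,
   then exchange the two halves. */
uint64_t swap64(uint64_t x)
{
    /* The argument is truncated to 32 bits, so this swaps the low half. */
    uint64_t a = __builtin_bswap32(x);
    x >>= 32;   /* bring the high half down */
    a <<= 32;   /* move the swapped low half up */
    return __builtin_bswap32(x) | a;
}

/* Hypothetical reference version using the 64-bit builtin directly. */
uint64_t swap64_builtin(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* Hypothetical sanity check: both versions should agree. */
void check_swap64(void)
{
    assert(swap64(0x0123456789abcdefULL) == 0xefcdab8967452301ULL);
    assert(swap64(0x0123456789abcdefULL) == swap64_builtin(0x0123456789abcdefULL));
    assert(swap64(0) == 0);
}
```

Since both functions compute the same value, any extra movs in `swap64` relative to `swap64_builtin` would point to the code generator rather than the source.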