https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86723
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It boils down to something even simpler:

int bar (unsigned long long value)
{
  return ((value & 0x000000ff00000000ull) >> 8)
       | ((value & 0x0000ff0000000000ull) >> 24)
       | ((value & 0x00ff000000000000ull) >> 40)
       | ((value & 0xff00000000000000ull) >> 56);
}

which is what you get from #c2 after optimizations.

The bswap pass ATM tries to recognize just nops (the 0x0807060504030201
permutation marker) and full bswaps (the 0x0102030405060708 permutation
marker), where each byte in those permutation markers means:
  0       - the target byte has the value 0
  0xff    - the target byte has an unknown value (e.g. due to sign extension)
  1..size - the marker value is the byte index in the source (1 for the lsb)
But we could very well handle masked bswaps as well: either just those one
can get from zero extensions, so 0x0000000005060708 or 0x0000000000000708
etc., or generally with clearing of arbitrary bytes, say 0x0100030400060700,
by emitting __builtin_bswap64 (arg) & 0xff00ffff00ffff00ULL etc.
Then this optimization would fall out from that, because we would emit
(int) (__builtin_bswap64 (arg) & 0xffffffffULL)
and further opts would optimize away the masking.