https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98908
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
   Target Milestone|---                         |9.0
         Resolution|---                         |FIXED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Gabriel Ravier from comment #6)
> Also the second example wasn't misoptimized, on the contrary it was the most
> reasonable portable function I could write that would work equivalently to
> the first *and* that GCC would optimize ideally.

GCC 7.1.0 produces:

    f(reg):
    .LFB0:
            .cfi_startproc
            mov     edx, edi
            xor     eax, eax
            mov     ecx, edi
            and     edx, -2
            mov     al, dl
            movzx   edx, ch
            and     edx, -128
            mov     ah, dl
            ret
    f1(reg):
    .LFB1:
            .cfi_startproc
            and     di, -32514
            xor     eax, eax
            movzx   edx, di
            mov     al, dil
            sar     edx, 8
            mov     ah, dl
            ret

f is your first example and f1 is your second. As you can see, GCC releases
before GCC 8 optimized neither function ideally.

In GCC 8, the second function produces the following GIMPLE:

    _1 = x.l;
    _2 = (signed short) _1;
    _3 = x.h;
    _4 = (int) _3;
    _5 = _4 << 8;
    _6 = (signed short) _5;
    _7 = _2 | _6;
    _8 = (short unsigned int) _7;
    tmp_14 = _8 & 33022;
    MEM[(unsigned char *)&D.2500] = tmp_14;

It is only optimized in GCC 9, where bswap produces what I mentioned before.
So this is fixed for GCC 9.
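The report does not include the source of f and f1, so the following is a hypothetical reconstruction for illustration only. The layout of `reg` (a low byte `l` followed by a high byte `h`) is an assumption inferred from the GIMPLE (`x.l`, `x.h`) and from the x86-64 code, which returns `l` in AL and `h` in AH. The masks come from the listings above: f applies `and edx, -2` to the low byte and `and edx, -128` to the high byte, while f1 applies a single 16-bit mask of 33022 (0x80FE, which is -32514 as a signed 16-bit immediate). The helper `all_inputs_agree` is also hypothetical; it checks exhaustively that the two forms compute the same result.

```c
#include <assert.h>   /* used by callers that check all_inputs_agree() */
#include <stdbool.h>

/* Hypothetical layout: low byte first, then high byte. */
typedef struct {
    unsigned char l; /* low byte  */
    unsigned char h; /* high byte */
} reg;

/* Byte-wise form (like the first example): clear bit 0 of l and
   keep only bit 7 of h. */
reg f(reg x) {
    reg r;
    r.l = x.l & 0xFE;
    r.h = x.h & 0x80;
    return r;
}

/* Combined form (like the second example): assemble a 16-bit value,
   apply one mask of 0x80FE (33022), then split it back into bytes. */
reg f1(reg x) {
    unsigned short tmp = (unsigned short)(x.l | (x.h << 8));
    tmp &= 0x80FE;
    reg r;
    r.l = (unsigned char)tmp;
    r.h = (unsigned char)(tmp >> 8);
    return r;
}

/* Returns true if f and f1 agree on all 65536 possible inputs. */
bool all_inputs_agree(void) {
    for (unsigned v = 0; v <= 0xFFFF; ++v) {
        reg x = { (unsigned char)(v & 0xFF), (unsigned char)(v >> 8) };
        reg a = f(x);
        reg b = f1(x);
        if (a.l != b.l || a.h != b.h)
            return false;
    }
    return true;
}
```

Because the two forms are equivalent for every input, a compiler is free to lower either one to the same single 16-bit `and`; the listings above show that GCC 7.1.0 did not do so for either form.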