https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88097

Daniel Fruzynski <bugzi...@poradnik-webmastera.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #4 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
It looks like there is one more issue here: ntohs is implemented with inline
assembly instead of __builtin_bswap16. When I tried using this builtin, gcc
started emitting the movbe instruction when compiling with -O3 -mmovbe.

uint32_t test2(Test* ip)
{
    return ((__builtin_bswap16(ip->Word1) << 16) | __builtin_bswap16(ip->Word2));
}

test2(Test*):
        movbe   ax, WORD PTR [rdi]
        movbe   dx, WORD PTR [rdi+2]
        sal     eax, 16
        movzx   edx, dx
        or      eax, edx
        ret

When I was logging this issue yesterday, Bugzilla showed Bug 54733 as a
possible duplicate. It looks like gcc already implements a similar kind of
optimization. I suspect that once the system headers are fixed to use
__builtin_bswap* instead of inline assembly, it will be possible to improve
this optimization further.

I am reopening this issue.
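
For anyone who wants to reproduce the comparison, here is a minimal sketch of the builtin-based variant next to a portable shift-based reference. The layout of Test is an assumption (the comment does not show it), and test2_portable and swap16_portable are hypothetical helper names added only for illustration. The explicit uint32_t cast before the shift is also an addition: without it, the promoted int is shifted left by 16, which can overflow a signed int when the high bit of the swapped word is set.

```c
#include <stdint.h>

/* Assumed layout: two adjacent 16-bit fields. The original comment
   does not show the definition of Test. */
typedef struct {
    uint16_t Word1;
    uint16_t Word2;
} Test;

/* Builtin-based variant from the comment, with the high half widened
   to uint32_t before the shift (an addition, see the note above). */
uint32_t test2(const Test *ip)
{
    return ((uint32_t)__builtin_bswap16(ip->Word1) << 16)
         | __builtin_bswap16(ip->Word2);
}

/* Hypothetical reference helper: byte swap using shifts only. */
static uint16_t swap16_portable(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical reference variant; should return the same value as test2. */
uint32_t test2_portable(const Test *ip)
{
    return ((uint32_t)swap16_portable(ip->Word1) << 16)
         | swap16_portable(ip->Word2);
}
```

Compiling both functions with -O3 -mmovbe and comparing the generated assembly should show whether gcc recognizes the shift-based form as a byte swap in the same way as the builtin; the exact output will depend on the gcc version and target.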