http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15184
--- Comment #22 from Mikael Pettersson <mikpe at it dot uu.se> 2012-11-10 13:36:46 UTC ---
Created attachment 28655
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28655
another test case

I'm using a construct similar to the 'f1' function of the initial test case to
set the low 8 or 16 bits of a 32-bit "register" in a CPU emulator of mine, and
the code generated by gcc 4.6/4.7/4.8 for this on x86_64 is appalling.

In the attached test case, the setb1 and setw1 functions use bit and/or
operations, while the setb2 and setw2 functions assign the sub-field directly
via a union.  gcc compiles each set*2 function to a single mov (+ ret), while
each set*1 function becomes 5 instructions (+ ret).
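
For readers without the attachment at hand, the two styles being compared
might look roughly like the sketch below.  This is not the contents of
attachment 28655: the union name reg32, its member names, and the function
signatures are assumptions made for illustration; only the function names
setb1, setw1, setb2 and setw2 come from the comment.  The union variants
assume a little-endian target such as x86_64, where the low byte and low
half-word sit at offset 0.

```c
#include <stdint.h>

/* Hypothetical layout of a 32-bit emulated register (little-endian only:
 * b and w alias the least significant 8 and 16 bits of l). */
union reg32 {
    uint32_t l;
    uint16_t w;
    uint8_t  b;
};

/* Style 1: clear the low bits with a mask, then OR in the new value. */
void setb1(uint32_t *r, uint8_t b)
{
    *r = (*r & ~0xffu) | b;
}

void setw1(uint32_t *r, uint16_t w)
{
    *r = (*r & ~0xffffu) | w;
}

/* Style 2: store straight into the sub-field through the union. */
void setb2(union reg32 *r, uint8_t b)
{
    r->b = b;
}

void setw2(union reg32 *r, uint16_t w)
{
    r->w = w;
}
```

Both styles leave the upper bits of the register untouched, so on a
little-endian machine setb1/setb2 and setw1/setw2 should produce the same
stored value; the complaint is only about the instruction sequences gcc
emits for the mask-and-or versions.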