https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
--- Comment #12 from Arnd Bergmann <arnd at linaro dot org> ---
Created attachment 37991
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37991&action=edit
simpler test case without manual byte swap

For reference, I have sent a patch to the kernel to replace the open-coded
byteswap with a __builtin_bswap64. This produces much better object code with
both gcc-5 and gcc-6 than the original version and uses the native swap
instructions, so it's certainly a good thing to do regardless of the
resolution in gcc-6:

http://thread.gmane.org/gmane.linux.kernel/2178301

For reference, these are the sizes of stack usage and function length using
the actual source code from the kernel:

                         stack  length
gcc-5.3.1, linux-4.5:      156    1146
gcc-5.3.1, patched:         28     804
gcc-6.0.0, linux-4.5:     1192    5144
gcc-6.0.0, patched:         76    1612

I have adapted the test case now to no longer use unaligned data, memcpy or
manual byte swaps. The object code generated by gcc-6 is nowhere near as bad
as with the original example, but it is still considerably worse than what I
get with gcc-5.
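
To illustrate the kind of replacement described above, here is a minimal sketch contrasting an open-coded 64-bit byte swap with __builtin_bswap64. The function names swab64_open_coded, swab64_builtin and check_swab64 are made up for this example; this is neither the kernel's code nor the attached test case.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded 64-bit byte swap built from masks and shifts.
 * Illustrative only; not the actual kernel source. */
static uint64_t swab64_open_coded(uint64_t x)
{
    return ((x & 0x00000000000000ffULL) << 56) |
           ((x & 0x000000000000ff00ULL) << 40) |
           ((x & 0x0000000000ff0000ULL) << 24) |
           ((x & 0x00000000ff000000ULL) <<  8) |
           ((x & 0x000000ff00000000ULL) >>  8) |
           ((x & 0x0000ff0000000000ULL) >> 24) |
           ((x & 0x00ff000000000000ULL) >> 40) |
           ((x & 0xff00000000000000ULL) >> 56);
}

/* Same operation expressed through the GCC builtin, which lets the
 * compiler pick a native byte-swap instruction where one exists. */
static uint64_t swab64_builtin(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* Hypothetical self-check: both variants must agree on a sample value. */
static void check_swab64(void)
{
    uint64_t v = 0x0102030405060708ULL;

    assert(swab64_open_coded(v) == swab64_builtin(v));
}
```

Both functions compute the same result. The difference is only in how much work the optimizer has to do to recognize the pattern, which is presumably where the stack usage and code size numbers diverge.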