http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
Bug #: 57157 Summary: Poor optimization of portable rotate idiom Classification: Unclassified Product: gcc Version: 4.6.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: ni...@lysator.liu.se The standard rotate idiom, (x << n) | (x >> (32 - n)) is recognized by gcc (for concreteness, I discuss only the case that x is an uint32_t here). However, this is portable C only for n in the range 0 < n < 32. For n == 0, we get x >> 32 which gives undefined behaviour according to the C standard (6.5.7, Bitwise shift operators). To portably support n == 0, one has to write the rotate as something like (x << n) | (x >> ((-n) & 31)) And this is apparently not recognized by gcc. I compiled this test program with "gcc -O3 -c -save-temps rot.c". Using gcc-4.6.3 on GNU/Linux x86_64 (ubuntu): typedef unsigned int uint32_t; /* Allows 0 < n < 32 (n == 0 gives undefined behaviour) */ uint32_t rot1(unsigned n, uint32_t x) { return (x << n) | (x >> (32 - n)); } /* Allows 0 <= n < 32 */ uint32_t rot2(unsigned n, uint32_t x) { return (x << n) | (x >> ((- n) & 31)); } Generated assembler .file "rot.c" .text .p2align 4,,15 .globl rot1 .type rot1, @function rot1: .LFB0: .cfi_startproc movl %esi, %eax movl %edi, %ecx roll %cl, %eax ret .cfi_endproc .LFE0: .size rot1, .-rot1 .p2align 4,,15 .globl rot2 .type rot2, @function rot2: .LFB1: .cfi_startproc movl %edi, %ecx movl %esi, %eax negl %ecx shrl %cl, %eax movl %edi, %ecx sall %cl, %esi orl %esi, %eax ret .cfi_endproc .LFE1: .size rot2, .-rot2 .ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3" .section .note.GNU-stack,"",@progbits The desired result is of course to get a rotl instruction also for rot2, instead of the combination of negl, shrl, sall, and orl. Applying the above portability fix to my ROTL32 macro in GNU Nettle results in a slowdown of almost 20% for cast128. This function depends a lot on key-dependant rotates, where rotation counts of zero will happen for some keys.