http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157



             Bug #: 57157

           Summary: Poor optimization of portable rotate idiom

    Classification: Unclassified

           Product: gcc

           Version: 4.6.3

            Status: UNCONFIRMED

          Severity: normal

          Priority: P3

         Component: c

        AssignedTo: unassig...@gcc.gnu.org

        ReportedBy: ni...@lysator.liu.se





The standard rotate idiom,



  (x << n) | (x >> (32 - n))



is recognized by gcc (for concreteness, I discuss only the case that x

is an uint32_t here).



However, this is portable C only for n in the range 0 < n < 32. For n

== 0, we get x >> 32 which gives undefined behaviour according to the

C standard (6.5.7, Bitwise shift operators). To portably support n ==

0, one has to write the rotate as something like



  (x << n) | (x >> ((-n) & 31))



And this is apparently not recognized by gcc. I compiled this test

program with "gcc -O3 -c -save-temps rot.c". Using gcc-4.6.3 on

GNU/Linux x86_64 (ubuntu):



typedef unsigned int uint32_t;



/* Allows 0 < n < 32 (n == 0 gives undefined behaviour) */

uint32_t

rot1(unsigned n, uint32_t x)

{

  return (x << n) | (x >> (32 - n));

}



/* Allows 0 <= n < 32 */

uint32_t

rot2(unsigned n, uint32_t x)

{

  return (x << n) | (x >> ((- n) & 31));

}



Generated assembler



    .file    "rot.c"

    .text

    .p2align 4,,15

    .globl    rot1

    .type    rot1, @function

rot1:

.LFB0:

    .cfi_startproc

    movl    %esi, %eax

    movl    %edi, %ecx

    roll    %cl, %eax

    ret

    .cfi_endproc

.LFE0:

    .size    rot1, .-rot1

    .p2align 4,,15

    .globl    rot2

    .type    rot2, @function

rot2:

.LFB1:

    .cfi_startproc

    movl    %edi, %ecx

    movl    %esi, %eax

    negl    %ecx

    shrl    %cl, %eax

    movl    %edi, %ecx

    sall    %cl, %esi

    orl    %esi, %eax

    ret

    .cfi_endproc

.LFE1:

    .size    rot2, .-rot2

    .ident    "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"

    .section    .note.GNU-stack,"",@progbits



The desired result is of course to get a rotl instruction also for

rot2, instead of the combination of negl, shrl, sall, and orl.



Applying the above portability fix to my ROTL32 macro in GNU Nettle

results in a slowdown of almost 20% for cast128. This function depends

a lot on key-dependant rotates, where rotation counts of zero will

happen for some keys.

Reply via email to