On Mon, Nov 15, 2021 at 2:54 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch teaches the i386 backend to avoid using BMI2's rorx
> instructions when optimizing for size.  The benefits are shown
> with the following example:
>
> unsigned int ror1(unsigned int x) { return (x >> 1) | (x << 31); }
> unsigned int ror2(unsigned int x) { return (x >> 2) | (x << 30); }
> unsigned int rol2(unsigned int x) { return (x >> 30) | (x << 2); }
> unsigned int rol1(unsigned int x) { return (x >> 31) | (x << 1); }
>
> which currently with -Os -march=cascadelake generates:
>
> ror1:   rorx    $1, %edi, %eax          // 6 bytes
>         ret
> ror2:   rorx    $2, %edi, %eax          // 6 bytes
>         ret
> rol2:   rorx    $30, %edi, %eax         // 6 bytes
>         ret
> rol1:   rorx    $31, %edi, %eax         // 6 bytes
>         ret
>
> but with this patch now generates:
>
> ror1:   movl    %edi, %eax              // 2 bytes
>         rorl    %eax                    // 2 bytes
>         ret
> ror2:   movl    %edi, %eax              // 2 bytes
>         rorl    $2, %eax                // 3 bytes
>         ret
> rol2:   movl    %edi, %eax              // 2 bytes
>         roll    $2, %eax                // 3 bytes
>         ret
> rol1:   movl    %edi, %eax              // 2 bytes
>         roll    %eax                    // 2 bytes
>         ret
>
> I've confirmed that this patch is a win on the CSiBE benchmark,
> even though rotations are rare.  For example, libmspack/test/md5.o
> shrinks from 5824 bytes to 5632 bytes.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-11-15  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*bmi2_rorx<mode>3_1): Make conditional
>         on !optimize_function_for_size_p.
>         (*<any_rotate><mode>3_1): Add preferred_for_size attribute.
>         (define_splits): Conditionalize on !optimize_function_for_size_p.
>         (*bmi2_rorxsi3_1_zext): Likewise.
>         (*<any_rotate>si2_1_zext): Add preferred_for_size attribute.
>         (define_splits): Conditionalize on !optimize_function_for_size_p.

OK.

Thanks,
Uros.
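
For anyone reading the listings above and wondering why rol2 and rol1
were emitted as rorx $30 and rorx $31: rorx only rotates right, and a
left rotate by n on a 32-bit value is the same as a right rotate by
32 - n.  The sketch below is only an illustration, not part of the
patch.  rotr32 and check_rotate_idioms are hypothetical names of my
own, and I have assumed uint32_t in place of unsigned int so the width
is explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four idioms from the quoted message, with an explicit 32-bit type. */
static uint32_t ror1(uint32_t x) { return (x >> 1) | (x << 31); }
static uint32_t ror2(uint32_t x) { return (x >> 2) | (x << 30); }
static uint32_t rol2(uint32_t x) { return (x >> 30) | (x << 2); }
static uint32_t rol1(uint32_t x) { return (x >> 31) | (x << 1); }

/* Hypothetical reference helper: rotate right by n bits, n taken mod 32.
   Masking both shift counts avoids an undefined shift by 32 when n == 0. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Check that each idiom matches a right rotate by the immediate that
   rorx used in the "before" listing (1, 2, 30 and 31). */
static void check_rotate_idioms(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0x12345678u, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(ror1(x) == rotr32(x, 1));
        assert(ror2(x) == rotr32(x, 2));
        assert(rol2(x) == rotr32(x, 30));  /* rotate left by 2 */
        assert(rol1(x) == rotr32(x, 31));  /* rotate left by 1 */
    }
}
```

Calling check_rotate_idioms() from a small main should pass silently.
Going by the byte counts in the listings, each function body (not
counting ret) drops from 6 bytes to 4 or 5.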
