On Mon, Nov 15, 2021 at 2:54 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> This patch teaches the i386 backend to avoid using BMI2's rorx
> instructions when optimizing for size.  The benefits are shown
> with the following example:
>
> unsigned int ror1(unsigned int x) { return (x >> 1) | (x << 31); }
> unsigned int ror2(unsigned int x) { return (x >> 2) | (x << 30); }
> unsigned int rol2(unsigned int x) { return (x >> 30) | (x << 2); }
> unsigned int rol1(unsigned int x) { return (x >> 31) | (x << 1); }
>
> which currently with -Os -march=cascadelake generates:
>
> ror1:   rorx    $1, %edi, %eax          // 6 bytes
>         ret
> ror2:   rorx    $2, %edi, %eax          // 6 bytes
>         ret
> rol2:   rorx    $30, %edi, %eax         // 6 bytes
>         ret
> rol1:   rorx    $31, %edi, %eax         // 6 bytes
>         ret
>
> but with this patch now generates:
>
> ror1:   movl    %edi, %eax              // 2 bytes
>         rorl    %eax                    // 2 bytes
>         ret
> ror2:   movl    %edi, %eax              // 2 bytes
>         rorl    $2, %eax                // 3 bytes
>         ret
> rol2:   movl    %edi, %eax              // 2 bytes
>         roll    $2, %eax                // 3 bytes
>         ret
> rol1:   movl    %edi, %eax              // 2 bytes
>         roll    %eax                    // 2 bytes
>         ret
>
> I've confirmed that this patch is a win on the CSiBE benchmark, even
> though rotations are rare; for example, libmspack/test/md5.o shrinks
> from 5824 bytes to 5632 bytes.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-11-15  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*bmi2_rorx<mode>3_1): Make conditional
>         on !optimize_function_for_size_p.
>         (*<any_rotate><mode>3_1): Add preferred_for_size attribute.
>         (define_splits): Conditionalize on !optimize_function_for_size_p.
>         (*bmi2_rorxsi3_1_zext): Likewise.
>         (*<any_rotate>si2_1_zext): Add preferred_for_size attribute.
>         (define_splits): Conditionalize on !optimize_function_for_size_p.
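For readers less familiar with the i386.md patterns named in the ChangeLog, here is a minimal, illustrative sketch (not the committed patch) of the kind of change being described: gating the BMI2 rorx insn on not optimizing for size.  The pattern body, predicates and attributes below are simplified assumptions; only the added !optimize_function_for_size_p test reflects the change under review.

    ;; Illustrative sketch only: allow the rorx form of rotate-right
    ;; solely when the function is not being optimized for size.
    (define_insn "*bmi2_rorx<mode>3_1"
      [(set (match_operand:SWI48 0 "register_operand" "=r")
            (rotatert:SWI48
              (match_operand:SWI48 1 "nonimmediate_operand" "rm")
              (match_operand:QI 2 "const_int_operand" "n")))]
      "TARGET_BMI2 && !optimize_function_for_size_p (cfun)"
      "rorx\t{%2, %1, %0|%0, %1, %2}"
      [(set_attr "type" "rotatex")
       (set_attr "mode" "<MODE>")])

The two-alternative *<any_rotate><mode>3_1 pattern presumably keeps both its plain-rotate and BMI2 alternatives, with the rorx alternative marked through the preferred_for_size attribute so it is not selected when code size is the priority.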
OK. Thanks, Uros.