Hi Richard, As requested here are some additional test cases for this patch, derived from the examples in my post. For minmax-8 and minmax-9, the tests are run at -Os, to avoid limiting code generation options (for speed) on future architectures, i.e. CSE may or may not always be a win for performance, but it is always smaller. Tested on x86_64-pc-linux-gnu (with the above patch).
2020-08-03 Roger Sayle <ro...@nextmovesoftware.com> gcc/testsuite/ChangeLog * gcc.target/i386/minmax-8.c: New test. * gcc.target/i386/minmax-9.c: New test. * gcc.target/i386/minmax-10.c: New test. * gcc.target/i386/minmax-11.c: New test. Cheers, Roger -- -----Original Message----- From: Richard Biener <richard.guent...@gmail.com> Sent: 30 July 2020 13:21 To: Roger Sayle <ro...@nextmovesoftware.com> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak <ubiz...@gmail.com> Subject: Re: [PATCH] x86_64: Integer min/max improvements. On Thu, Jul 30, 2020 at 1:24 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch tweaks the way that min and max are expanded, so that the > semantics of these operations are visible to the early RTL > optimization passes, until split into explicit comparison and > conditional move instructions. The good news is that i386.md already > contains all of the required logic (many thanks to Richard Biener and > Uros Bizjak), but this is currently only to enable scalar-to-vector > (STV) synthesis of min/max instructions. This change enables this > functionality for all TARGET_CMOVE architectures for SImode, HImode and > DImode. > > My first attempt to support "min(reg,reg)" as an instruction revealed > why this functionality isn't already enabled: In PR rtl-optimization > 91154 we end up generating a cmp instead of a test in > gcc.target/i386/minmax-3.c which has poor performance on AMD Opteron. > My solution to this is to actually support "min(reg,general_operand)" > allowing us to inspect any immediate constant at the point this > operation is split. This allows us to use "test" instructions for min/max > against 0, 1 and -1. > As an added benefit it allows us to CSE large immediate constants, > reducing code size. > > Previously, "int foo(int x) { return max(x,12345); }" would generate: > > foo: cmpl $12345, %edi > movl $12345, %eax > cmovge %edi, %eax > ret > > with this patch we instead generate: > > foo: movl $12345, %eax > cmpl %eax, %edi > cmovge %edi, %eax > ret > > > I've also included a peephole2 to avoid the "movl $0,%eax" > instructions being produced by the register allocator. Materializing > constants as late as possible reduces register pressure, but for > const0_rtx on x86, it's preferable to use "xor" by moving this set > from between a flags setting operation and its use. This should also > help instruction macro fusion on some microarchitectures, where > test/cmp and the following instruction can sometimes be combined. > > Previously, "int foo(int x) { return max(x,0); }" would generate: > > foo: testl %edi, %edi > movl $0, %eax > cmovns %edi, %eax > ret > > with this patch we instead generate: > foo: xorl %eax, %eax > testl %edi, %edi > cmovns %edi, %eax > ret > > The benefits of having min/max explicit at the RTL level can be seen > from compiling the following example with "-O2 -fno-tree-reassoc". > > > #define max(a,b) (((a) > (b))? (a) : (b)) > > int foo(int x) > { > int y = max(x,5); > return max(y,10); > } > > where GCC currently produces: > > foo: cmpl $5, %edi > movl $5, %eax > movl $10, %edx > cmovge %edi, %eax > cmpl $10, %eax > cmovl %edx, %eax > ret > > and with this patch it instead now produces: > > foo: movl $10, %eax > cmpl %eax, %edi > cmovge %edi, %eax > ret > > > The original motivation was from looking at a performance critical > function in a quantum mechanics code, that performed MIN_EXPR and > MAX_EXPR of the same arguments (effectively a two-element sort), where > GCC was performing the comparison twice. I'd hoped that it might be > possible to fuse these together, perhaps in combine, but this patch is > an intermediate step towards that goal. > > This patch has been tested on x86_64-pc-linux-gnu with a make > bootstrap followed by make -k check with no new regressions. > Ok for mainline? May I ask for a testcase or two? You have good examples above. Btw, I'm sure some variants of those are in bugzilla as well. Richard. > > 2020-07-30 Roger Sayle <ro...@nextmovesoftware.com> > > * config/i386/i386.md (MAXMIN_IMODE): No longer needed. > (<maxmin><mode>3): Support SWI248 and general_operand for > second operand, when TARGET_CMOVE. > (<maxmin><mode>3_1 splitter): Optimize comparisons against > 0, 1 and -1 to use "test" instead of "cmp". > (*<maxmin>di3_doubleword): Likewise, allow general_operand > and enable on TARGET_CMOVE. > (peephole2): Convert clearing a register after a flag setting > instruction into an xor followed by the original flag setter. > > > Many thanks in advance. Almost all of the credit goes to Uros and > Richard for already implementing this feature, I've just copied the > transformations from optab expansion, that allow it to be enabled > without penalty (this late in the compilation). > > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK >
diff --git a/gcc/testsuite/gcc.target/i386/minmax-8.c b/gcc/testsuite/gcc.target/i386/minmax-8.c new file mode 100644 index 0000000..1f7e466 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/minmax-8.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#define max(a,b) (((a) > (b))? (a) : (b)) +#define min(a,b) (((a) < (b))? (a) : (b)) + +int foo(int x) +{ + return max(x,12345); +} + +int bar(int x) +{ + return min(x,87654); +} + +/* { dg-final { scan-assembler-times "12345" 1 } } */ +/* { dg-final { scan-assembler-times "87654" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/minmax-9.c b/gcc/testsuite/gcc.target/i386/minmax-9.c new file mode 100644 index 0000000..3b94023 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/minmax-9.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#define max(a,b) (((a) > (b))? (a) : (b)) +#define min(a,b) (((a) < (b))? (a) : (b)) + +int foo(int x) +{ + return max(x,0); +} + +int bar(int x) +{ + return min(x,0); +} + +unsigned int baz(unsigned int x) +{ + return min(x,1); +} + +/* { dg-final { scan-assembler-times "xor" 3 } } */ +/* { dg-final { scan-assembler-times "test" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/minmax-10.c b/gcc/testsuite/gcc.target/i386/minmax-10.c new file mode 100644 index 0000000..b044462 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/minmax-10.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#define max(a,b) (((a) > (b))? (a) : (b)) +#define min(a,b) (((a) < (b))? (a) : (b)) + +int smax1(int x) +{ + return max(x,1); +} + +int smin1(int x) +{ + return min(x,1); +} + +int smaxm1(int x) +{ + return max(x,-1); +} + +int sminm1(int x) +{ + return min(x,-1); +} + +unsigned int umax1(unsigned int x) +{ + return max(x,1); +} + +unsigned int umin1(unsigned int x) +{ + return min(x,1); +} + +/* { dg-final { scan-assembler-times "test" 6 } } */ +/* { dg-final { scan-assembler-not "cmp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/minmax-11.c b/gcc/testsuite/gcc.target/i386/minmax-11.c new file mode 100644 index 0000000..a8c2df5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/minmax-11.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-tree-reassoc" } */ + +#define max(a,b) (((a) > (b))? (a) : (b)) + +int foo(int x) +{ + int y = max(x,12345); + return max(y,87654); +} + +/* { dg-final { scan-assembler-not "12345" } } */