The following patch adds support for three-input addition instructions to the nvptx backend. The PTX ISA's "vadd.u32.u32.u32.add d, a, b, c" instruction effectively implements the 32-bit operation d = a+b+c, and the "vsub.u32.u32.u32.add d, a, b, c" instruction provides the 32-bit operation d = (a-b)+c. The hope is that these mnemonics help ptxas generate the underlying hardware's IADD3 instruction.
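For reference, the arithmetic described above can be modelled in plain C. This is only a sketch: the names model_vadd_add, model_vsub_add and model_self_check are made up for illustration and are not part of the patch or of GCC, and the model says nothing about what ptxas actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "vadd.u32.u32.u32.add d, a, b, c":
   d = a + b + c, computed modulo 2^32.  */
uint32_t model_vadd_add(uint32_t a, uint32_t b, uint32_t c)
{
  return a + b + c;
}

/* Hypothetical model of "vsub.u32.u32.u32.add d, a, b, c":
   d = (a - b) + c, computed modulo 2^32.  */
uint32_t model_vsub_add(uint32_t a, uint32_t b, uint32_t c)
{
  return (a - b) + c;
}

/* A few sanity checks on the wrap-around behaviour of the model.  */
void model_self_check(void)
{
  assert(model_vadd_add(0xFFFFFFFFu, 1u, 5u) == 5u);  /* sum wraps past 2^32 */
  assert(model_vsub_add(0u, 1u, 1u) == 0u);           /* difference wraps below 0 */
}
```

Because the arithmetic is modulo 2^32, x + (y - z) has the same value as (y - z) + x, which in this model is model_vsub_add(y, z, x). The commuted forms in the vsub_add.c test compute the same kind of result.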
Tested by "make" and "make -k check" on --build=nvptx-none hosted on
x86_64-pc-linux-gnu with no new regressions.

[PATCH] nvptx: Add support for vadd.add and vsub.add instructions

2020-07-03  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog:
        * config/nvptx/nvptx.md (vadd_addsi4): New instruction.
        (vsub_addsi4): New instruction.

gcc/testsuite/ChangeLog:
        * gcc.target/nvptx/vadd_add.c: New test.
        * gcc.target/nvptx/vsub_add.c: New test.

Hopefully, I've got the patch/diff file format correct this time.
Ok for mainline?

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

-----Original Message-----
From: Tom de Vries <tdevr...@suse.de>
Sent: 02 July 2020 14:29
To: Roger Sayle <ro...@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] nvptx: : Add support for popcount and widening multiply instructions

On 7/1/20 3:06 PM, Roger Sayle wrote:
>
> The following patch adds support for the popc and mul.wide instructions to
> the nvptx backend.
> I've a follow-up patch for supporting mul.hi instructions, but those
> changes require some minor tweaks to GCC's middle-end, so I'll submit those
> pieces separately.
>
> Tested by "make" and "make -k check" on --build=nvptx-none hosted on
> x86_64-pc-linux-gnu with no new regressions.
>
> 2020-07-01  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog:
>         * config/nvptx/nvptx.md (popcount<mode>2): New instructions.
>         (mulhishi3, mulsidi3, umulhisi3, umulsidi3): New instructions.
>
> gcc/testsuite/ChangeLog:
>         * gcc.target/nvptx/popc-1.c: New test.
>         * gcc.target/nvptx/popc-2.c: New test.
>         * gcc.target/nvptx/popc-3.c: New test.
>         * gcc.target/nvptx/mul-wide.c: New test.
>         * gcc.target/nvptx/umul-wide.c: New test.
>
>
> Ok for mainline?
>

Hi Roger,

LGTM, please apply.

[ Btw, can you next time add the new files to the patch. That's
somewhat more convenient to apply. ]

Thanks
- Tom

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 5ceeac7..11d1d35 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -373,6 +373,22 @@
   ""
   "%.\\tadd%t0\\t%0, %1, %2;")
 
+(define_insn "vadd_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (plus:SI (plus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+                          (match_operand:SI 2 "nvptx_register_operand" "R"))
+                 (match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvadd%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
+(define_insn "vsub_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (plus:SI (minus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+                           (match_operand:SI 2 "nvptx_register_operand" "R"))
+                 (match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvsub%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
 (define_insn "sub<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
         (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
diff --git a/gcc/testsuite/gcc.target/nvptx/vadd_add.c b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
new file mode 100644
index 0000000..dcb2394
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return x + y + z;
+}
+
+unsigned int bar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + y + z;
+}
+
+/* { dg-final { scan-assembler-times "vadd.u32.u32.u32.add" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/nvptx/vsub_add.c b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
new file mode 100644
index 0000000..3f632c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return (x - y) + z;
+}
+
+int bar(int x, int y, int z)
+{
+  return x + (y - z);
+}
+
+unsigned int ufoo(unsigned int x, unsigned int y, unsigned int z)
+{
+  return (x - y) + z;
+}
+
+unsigned int ubar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + (y - z);
+}
+
+/* { dg-final { scan-assembler-times "vsub.u32.u32.u32.add" 4 } } */
+