[ was: Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug ]

On 01/24/2018 11:41 AM, Tom de Vries wrote:
Hi,

this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0".


When compiling a branch-around-nothing (where the branch is warp neutering, so it's a divergent branch):
...
   .reg .pred %r36;
   {
     .reg .u32 %x;
     mov.u32 %x,%tid.x;
     setp.ne.u32 %r36,%x,0;
   }

   @ %r36 bra $L5;
   $L5:
...

The JIT fails to generate a convergence point here:
...
          /*0128*/               @P0 BRA `(.L_1);
.L_1:
...

Consequently, we execute subsequent code in divergent mode, and when executing a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has when executing in divergent mode.

The workaround detects branch-around-nothing, and inserts a ptx operation that does nothing (I'm calling it a fake nop, I haven't been able to come up with a better term yet):
...
   @ %r36 bra $L5;
     {
       .reg .u32 %nop_src;
       .reg .u32 %nop_dst;
       mov.u32 %nop_dst, %nop_src;
     }
   $L5:
...
which makes the test pass, because then we generate a convergence point here at .L1:
...
         /*0128*/                   SSY `(.L_1);
         /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
         /*0138*/                   SYNC (*"TARGET= .L_1 "*);
.L_1:
...

The workaround is not minimal given that it inserts the fake nop in all branch-around-nothings it detects, not just the warp neutering ones, but I think this is more robust than trying to identify the warp neutering branches. Furthermore, I'm not going for optimality here anyway. The optimal way to fix this is making sure we don't generate branch-around-nothing, but that's for stage1.

Build and reg-tested on x86_64 with nvptx accelerator.

I'd like to commit in stage4, but I'd appreciate a review of the code. Does the patch look OK?

Thanks,
- Tom

0002-nvptx-PR83589-Workaround-for-branch-around-nothing-JIT-bug.patch


[nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-23  Tom de Vries  <t...@codesourcery.com>

        PR target/83589
        * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
        (nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
        Add strict parameter.
        (prevent_branch_around_nothing): Insert dummy insn between branch to
        label and label with no ptx insn inbetween.
        * config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.

        * testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.

---
  gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
  gcc/config/nvptx/nvptx.md                          |  9 +++
  .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
  3 files changed, 122 insertions(+)


+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = 0;
+    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      {
+       if (seen_label == 0)
+         {
+           if (INSN_P (insn) && condjump_p (insn))
+             seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+           continue;
+         }
+
+       if (NOTE_P (insn))
+         continue;
+
+       if (INSN_P (insn))
+         switch (recog_memoized (insn))
+           {
+           case CODE_FOR_nvptx_fork:
+           case CODE_FOR_nvptx_forked:
+           case CODE_FOR_nvptx_joining:
+           case CODE_FOR_nvptx_join:
+             continue;
+           default:
+             seen_label = 0;
+             continue;
+           }
+
+       if (LABEL_P (insn) && insn == seen_label)
+         emit_insn_before (gen_fake_nop (), insn);
+
+       seen_label = 0;
+      }
+  }

Consider testcase:
...
int
main (void)
{
  int a[10];
#pragma acc parallel loop worker
  for (int i = 0; i < 10; i++)
    a[i] = i;

  return 0;
}
...

At -O2, we generate this, and fail to generate a fake nop:
...
  @ %r34 bra.uni $L8;
  @ %r33 bra $L9;
  // join 2;
 $L9:
 $L8:
...

What is happening in prevent_branch_around_nothing is:
- seen_label is NULL
- we process "@ %r34 bra.uni $L8" and seen_label becomes $L8
- we process "@ %r33 bra $L9" and since seen_label != NULL, we end up in
  the default case in the switch and reset seen_label to NULL
- we process the labels, seen_label remains NULL, and no fake nop is
  generated

What we want to happen instead, is that when processing "@ %r33 bra $L9", seen_label is updated to $L9. Patch below implements that.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to stage4 trunk.

Thanks,
- Tom
[nvptx] Fix prevent_branch_around_nothing

2018-03-20  Tom de Vries  <t...@codesourcery.com>

	PR target/84954
	* config/nvptx/nvptx.c (prevent_branch_around_nothing): Also update
	seen_label if seen_label is already set.

---
 gcc/config/nvptx/nvptx.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a6f4443..7b0b182 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4419,14 +4419,15 @@ prevent_branch_around_nothing (void)
   rtx_insn *seen_label = NULL;
     for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
       {
-	if (seen_label == NULL)
+	if (INSN_P (insn) && condjump_p (insn))
 	  {
-	    if (INSN_P (insn) && condjump_p (insn))
-	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
-
+	    seen_label = label_ref_label (nvptx_condjump_label (insn, false));
 	    continue;
 	  }
 
+	if (seen_label == NULL)
+	  continue;
+
 	if (NOTE_P (insn) || DEBUG_INSN_P (insn))
 	  continue;
 

Reply via email to