https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104440

            Bug ID: 104440
           Summary: nvptx: FAIL: gcc.c-torture/execute/pr53465.c
                    execution test
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

With the nvptx target, driver version 510.47.03 and a GT 1030 board, I run into:
...
FAIL: gcc.c-torture/execute/pr53465.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr53465.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr53465.c   -O3 -g  execution test
...

Passes with nvptx-none-run -O0:
...
$ ( export NVPTX_NONE_RUN="$(pwd -P)/install/bin/nvptx-none-run -O0" ;
./test.sh )
                === gcc Summary ===

# of expected passes            12
$
...

I can minimize it at -O1 to:
...
void __attribute__((noinline, noclone))
foo (int y)
{
  int i;
  int c;
  for (i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i && d <= c)
        __builtin_abort ();
      c = d;
    }
}

int
main ()
{
  foo (2);
  return 0;
}
...

I can make the test pass by initializing c with any value (or by doing the
equivalent at ptx level).
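
As a minimal sketch of that workaround, here is a variant of the reduced test
with c initialized.  The name foo_init and the value 0 are illustrative
choices, not part of the testcase, and the function returns a flag instead of
calling __builtin_abort so that the result can be checked:

```c
/* Variant of the reduced test with c initialized.  foo_init is a
   hypothetical name; it returns 1 where the original would abort.  */
int __attribute__((noinline, noclone))
foo_init (int y)
{
  int i;
  int c = 0;  /* Any value works; 0 is an arbitrary choice.  */
  for (i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i && d <= c)
        return 1;
      c = d;
    }
  return 0;
}
```

Calling foo_init (2) mirrors the foo (2) call in main above.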

Note that the read of c in the loop body only happens in the second iteration
(in the first one, i == 0 short-circuits the &&), by which time it's
initialized, so the example is valid.

Gcc however translates this at gimple level to:
...
    _1 = i != 0;
    _2 = d <= c;
    _3 = _1 & _2;
...
which does imply a read of c while it's undefined.
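
To illustrate the difference at source level, here is a small sketch that
counts reads of c for both forms of the condition.  The helper names
(read_c, reads_short_circuit, reads_non_short_circuit) are hypothetical, and
c is passed in initialized so that the sketch itself stays well-defined:

```c
/* Number of times c was read during the last evaluation.  */
static int reads_of_c;

/* Model a read of c: bump the counter and hand back the value.  */
static int
read_c (int c)
{
  reads_of_c++;
  return c;
}

/* Source form: c is read only if i is nonzero.  */
int
reads_short_circuit (int i, int d, int c)
{
  reads_of_c = 0;
  (void) (i && d <= read_c (c));
  return reads_of_c;
}

/* Gimple-like form: both operands are evaluated unconditionally.
   t1, t2 and t3 stand in for _1, _2 and _3.  */
int
reads_non_short_circuit (int i, int d, int c)
{
  int t1, t2, t3;
  reads_of_c = 0;
  t1 = i != 0;
  t2 = d <= read_c (c);
  t3 = t1 & t2;
  (void) t3;
  return reads_of_c;
}
```

For i == 0, the short-circuit form performs no read of c, while the
non-short-circuit form performs one.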

We can prevent this by using --param=logical-op-non-short-circuit=0, which
makes the minimized example pass, but not the original example.

If we translate the example into cuda, we see that the loop's first iteration
is peeled off, even at -O0.  This has the effect that there are two "d <= c"
tests.  The first one has an undefined input, but is dead code.  The second one
has its inputs defined on both loop entry and backedge.
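
A hand-written sketch of what such peeling amounts to at source level follows.
This is not the actual cuda output: foo_peeled is a hypothetical name, the
dead first "d <= c" test is left out rather than kept as dead code, and the
function returns a flag instead of aborting:

```c
/* Reduced test with the first iteration peeled off by hand.
   Returns 1 where the original would abort.  */
int
foo_peeled (int y)
{
  int i = 0;
  int c;
  if (i < y)
    {
      /* Peeled iteration, i == 0: the "i && ..." condition is false,
         so the "d <= c" test is dead here and omitted.  */
      c = i + 1;
      for (i = 1; i < y; i++)
        {
          int d = i + 1;
          /* c is defined both on loop entry and on the backedge.  */
          if (d <= c)
            return 1;
          c = d;
        }
    }
  return 0;
}
```

In this form no path reads c before it has been assigned.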

We could try to report this to nvidia, but I'm not sure they want to fix this.
They've pushed back on examples involving reads from uninitialized regs before,
and looking at what cuda does, it seems they try to ensure that no
uninitialized reg is ever read.

Unfortunately, pass_initialize_regs does not insert the required init.

So, it looks like we'll have to fix this in the compiler.
