https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104440
Bug ID: 104440
Summary: nvptx: FAIL: gcc.c-torture/execute/pr53465.c execution test
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

With nvptx target, driver version 510.47.03 and board GT 1030 I run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...

Passes with nvptx-none-run -O0:
...
$ ( export NVPTX_NONE_RUN="$(pwd -P)/install/bin/nvptx-none-run -O0" ; ./test.sh )
=== gcc Summary ===
# of expected passes 12
$
...

I can minimize it at -O1 to:
...
void __attribute__((noinline, noclone))
foo (int y)
{
  int i;
  int c;
  for (i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i && d <= c)
        __builtin_abort ();
      c = d;
    }
}

int
main ()
{
  foo (2);
  return 0;
}
...

I can make the test pass by initializing c with any value (or by doing the equivalent at ptx level).

Note that the read of c in the loop body only happens in the second iteration, by which time it's initialized, so the example is valid.

Gcc however translates this at gimple level to:
...
  _1 = i != 0;
  _2 = d <= c;
  _3 = _1 & _2;
...
which does imply a read of c while it's undefined.

We can prevent this by using --param=logical-op-non-short-circuit=0, and that makes the minimized example pass. But not the original example.

If we translate the example into cuda, we see that the loop's first iteration is peeled off, even at -O0. This has the effect that there are two "d <= c" tests. The first one has an undefined input, but is dead code. The second one has its inputs defined on both loop entry and backedge.

We could try to report this to nvidia, but I'm not sure they want to fix this. They've pushed back on examples involving reads from uninitialized regs before, and looking at what cuda does, it seems they try to ensure this invariant.
Unfortunately, pass_initialize_regs does not insert the required init. So, it looks like we'll have to fix this in the compiler.
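
For illustration, here is a hand-written C sketch of the loop shape that results from peeling the first iteration, as described for the cuda translation above. This is not actual cuda or ptx output. The name foo_peeled is hypothetical, the GCC-specific attributes are dropped, and the return value is an addition made only so the behaviour can be checked.

```c
#include <stdlib.h>

/* Hypothetical hand-peeled variant of foo from the minimized test case.
   Returns the final value of c (0 if the loop body never runs); the
   return value is not part of the original example.  */
int
foo_peeled (int y)
{
  int i;
  int c;

  if (y <= 0)
    return 0;

  /* Peeled first iteration (i == 0, d == 1): "i && d <= c" is statically
     false here, so the "d <= c" test with undefined c is dead code and
     no read of c remains.  */
  c = 1;

  /* Remaining iterations: c is defined both on loop entry (by the peeled
     iteration) and on the backedge (by "c = d" below).  */
  for (i = 1; i < y; i++)
    {
      int d = i + 1;
      if (d <= c)
        abort ();
      c = d;
    }

  return c;
}
```

With this shape, every executed "d <= c" reads a defined c, so calling foo_peeled (2) completes without abort, matching the expected behaviour of foo (2) in the minimized example.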