On Thu, 21 Jul 2016, Thomas Schwinge wrote:
> Hmm.  In an offloading configuration I see the following regression:

First of all: sorry about this.  It's fairly embarrassing: I forgot to check
offloading, but I should still have seen the fallout in check-c testing (I might
have tested the wrong source checkout).

>     [-PASS:-]{+FAIL:+}
>     libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
>     -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0
[snip]
>     ptxas /tmp/ccmABlIh.o, line 1408; error   : Label expected for forward
>     reference of '__reduction_lock'

This is due to the following:

1. The VAR_DECL for __reduction_lock is created on demand when offloaded
functions are compiled.
2. output_in_order does not account for new declarations appearing while it
emits already-existing function bodies.  It is followed by
process_new_functions, which handles only functions.
3. Normally process_pending_assemble_externals would have handled this, but the
nvptx backend doesn't use ASM_OUTPUT_EXTERNAL (:/, I wonder why), so it's a
no-op there.

> C:
> 
>     [-PASS:-]{+WARNING: program timed out.+}
>     {+FAIL:+} gcc.dg/large-size-array-4.c (test for excess errors)
> 
> This one times out when creating a huge *.s file:
> 
>     [...]
>             .global .align 8 .u64 name[2147483649] = { 0, 0, [...]
> 
> Running with -ftoplevel-reorder (as implicitly enabled before), a *.s
> file with just the preamble gets created.  Are these behaviors correct?

Yes, with -ftoplevel-reorder the huge variable is not emitted because it's
unused.  In PTX there's no way to emit such a huge initializer without spelling
it all out, so the behavior after the patch is expected.

I think the testcase uncovers a flaw in GCC's behavior: it tests for an error
diagnostic, but it wouldn't be emitted on any 32-bit platform with
-ftoplevel-reorder. So, while as a quick-fix it's possible to skip this test on
NVPTX, it's better to make GCC diagnose this consistently and run the testcase
with -fsyntax-only to avoid emitting the huge initializer.

>     [-PASS:-]{+FAIL:+} gcc.dg/pr16973.c (test for excess errors)

This test needs a "label_values" dg-require-effective-target annotation.
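
For reference, a sketch of what such an annotation looks like at the top of a
test.  The function below is a hypothetical illustration, not the contents of
pr16973.c; it only shows the GNU labels-as-values extension (accepted by both
gcc and g++) that the effective-target keyword guards:

```cpp
/* { dg-do compile } */
/* { dg-require-effective-target label_values } */

// Hypothetical example, not the actual test body: dispatch through a
// table of label addresses (GNU "labels as values" extension).
int classify(int x)
{
    static void *const table[] = { &&is_zero, &&is_nonzero };
    goto *table[x != 0];   // computed goto through the label table
is_zero:
    return 0;
is_nonzero:
    return 1;
}
```

With the directive in place, targets that do not provide label_values should
report the test as UNSUPPORTED instead of FAIL.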

> C++:
[snip]
>     [-PASS:-]{+FAIL:+} g++.dg/debug/dwarf2/dwarf4-typedef.C  -std=gnu++14 
> (test for excess errors)
> 
> These now all FAIL due to "sorry, unimplemented: target cannot support
> nonlocal goto", which is "acceptable", but should be XFAILed.  (Though,
> that has generally not yet been done for C++.)

Nonlocal gotos appear where a possibly-throwing call is followed by a destructor
invocation (in other words, where unwinding after an exception is thrown needs
to invoke destructors). Here -ftoplevel-reorder affects whether the compiler is
able to deduce some calls as non-throwing (even at -O0).
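
A minimal sketch of the code shape in question, with hypothetical names
throughout: an object with a destructor that is live across a call the compiler
cannot prove to be non-throwing.  It is only meant to show why a cleanup path is
needed; it is not taken from any of the failing tests.

```cpp
#include <stdexcept>

// Hypothetical names throughout; this only illustrates the code shape.
static int live_guards = 0;

struct Guard {
    Guard()  { ++live_guards; }
    ~Guard() { --live_guards; }   // must also run during unwinding
};

// Not declared noexcept, so the compiler has to assume it may throw
// unless it can see the body and prove otherwise.
void may_throw(bool fail)
{
    if (fail)
        throw std::runtime_error("boom");
}

// 'g' is live across a possibly-throwing call, so the function needs an
// exceptional path that runs ~Guard() before the exception propagates.
int guarded_call(bool fail)
{
    Guard g;
    may_throw(fail);
    return live_guards;           // 1 while 'g' is alive
}

// Returns true when the exception arrived and the cleanup had already run.
bool cleanup_ran_on_throw()
{
    try {
        guarded_call(true);
    } catch (const std::runtime_error &) {
        return live_guards == 0;
    }
    return false;
}
```

If the compiler can deduce that the callee never throws, the exceptional path in
guarded_call disappears, and with it the need for the cleanup edge.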

As NVPTX doesn't support C++ exceptions at all, you must already be seeing a
very large number of similar failures.

>     [-PASS:-]{+FAIL:+} g++.dg/cpp1y/nsdmi-aggr1.C  -std=c++14 (test for 
> excess errors)
> 
> Now fails with:
> 
>     nvptx-as: circular reference in variable initializers

Here -ftoplevel-reorder affects whether a static variable that needs a
self-recursive initializer is emitted.  This wouldn't work on NVPTX and passed
by luck previously (had the variable been used, the test would have failed).
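
A minimal sketch of a self-referential static initializer, assuming the error
is triggered by a variable whose initializer mentions its own address.  The
names are hypothetical, this is not the contents of nsdmi-aggr1.C, and whether
this exact snippet reproduces the nvptx-as diagnostic is an assumption:

```cpp
// Hypothetical example, not the contents of nsdmi-aggr1.C.
struct Node {
    int   value;
    Node *self;
};

// The initializer of 'node' contains the address of 'node' itself.
static Node node = { 42, &node };

// Referencing 'node' here keeps it from being discarded as unused.
bool node_points_to_itself()
{
    return node.self == &node && node.value == 42;
}
```

On a host target this is an ordinary relocation against the variable's own
symbol; if nothing referenced the variable, it could be dropped under
-ftoplevel-reorder and the initializer would never be emitted.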

> More "sorry, unimplemented: target cannot support nonlocal goto".  But
> why did these PASS before?  Is this generally just a problem with -O0, or
> are there any optimizations doing things differently whether
> -ftoplevel-reorder is in effect or not, and we could perhaps shuffle some
> things so that for nvptx we run into "sorry, unimplemented: target cannot
> support nonlocal goto" less often?  That is, I'm not clear on how
> -fno-toplevel-reorder interacts with "nonlocal goto".

I've provided an explanation above, but to put it in different words,
-ftoplevel-reorder is a form of optimization (enabled by default except at -O0),
and on nvptx it was implicitly enabled at -O0 previously, allowing some cleanups
to happen and thus concealing some issues. The biggest of those, "cannot support
nonlocal goto", is due to NVPTX not supporting C++ exceptions.

Did you consider enabling -fno-exceptions by default for nvptx?

>     [-PASS:-]{+FAIL:+} g++.old-deja/g++.pt/crash55.C  -std=c++14 (test for 
> excess errors)
> 
> These now fail with:
> 
>     ptxas crash55.o, line 61; error   : Arguments mismatch for instruction 
> 'mov'
>     ptxas crash55.o, line 63; error   : Arguments mismatch for instruction 
> 'mov'
>     ptxas crash55.o, line 80; error   : Label expected for argument 0 of 
> instruction 'call'
>     ptxas crash55.o, line 80; error   : Function '_ZN3fooIcE1dEPS0_' not 
> declared in this scope

This is due to a combination of bogus sjlj exception emission and a bug in
nvptx-as, which is not prepared for such bogus input.

I hope I've satisfactorily explained the failures you've pointed out (thanks for
the data).  I think I should leave the choice of what to do next (revert the
patch or leave it in and install fixups where appropriate) up to you?

Thanks.
Alexander
