Andrew Stubbs wrote:
Unified Shared Memory does not appear to work well on gfx908, which is why we
disabled xnack by default.  For this reason it makes sense to inform the user
as compile time, but this is causing trouble in the testsuite which assumes
that USM only fails at runtime.

This patch changes the gfx908 compile time message to a warning only (in case
some other target does this differently), and prevents the tests from
attempting to run in host-fallback mode (given that that is not what they are
trying to test).  It also changes the existing warning to only fire once.

The patch assumes that effective target "omp_usm" also implies self-maps.
...
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
      {
        warning_at (UNKNOWN_LOCATION, 0,
                  "Unified Shared Memory is enabled, but XNACK is disabled");
        inform (UNKNOWN_LOCATION, "Try -foffload-options=-mxnack=any");
+      warned_xnack = 1;
      }

Wording wise, I am wondering whether it should be
 "Unified Shared Memory requested" or "… required" instead of "enabled"?

[Two side remarks to bystanders:

* The hint (inform/"note:") won't help with gfx908 – bad luck!
  (xnack is always disabled on gfx908, as Andrew wrote).

* Warning only once is more useful than it seems as even relatively
  simple programs (libgomp/testsuite size) could warn tens if not hundreds
  of times!

* * *

+++ b/gcc/config/gcn/mkoffload.cc
@@ -627,9 +627,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile, uint32_t 
omp_requires)
                      || TEST_XNACK_ON (elf_flags)
                      || xnack_required);
    if (TEST_XNACK_OFF (elf_flags) && xnack_required)
-    fatal_error (input_location,
-                "conflicting settings; XNACK is forced off but Unified "
-                "Shared Memory is on");
+    {
+      warning (input_location,
+              "conflicting settings; XNACK is forced off but Unified "
+              "Shared Memory is on");
+      xnack_required = 0;
+    }

Remarks:

* IMHO, 'on' is better than 'enabled', but still it might be easier
  to understand with 'required' or 'requested'.

* Warning vs. error: One can argue whether one or the other is better.
  If USM is not supported, host fallback is used. Thus, permitting the
  compilation is fine - even if it is not usable. (And "just" warning
  about it.) On the other hand, an error is also fine - as the user
  can still make it compile with -foffload=disabled (or only nvptx-none)
  or by switching to a USM-supporting -march=...

* * *

+++ b/libgomp/testsuite/lib/libgomp.exp
+# return 1 if OpenMP Unified Shared Memory is supported

I have two small issues with it: (A) It is always supported
by only using the host (e.g. with no device present or only
devices for which GCC does not handle the requirement).
(B) The check does not look at actual capabilities but at the
requirement check.

Thus, I think you should address (A), e.g. by appending:
"by non-host devices" or "by offload devices".

Maybe even adding 'requirement' on top as in:

# return 1 if the OpenMP requirement Unified Shared Memory
# is supported by at least one non-host device

+proc check_effective_target_omp_usm { } {
+    if { [check_effective_target_offload_device_nvptx] } {
+       return 1
+    }
+
+    if { [libgomp_check_effective_target_offload_target "amdgcn"] } {
+       if [check_runtime_nocache usm_available_ {
+           #include <omp.h>
+           #pragma omp requires unified_shared_memory
+           int main ()
+           {
+             int a;
+             #pragma omp target map(from: a)
+               a = omp_is_initial_device ();
+             return a;
+           }
+       } ] {
+         return 1
+       }
+    }
+
+    return 0
+}

You have to do the same check for Nvidia devices. Contrary to
managed-memory support, not all Nvidia devices support USM.

* * *

--- a/libgomp/testsuite/libgomp.c++/target-std__array-concurrent-usm.C
+++ b/libgomp/testsuite/libgomp.c++/target-std__array-concurrent-usm.C
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target omp_usm } */
  #pragma omp requires unified_shared_memory self_maps
#define MEM_SHARED

If anyone wonders about the host fallback: The non '-usm' testcase
checks whether a stack variable is self mapped by default - if it
is (i.e. host fallback), it is compiled with '-DMEM_SHARED.
Otherwise, it assumes mapping happens.

Thus, there is no need to also run the '-usm' testcase with host
fallback. And if there is no automatic self mapping, but supported,
it makes sense to run it again - but that's what happens with the
added effective target check.

* * *

Thanks,

Tobias

Reply via email to