On 11/11/2025 21:35, Tobias Burnus wrote:
Hi Andrew,

Andrew Stubbs:
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.

Namely, for GPU code that is compiled with either xnack=on or with xnack=any and additionally requires USM/self_maps, it makes sense to have HSA_XNACK=1 set; the first one for compatibility, the second one to actually make use of USM.

(There are other GPUs that do not support xnack=…, some of them are APUs and still support USM, others aren't and also don't support USM.)

Other toolchains require the end-user to set HSA_XNACK manually (or else wonder
why it's not working),

Some other tool chains also permit compiling a "fat binary" with multiple variants compiled in, permitting e.g. mixing a xnack+ and a xnack- binary. – In that case, the approach as taken here won't work in general.

For a non-fat binary, having the HSA_XNACK set by the compiler/runtime still seems to make sense.

* * *

so the constructor also checks that any existing manual
setting is compatible with the binary's requirements.

Can you also update: https://gcc.gnu.org/onlinedocs/libgomp/AMD- Radeon.html – search for HSA_XNACK.

That's only for the USM case, but I think that's the only one that really matters. (Even if xnack+ / on is also affected.)

OK, I'll add that to my todo list.


* * *

+  if ((omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY)

Can you also check for OMP_REQUIRES_SELF_MAPS? Namely:

Done.

   omp_requires_mask & (OMP_REQUIRES_UNIFIED_SHARED_MEMORY
                        | OMP_REQUIRES_SELF_MAPS)

+  if (flag_xnack == HSACO_ATTR_ON
+      || (omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+      && gcn_devices[gcn_arch].xnack_default != HSACO_ATTR_UNSUPPORTED))

Likewise, you probably want to add the self mapping flag here as well.

Done.

+    /* If a device supports XNACK then it will be needed for USM.  */
+    fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: XNACK+\n");
  }

BTW: I am not saying that it is better, but all the information
should be already available in mkoffload itself.

I think this patch dates to a time before you added the omp_requires feature. (I might also ask why you didn't choose to use the established method of passing information to mkoffload -- see "OPENACC-DIMS" -- but that's another story).

When debugging data is produced, mkoffload already needs to produce
an elf file with proper eflags - hence, we know which XNACK is used
andomp_requires is available to the compiler itself. Hence, alls TEST_XNACK_UNSET etc. flags are available. * * *

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 222adb2cd41..c9bf13c2da5 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
...
+        if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: XNACK+%c", &dummy) > 0)
+          xnack_required = true;
+
...
+  bool use_xnack = (!TEST_XNACK_ANY (elf_flags) || xnack_required);

If you already use TEST_XNACK_ANY, why don't you pass the existing
'omp_requires' variable as additional argument to 'process_asm'
and just use it?

It seems as if the only information you currently use from the
device side is the requires part – as the XNACK part also is
available on the host side.

The new patch uses omp_requires. The XNACK status is not enough on its own as the "any" flag could go either way. If the XNACK eflag is set on/off then we want the HSA_XNACK to match. If we require USM then we want HSA_XNACK set, even with -mxnack=any. If -mxnack doesn't match omp_requires then we should already have a warning from the compiler (I had it as an error before), but it's now an error in mkoffload. Undefined is fine though, so we should be good on gfx1036, I think.

* * *

I had expected somethink like:

use_xnack = ((!TEST_XNACK_ANY () && !TEST_XNACK_UNSET())
              || omp_requires & (self|USM));

here, note not only the use of the omp_requires but also
the exclusion of 'unsupported'. (I think unset is effectively
unset as that has 0x000 while off/any/on have 0x100, 0x200, 0x300,
respectively.)

* * *

+  if (TEST_XNACK_OFF (elf_flags) && xnack_required)
+    fatal_error (input_location,
+         "conflicting settings; XNACK is forced off but Unified "
+  if (!TEST_XNACK_ANY (elf_flags) || xnack_required)

Why do you define use_xnack above if you replicate the condition here?

I missed this after fixing a merge conflict; fixed now.


And: I think there is no point to check the HSA_XNACK value or to set
if a GPUs that don't support XNACK ...

+    fprintf (cfile,
+         "\n"
+         "  const char *xn_var = getenv (\"HSA_XNACK\");\n"
+         "  if (!xn_var || xn_var[0] == '\\0')\n"
+         "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+         "  else if (%s)\n"
+         "    fprintf (stderr, \"warning: HSA_XNACK=%%s is incompatible; "
+               "the GPU kernel may revert to host fall-back\\n\", "
+               "xn_var);\n",
+         xnack_required || TEST_XNACK_ON (elf_flags),

OK, it seems as if you want to force set HSA_XNACK=0 for xnack-,
based on the idea that the default can be overridden at boot time
such that no HSA_XNACK not always means '0' but also can mean '1'.

I think this deserves a comment.

+         (xnack_required || TEST_XNACK_ON (elf_flags)
+          ? "xn_var[0] != '1' || xn_var[1] != '\\0'"
+          : "xn_var[0] == '1' && xn_var[1] == '\\0'"));

I think it make sense to make it symmetric. Namely:

+          ? "xn_var[0] != '1' || xn_var[1] != '\\0'"
+          : "xn_var[0] != '0' || xn_var[1] != '\\0'"));

or

+          ? "xn_var[0] == '0' && xn_var[1] == '\\0'"
+          : "xn_var[0] == '1' && xn_var[1] == '\\0'"));

I am not sure which version is more sensible
- the first one will trigger for invalid input, which ROCm
   ignores and falls back to the default setting
- while the second one will only trigger for valid input
   of the wrong kind.
For an error, I think the second one would be definitely
the choice. For a warning, I think either is fine.

Originally I wanted to use exclusively '1' because undefined values are also effectively '0', but you make a point that the default can be configured differently. That being so, I think I like the != version better because it also warns for HSA_XNACK=nonsense case.

BTW: If one looks at the ROCr source code, one sees:

     // Legal values are zero "0" or one "1". Any other value will
     // be interpreted as not defining the env variable.
     var = os::GetEnvVar("HSA_XNACK");
    xnack_ = (var == "0") ? XNACK_DISABLE : ((var == "1") ? XNACK_ENABLE : XNACK_UNCHANGED);

I don't want to make assumptions that might not remain true. As long as the message is only a warning then I think the check is reasonably future-proof.

The v2 patch is attached.

Any more objections?

Andrew
From 2e8e1e95bfb318468b0f9ed17e8ffd3cb989f595 Mon Sep 17 00:00:00 2001
From: Andrew Stubbs <[email protected]>
Date: Tue, 11 Nov 2025 15:41:04 +0000
Subject: [PATCH v2] amdgcn: Auto-detect USM mode and set HSA_XNACK

The AMD GCN runtime must be set to the correct "XNACK" mode for Unified Shared
Memory and/or self-mapping to work, but this is not always clear at compile and
link time due to the split nature of the offload compilation pipeline.

When XNACK mode is enabled, the runtime will restart GPU load/store
instructions that fail due to memory exceptions caused by page-misses.  While
this is important for shared-memory systems that might experience swapping, we
are mostly interested in it because it is also used to implement page migration
between host and GPU memory, which is the basis of USM.

This patch checks that the XNACK mode is configured appropriately in the
compiler, and mkoffload then adds a runtime check into the final program to
ensure that the HSA_XNACK environment variable passes the correct mode to the
GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.

Other toolchains require the end-user to set HSA_XNACK manually (or else wonder
why it's not working), so the constructor also checks that any existing manual
setting is compatible with the binary's requirements.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_init_cumulative_args): Emit a warning if the
	-mxnack setting looks wrong.
	* config/gcn/mkoffload.cc: Include tree.h and omp-general.h.
	(process_asm): Add omp_requires parameter.
	Emit HSA_XNACK code into mkoffload_setup, as required.
	(main): Modify HSACO_ATTR_OFF to preserve user-set -mxnack.
	Pass omp_requires to process_asm.
---
 gcc/config/gcn/gcn.cc       | 10 +++++++
 gcc/config/gcn/mkoffload.cc | 54 +++++++++++++++++++++++++++++++------
 2 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 1e04074d78b..a729ea4de36 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2940,6 +2940,16 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if ((omp_requires_mask & (OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+			    | OMP_REQUIRES_SELF_MAPS))
+      && gcn_devices[gcn_arch].xnack_default != HSACO_ATTR_UNSUPPORTED
+      && flag_xnack == HSACO_ATTR_OFF)
+    {
+      warning_at (UNKNOWN_LOCATION, 0,
+		  "Unified Shared Memory is enabled, but XNACK is disabled");
+      inform (UNKNOWN_LOCATION, "Try -foffload-options=-mxnack=any");
+    }
+
   reinit_regs ();
 }
 
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 222adb2cd41..d9d89c64f95 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -38,6 +38,9 @@
 #include "configargs.h"  /* For configure_default_options.  */
 #include "multilib.h"  /* For multilib_options.  */
 
+#include "tree.h"	 /* Dependency of omp-general.h.  */
+#include "omp-general.h" /* For enum omp_requires.  */
+
 /* These probably won't (all) be in elf.h for a while.  */
 #undef  EM_AMDGPU
 #define EM_AMDGPU		0xe0;
@@ -441,10 +444,12 @@ copy_early_debug_info (const char *infile, const char *outfile)
    encoded as structured data.  */
 
 static void
-process_asm (FILE *in, FILE *out, FILE *cfile)
+process_asm (FILE *in, FILE *out, FILE *cfile, uint32_t omp_requires)
 {
   int fn_count = 0, var_count = 0, ind_fn_count = 0;
   int dims_count = 0, regcount_count = 0;
+  bool xnack_required = (omp_requires & (OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+					 | OMP_REQUIRES_SELF_MAPS));
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -469,6 +474,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -549,7 +555,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	  }
 	}
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
@@ -615,11 +620,24 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
+  /* If the -mxnack setting has a definite value (not "any" or undefined), or
+     the program "requires unified_shared_memory" (in which case -mxnack might
+     be "any"), then we emit code to check the mode at runtime.  */
+  bool check_xnack = (TEST_XNACK_OFF (elf_flags)
+		      || TEST_XNACK_ON (elf_flags)
+		      || xnack_required);
+  if (TEST_XNACK_OFF (elf_flags) && xnack_required)
+    fatal_error (input_location,
+		 "conflicting settings; XNACK is forced off but Unified "
+		 "Shared Memory is on");
+
+  /* Start generating the C code.  */
   if (gcn_stack_size)
-    {
-      fprintf (cfile, "#include <stdlib.h>\n");
-      fprintf (cfile, "#include <stdbool.h>\n\n");
-    }
+    fprintf (cfile, "#include <stdbool.h>\n");
+  if (check_xnack)
+    fprintf (cfile, "#include <stdio.h>\n");
+  if (gcn_stack_size || check_xnack)
+    fprintf (cfile, "#include <stdlib.h>\n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
@@ -677,6 +695,25 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	     "    setenv (\"GCN_STACK_SIZE\", \"%d\", true);\n",
 	     gcn_stack_size);
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (except empty string), but we do emit a
+     useful diagnostic in the wrong mode (the ROCr message is not good.  */
+  if (check_xnack)
+    fprintf (cfile,
+	     "\n"
+	     "  const char *xn_var = getenv (\"HSA_XNACK\");\n"
+	     "  if (!xn_var || xn_var[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+	     "  else if (%s)\n"
+	     "    fprintf (stderr, \"warning: HSA_XNACK=%%s is incompatible; "
+			   "the GPU kernel may revert to host fallback\\n\", "
+			   "xn_var);\n",
+	     xnack_required || TEST_XNACK_ON (elf_flags),
+	     (xnack_required || TEST_XNACK_ON (elf_flags)
+	      ? "xn_var[0] != '1' || xn_var[1] != '\\0'"
+	      : "xn_var[0] != '0' || xn_var[1] != '\\0'"));
+
   /* End of mkoffload_setup function.  */
   fprintf (cfile, "}\n\n");
 
@@ -1116,7 +1153,8 @@ main (int argc, char **argv)
 #define GCN_DEVICE(name, NAME, ELF, ISA, XNACK, SRAM, ...) \
     case ELF: XNACK; break;
 #define HSACO_ATTR_UNSUPPORTED SET_XNACK_UNSET (elf_flags)
-#define HSACO_ATTR_OFF SET_XNACK_OFF (elf_flags)
+#define HSACO_ATTR_OFF \
+      if (TEST_XNACK_UNSET (elf_flags)) SET_XNACK_OFF (elf_flags)
 #define HSACO_ATTR_ANY \
       if (TEST_XNACK_UNSET (elf_flags)) SET_XNACK_ANY (elf_flags)
 #include "gcn-devices.def"
@@ -1348,7 +1386,7 @@ main (int argc, char **argv)
       if (!out)
 	fatal_error (input_location, "cannot open %qs", gcn_s2_name);
 
-      process_asm (in, out, cfile);
+      process_asm (in, out, cfile, omp_requires);
 
       fclose (in);
       fclose (out);
-- 
2.51.0

Reply via email to