Hi Andrew,
Andrew Stubbs:
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.
Namely, for GPU code that is compiled with either xnack=on or with
xnack=any and additionally requires USM/self_maps, it makes sense to
have HSA_XNACK=1 set; the first one for compatibility, the second one to
actually make use of USM.
(There are other GPUs that do not support xnack=…; some of them are APUs
that still support USM, while others are not and do not support USM either.)
Other toolchains require the end user to set HSA_XNACK manually (or else
leave them wondering why it's not working),
Some other toolchains also permit compiling a "fat binary" with
multiple variants built in, e.g. mixing an xnack+ and an
xnack- binary. – In that case, the approach taken here won't work in
general.
For a non-fat binary, having the HSA_XNACK set by the compiler/runtime
still seems to make sense.
* * *
so the constructor also checks that any existing manual
setting is compatible with the binary's requirements.
Can you also update:
https://gcc.gnu.org/onlinedocs/libgomp/AMD-Radeon.html – search for
HSA_XNACK.
That's only for the USM case, but I think that's the only one that
really matters. (Even if xnack+ / on is also affected.)
* * *
+ if ((omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
Can you also check for OMP_REQUIRES_SELF_MAPS? Namely:
omp_requires_mask & (OMP_REQUIRES_UNIFIED_SHARED_MEMORY
| OMP_REQUIRES_SELF_MAPS)
+ if (flag_xnack == HSACO_ATTR_ON
+ || (omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+ && gcn_devices[gcn_arch].xnack_default != HSACO_ATTR_UNSUPPORTED))
Likewise, you probably want to add the self mapping flag here as well.
+ /* If a device supports XNACK then it will be needed for USM. */
+ fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: XNACK+\n");
}
BTW: I am not saying that it is better, but all the information
should be already available in mkoffload itself.
When debugging data is produced, mkoffload already needs to produce
an ELF file with proper e_flags – hence, we know which XNACK setting
is used, and omp_requires is available to the compiler itself. Hence,
all the TEST_XNACK_UNSET etc. flags are available.
* * *
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 222adb2cd41..c9bf13c2da5 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
...
+ if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: XNACK+%c", &dummy) > 0)
+ xnack_required = true;
+
...
+ bool use_xnack = (!TEST_XNACK_ANY (elf_flags) || xnack_required);
If you already use TEST_XNACK_ANY, why don't you pass the existing
'omp_requires' variable as additional argument to 'process_asm'
and just use it?
It seems as if the only information you currently use from the
device side is the requires part – as the XNACK part also is
available on the host side.
* * *
I had expected something like:
use_xnack = ((!TEST_XNACK_ANY () && !TEST_XNACK_UNSET())
|| omp_requires & (self|USM));
here, note not only the use of the omp_requires but also
the exclusion of 'unsupported'. (I think unset effectively means
'unsupported', as that has 0x000 while off/any/on have 0x100, 0x200,
0x300, respectively.)
* * *
+ if (TEST_XNACK_OFF (elf_flags) && xnack_required)
+ fatal_error (input_location,
+ "conflicting settings; XNACK is forced off but Unified "
+ if (!TEST_XNACK_ANY (elf_flags) || xnack_required)
Why do you define use_xnack above if you replicate the condition here?
And: I think there is no point in checking the HSA_XNACK value, or in
setting it, for GPUs that don't support XNACK ...
+ fprintf (cfile,
+ "\n"
+ " const char *xn_var = getenv (\"HSA_XNACK\");\n"
+ " if (!xn_var || xn_var[0] == '\\0')\n"
+ " setenv (\"HSA_XNACK\", \"%d\", true);\n"
+ " else if (%s)\n"
+ " fprintf (stderr, \"warning: HSA_XNACK=%%s is incompatible; "
+ "the GPU kernel may revert to host fall-back\\n\", "
+ "xn_var);\n",
+ xnack_required || TEST_XNACK_ON (elf_flags),
OK, it seems as if you want to force-set HSA_XNACK=0 for xnack-,
based on the idea that the default can be overridden at boot time,
such that an unset HSA_XNACK does not always mean '0' but can also mean '1'.
I think this deserves a comment.
+ (xnack_required || TEST_XNACK_ON (elf_flags)
+ ? "xn_var[0] != '1' || xn_var[1] != '\\0'"
+ : "xn_var[0] == '1' && xn_var[1] == '\\0'"));
I think it makes sense to make it symmetric. Namely:
+ ? "xn_var[0] != '1' || xn_var[1] != '\\0'"
+ : "xn_var[0] != '0' || xn_var[1] != '\\0'"));
or
+ ? "xn_var[0] == '0' && xn_var[1] == '\\0'"
+ : "xn_var[0] == '1' && xn_var[1] == '\\0'"));
I am not sure which version is more sensible:
- the first one will trigger for invalid input, which ROCm
ignores, falling back to the default setting,
- while the second one will only trigger for valid input
of the wrong kind.
For an error, I think the second one would definitely be
the choice. For a warning, I think either is fine.
BTW: If one looks at the ROCr source code, one sees:
// Legal values are zero "0" or one "1". Any other value will
// be interpreted as not defining the env variable.
var = os::GetEnvVar("HSA_XNACK");
xnack_ = (var == "0") ? XNACK_DISABLE : ((var == "1") ? XNACK_ENABLE :
XNACK_UNCHANGED);
* * *
Tobias