On 05/02/2025 11:14, Tobias Burnus wrote:
The number of AMD GPUs is huge - and, unfortunately, every GPU device
is potentially slightly different, requiring different code generation
either in some dusty corner case or for standard code.
As identical code can run on several GPUs (either as is or when disabling
some features), AMD introduced some gfx*-generic targets with LLVM 19.
GCC added support for gfx10-3-generic and gfx11-generic with commit
r15-4550-g1bdeebe69b71bf in October 2024 (undocumented). GCC itself
always supports all -march= targets, but both when building a multilib
and when the user compiles code, an assembler (and linker) supporting
the architecture is required. [GCC uses LLVM's assembler (llvm-mc) and
linker (lld), i.e. LLVM 19+ is required for gfx*-generic.]
However, the required runtime code landed in ROCm much later; namely,
commit 0c18ff22 "rocr: Generic ISA targets support" (Oct 28, 2024) in
https://github.com/ROCm/ROCR-Runtime. It is believed that the next ROCm
release contained this feature, which is ROCm 6.3, released on Dec 3,
2024. The latest ROCm is 6.3.2 of Jan 28, 2025.
Still, adding gfx*-generic increases the number of required
multilibs as it does not seem to be possible to link mixed code of
generic and specific GPU code. See
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-generic-processor-table
for a list of gfx*-generic and supported gfx* devices and some generic
restrictions due to using multilib.
While gfx11-generic and gfx10-3-generic include all
GPUs of that generation, with no or few restrictions, GFX9 devices are
rather different and, hence, gfx9-generic only covers a subset of the
devices.
* * *
This patch now enables support for gfx10-3-generic,
gfx11-generic and (new!) gfx9-generic in libgomp, making it actually
usable. In libgomp, GCC prints its own diagnostic if there is an ISA
mismatch between the actual GPU and the compiled-for GPU. Hence, not
only ROCm but also GCC needs to know which GPUs are compatible - in
order to propose the -foffload-options=-march=gfx... to compile for.
That diagnostic now also proposes to try compiling for the corresponding
gfx*-generic besides compiling for the specific GPU. Reasoning: As the
number of multilibs is limited, e.g. having only a gfx11-generic multilib,
it makes sense to propose -march=gfx11-generic besides, e.g.,
-march=gfx1103, especially when the gfx1103 multilib is unavailable - and
vice versa. In case GCC thinks that the ISA is supported but (a too old)
ROCm does not recognize it, the error message is now worse; however, some
wording has been added to the generic error message, which might still
help. As there are a couple of GPUs, previously unsupported, that are
supported by ROCm with the same gfx*-generic as GPUs we support, it
makes sense to add those GPUs as well - both to handle them in libgomp's
generic diagnostic and to support them in general. Therefore, the
following GPUs are now supported in addition: gfx902, gfx904, gfx909,
gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150,
gfx1151, gfx1152, and gfx1153. However, the multilib config has not been
touched, hence, those 14 device types and gfx{9,10-3,11}-generic are not
supported by default. Currently, the following 9 GPUs are enabled by
default: gfx900, gfx906, gfx908, gfx90a, gfx90c, gfx1030, gfx1036,
gfx1100, and gfx1103.
I'm not too happy about adding a whole list of specific devices that we
have not tested. So far, whenever I have added a new device there have
been meta-data oddities and such-like that needed to be tweaked.
Admittedly, adding a new device to an existing generation has been
easier, but still there have been unexpected issues.
Adding the generic architectures does make sense, assuming we can test
them, and seems like a much better way to support these devices, until
somebody can add properly tested and tuned support for an individual device.
I also don't like adding knowledge of unsupported devices purely for
improving diagnostics. It's fine for the known-unsupported devices, but
wait a month or so and there will be new unknown-unsupported devices,
and the message degrades again. Worse, the new diagnostic can recommend
trying -march=<name> for devices which the compiler will recognize but
which have never been tested, and which probably don't have multilibs
configured.
A better approach might be to pattern-match "gfx{9,10,11}" in the name
HSA gives you for the physical device and recommend generic
-march=gfx{9,10,11}-generic in those cases?
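A minimal sketch of that prefix-matching idea, not taken from the patch: the function name suggest_generic_march and the mapping table are hypothetical, and the input is assumed to be a device/ISA name string containing "gfxNNNN" somewhere. It matches "gfx103" rather than "gfx10" because only gfx10-3-generic is discussed here, and the gfx9 entry is only a hint, since gfx9-generic covers just a subset of the GFX9 devices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a device name such as "gfx1103" (or a longer
   ISA string that contains it) to a generic -march= value to suggest.
   Returns NULL when no generic target is known for the name.  */
const char *
suggest_generic_march (const char *device_name)
{
  /* Assumed mapping; longer prefixes must come before shorter ones.  */
  static const struct { const char *prefix; const char *generic; } map[] = {
    { "gfx103", "gfx10-3-generic" },
    { "gfx11",  "gfx11-generic" },
    /* Only a hint: gfx9-generic covers a subset of GFX9 devices.  */
    { "gfx9",   "gfx9-generic" },
  };

  /* Skip any leading text (e.g. a target-triple prefix) before "gfx".  */
  const char *gfx = device_name ? strstr (device_name, "gfx") : NULL;
  if (gfx == NULL)
    return NULL;

  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strncmp (gfx, map[i].prefix, strlen (map[i].prefix)) == 0)
      return map[i].generic;

  return NULL;
}
```

With this table, a name starting with gfx11 or gfx103 maps to the matching generic target, while an unrelated or unknown name gives NULL, so the diagnostic can fall back to its existing wording.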
* * *
For distros building with LLVM 19, I could imagine that adding the
gfx10-3-generic and gfx11-generic (and possibly gfx9-generic) multilibs
could make sense; whether gfx1030, gfx1036, gfx1100, and gfx1103 could
already be dropped - or only later (once ROCm 6.3 is more widely
deployed) - is a good question. [My gut feeling is that a distro should
wait until next year, given that December 2024 is still very recent.]
* * *
Thus back to the attached patch, which does:
* Add gfx9-generic - and enable libgomp support for gfx10-3-generic
* Add gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
* Update the install + invoke (-march=) documentation for it
The patch has been loosely tested - but I currently do not have a ROCm 6.3
available with a gfx*-generic supported device; hence, I don't know whether
it really works.
Thus, I would be happy if someone with a supported gfx{9,10-3,11}-generic
device - or a newly added non-generic gfx* device - could test whether it
actually works!
[I am about to get a ROCm 6.3.2 with a gfx906 device, possibly later also
for gfx900 and even later for gfx1100.]
Any comment, remark, suggestion?
OK for mainline, once someone has shown that any gfx*-generic actually
works?
I'm happy to add the new gfx9-generic, and improving the diagnostics is
always good, but I'm not convinced about making it look like we support
devices we've never tested.
(Of course, if someone is able to test them, then that's different.)
Andrew