Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

Tobias Burnus Wed, 05 Feb 2025 04:51:44 -0800

Hi Andrew,

Andrew Stubbs wrote:

On 05/02/2025 11:14, Tobias Burnus wrote:
Therefore, the following GPUs are now supported in addition: gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153. However, the multilib config has not been touched; hence, those 14 device types and gfx{9,10-3,11}-generic are not supported by default. Currently, the following 9 GPUs are enabled by default: gfx900, gfx906, gfx908, gfx90a, gfx90c, gfx1030, gfx1036, gfx1100, and gfx1103.
I'm not too happy about adding a whole list of specific devices that we have not tested. So far, whenever I have added a new device there have been meta-data oddities and such-like that needed to be tweaked.

Well, the idea is: if AMD has collected them under the same generic name, the ISA must be compatible. The LLVM page lists some restrictions (such as not having sramecc support when using generic), but none of the listed items matches what we have.

I fail to see how an ISA that works with, e.g., gfx9-generic will suddenly fail when compiling for it with gfx902, which except for the ELF flag contains identical code.

I also don't like adding knowledge of unsupported devices purely for improving diagnostics.

I think we have two options. We can delegate the checking purely to ROCm; then gfx9-generic will run on gfx909. Or we do our own checking; but then we need to know somehow whether gfx9-generic code will run on gfx909 or not, or we bluntly reject it.

It's fine for the known-unsupported devices, but wait a month or so and there will be new unknown-unsupported devices, and the message degrades again. Worse, the new diagnostic can recommend trying -march=<name> for devices which the compiler will recognize but have never been tested, and probably don't have multilibs configured.

The no-multilib-configured issue is difficult to get around, unless we want to filter those devices out when building libgomp. We could do so, however, with some preprocessing.


The problem is that we then need to have several checks:

(a) Whether it runs (if we don't delegate that to ROCm). In that case, gfx902 hardware with gfx9-generic code should just work, even if there is neither a gfx902 nor a gfx9-generic multilib. After all, the user managed to link the executable.

(b) When recompiling on the same system that ran the build, suggesting a -march=gfx... that has a multilib would be better, i.e. here the filtered-out value could be helpful.

(c) For suggesting generic, we also would need to check the ROCm version to only propose it when ROCm is > 6.3, assuming that's the thing.
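To make (b) and (c) concrete, here is a minimal sketch of how the -march= suggestion could be picked. Everything in it is hypothetical: the struct, the function names, and the way the multilib and ROCm version information reaches the function are assumptions for illustration, not what the patch or libgomp does. The "> 6.3" threshold is just the tentative figure from (c). Check (a) is a separate run-time question and is not covered here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs for the diagnostic; none of these names exist in
   libgomp.  How the multilib flags would be computed (e.g. by some
   preprocessing at libgomp build time) is left open.  */
struct march_hint_input
{
  const char *device;        /* Specific name of the hardware, e.g. "gfx902".  */
  const char *generic;       /* Generic name covering it, or NULL.  */
  bool device_has_multilib;  /* Was a multilib built for DEVICE?  */
  bool generic_has_multilib; /* Was a multilib built for GENERIC?  */
  int rocm_major;            /* ROCm version found at run time.  */
  int rocm_minor;
};

/* Assumption taken from (c): generic names are only worth proposing
   when ROCm is newer than 6.3.  */
static bool
rocm_handles_generic (int major, int minor)
{
  return major > 6 || (major == 6 && minor > 3);
}

/* Return the -march= value to recommend, or NULL if nothing sensible
   can be recommended.  */
const char *
suggest_march (const struct march_hint_input *in)
{
  /* (b): prefer a specific device name that has a multilib.  */
  if (in->device_has_multilib)
    return in->device;

  /* (c): fall back to the generic name, but only with a new enough ROCm.  */
  if (in->generic != NULL
      && in->generic_has_multilib
      && rocm_handles_generic (in->rocm_major, in->rocm_minor))
    return in->generic;

  return NULL;
}
```

With such a split, a gfx902 system without a gfx902 multilib would get gfx9-generic suggested only if that multilib exists and ROCm is new enough; otherwise the diagnostic would have to stay silent about -march=.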

BTW: The issue of having no multilib configured is not really new. We had it before with fiji or when the user configured GCC in some non-default way. (We currently enable all GPUs by default, but I think we didn't do so for a while for the gfx1... ones; I don't recall whether that was the case for a release or not.)


And I think the error is also not too illegible:

ld: error: unable to find library -lgfortran

gcn mkoffload: fatal error: .../x86_64-pc-linux-gnu-accel-amdgcn-amdhsa-gcc returned 1 exit status

A better approach might be to pattern-match "gfx{9,10,11}" in the name HSA gives you for the physical device and recommend generic -march=gfx{9,10,11}-generic in those cases?

I think that will be way worse: gfx908 and gfx90a are *not* compatible with gfx9-generic. Similarly, gfx94{0,1,2}/gfx950 are gfx9 devices but only in gfx9-4-generic and not supported by us. And for gfx10, we only support gfx10-3-generic, i.e. gfx103x (technically x = 0...6, currently only 0 and 3), but not gfx10-1-generic (gfx101{0,1,2,3}).
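To illustrate the problem, a minimal sketch of the prefix-based guess. The function name is made up, and the returned strings are simply the literal gfx{9,10,11}-generic expansion of the suggestion:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess a generic -march= value from nothing but
   the "gfx9"/"gfx10"/"gfx11" prefix of the device name reported by HSA.
   Returns NULL if the name has none of these prefixes.  */
const char *
naive_generic_guess (const char *device)
{
  if (strncmp (device, "gfx10", 5) == 0)
    return "gfx10-generic";
  if (strncmp (device, "gfx11", 5) == 0)
    return "gfx11-generic";
  if (strncmp (device, "gfx9", 4) == 0)
    return "gfx9-generic";
  return NULL;
}
```

This returns "gfx9-generic" for gfx908, gfx90a and gfx940, none of which can run gfx9-generic code, and for a gfx1010 device it produces a name that matches neither gfx10-1-generic nor gfx10-3-generic.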

Thus, I think it is way better to assume that the GPUs listed for each gfx*-generic have an identical ISA than to go any other proposed way. We could hard-code this ourselves (as done in the patch) or let ROCm do the job.
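For the hard-coded variant, a minimal sketch of what such a lookup could look like. The table layout and function names are hypothetical and not the ones used in the patch; the membership follows the device lists quoted in this thread as I read them, and should be checked against the LLVM page before relying on it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: which generic ISA name covers which specific
   device.  gfx908 and gfx90a are deliberately absent because they are
   not compatible with gfx9-generic.  */
struct generic_map
{
  const char *device;
  const char *generic;
};

static const struct generic_map generic_table[] = {
  { "gfx900", "gfx9-generic" },     { "gfx902", "gfx9-generic" },
  { "gfx904", "gfx9-generic" },     { "gfx906", "gfx9-generic" },
  { "gfx909", "gfx9-generic" },     { "gfx90c", "gfx9-generic" },
  { "gfx1030", "gfx10-3-generic" }, { "gfx1031", "gfx10-3-generic" },
  { "gfx1032", "gfx10-3-generic" }, { "gfx1033", "gfx10-3-generic" },
  { "gfx1034", "gfx10-3-generic" }, { "gfx1035", "gfx10-3-generic" },
  { "gfx1036", "gfx10-3-generic" },
  { "gfx1100", "gfx11-generic" },   { "gfx1101", "gfx11-generic" },
  { "gfx1102", "gfx11-generic" },   { "gfx1103", "gfx11-generic" },
  { "gfx1150", "gfx11-generic" },   { "gfx1151", "gfx11-generic" },
  { "gfx1152", "gfx11-generic" },   { "gfx1153", "gfx11-generic" },
};

/* Return the generic ISA name covering DEVICE, or NULL if none is known.  */
const char *
generic_for_device (const char *device)
{
  for (size_t i = 0; i < sizeof generic_table / sizeof generic_table[0]; i++)
    if (strcmp (generic_table[i].device, device) == 0)
      return generic_table[i].generic;
  return NULL;
}

/* Return 1 if code compiled for CODE_ISA is assumed to run on DEVICE:
   either the names are identical, or CODE_ISA is the generic name that
   covers DEVICE.  Return 0 otherwise.  */
int
isa_runs_on_device (const char *code_isa, const char *device)
{
  if (strcmp (code_isa, device) == 0)
    return 1;
  const char *generic = generic_for_device (device);
  return generic != NULL && strcmp (code_isa, generic) == 0;
}
```

The point of the explicit table is that gfx9-generic code is accepted on gfx909 but rejected on gfx90a, which a prefix match cannot express.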

(There are some restrictions listed, like "not all VGPR can be used on gfx1100" but as we added gfx1103, we can just use the gfx1103 settings as gfx1100 does not have those features, either.)


Thus, I still regard my proposed approach as superior.

I'm happy to add the new gfx9-generic, and improving the diagnostics is always good, but I'm not convinced about making it look like we support devices we've never tested.

As mentioned, AMD regards them as compatible. I am happy to add some wording like "(unsupported)" to the -march= documentation, in case it helps.


Tobias
