Source: rocm-smi-lib
Version: 6.1.2-1

When compiling hwloc-contrib on a VM without access to an AMD card, the
build runs into a seemingly endless loop (well, I suppose it would end
eventually) probing /sys/class/drm/card<num>/device/vendor for ever
increasing values of <num>, none of which exist.

From an strace:

newfstatat(AT_FDCWD, "/sys/class/drm/card211491991/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491992/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491993/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491994/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491995/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491996/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491997/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491998/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211491999/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211492000/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/class/drm/card211492001/device/vendor",
0x7fff71747d70, 0) = -1 ENOENT (No such file or directory)

This failure mode apparently no longer exists in rocm-smi-lib on GitHub;
for the relevant code, see
https://github.com/ROCm/rocm_smi_lib/commit/8c444164103bec701ff24c231eddc0eb36fdbef6
(Note: I can't identify which commit removed or at least substantially
changed the offending loop.)

It appears that the enumeration is called by the init function which
is used by hwloc-contrib to determine if rocm-smi works.
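
To illustrate the kind of enumeration that would avoid the problem, here
is a minimal sketch. It is not the rocm-smi-lib code (which is C++), and
I have not checked how upstream actually solved it. The function name
find_amd_cards and the drm_root parameter are made up for this example;
the only facts assumed are the sysfs layout shown in the strace above and
that AMD's PCI vendor ID is 0x1002. The idea is to list the directory once
instead of counting card numbers upwards until one is found:

```python
import os
import re

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID assigned to AMD
CARD_RE = re.compile(r"^card(\d+)$")


def find_amd_cards(drm_root="/sys/class/drm"):
    """Return the sorted card numbers under drm_root whose vendor is AMD.

    Hypothetical helper: the work is bounded by the number of directory
    entries, so it terminates immediately when no card exists at all.
    """
    try:
        entries = os.listdir(drm_root)
    except OSError:
        return []  # no DRM subsystem visible, e.g. in a minimal VM

    cards = []
    for name in entries:
        match = CARD_RE.match(name)
        if match is None:
            continue  # skip entries such as renderD128 or card0-HDMI-A-1
        vendor_path = os.path.join(drm_root, name, "device", "vendor")
        try:
            with open(vendor_path) as fh:
                vendor = fh.read().strip().lower()
        except OSError:
            continue  # card without a readable vendor file
        if vendor == AMD_VENDOR_ID:
            cards.append(int(match.group(1)))
    return sorted(cards)


if __name__ == "__main__":
    print(find_amd_cards())
```

On a machine without any AMD card this returns an empty list after a
single directory listing, so an init function built on it could report
"no devices" right away.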

Kind regards,
Sven
