Hi Ward,

On 11/5/23 21:32, Ward Poelmans wrote:
Yes, it's very similar. I've put our systemd unit file also online on https://gist.github.com/wpoely86/cf88e8e41ee885677082a7b08e12ae11

This looks really good! However, I was testing the waitforib.sh script on a SuperMicro server WITHOUT Infiniband, with only a dual-port Ethernet NIC (Intel Corporation Ethernet Connection X722 for 10GBASE-T).

The EL8 drivers in kernel 4.18.0-477.27.2.el8_8.x86_64 seem to think that the Ethernet ports are also Infiniband ports:

# ls -l /sys/class/infiniband
total 0
lrwxrwxrwx 1 root root 0 Nov 10 14:31 irdma0 -> ../../devices/pci0000:5d/0000:5d:02.0/0000:5e:00.0/0000:5f:03.0/0000:60:00.0/infiniband/irdma0
lrwxrwxrwx 1 root root 0 Nov 10 14:31 irdma1 -> ../../devices/pci0000:5d/0000:5d:02.0/0000:5e:00.0/0000:5f:03.0/0000:60:00.1/infiniband/irdma1

This might disturb the logic in waitforib.sh, or at least cause some confusion?

One advantage of Max's script using NetworkManager is that nmcli isn't fooled by the fake irdma Infiniband device:

# nmcli connection show
NAME  UUID                                  TYPE      DEVICE
eno1  cb0937f8-1902-48f7-8139-37cf0c4077b2  ethernet  eno1
eno2  98130354-9215-412e-ab26-032c76c2dbe4  ethernet  --

I found a discussion of the mysterious irdma device in
https://github.com/prometheus/node_exporter/issues/2769
with this explanation:

The irdma module is Intel's replacement for the legacy i40iw module, which was the 
iWARP driver for the Intel X722. The irdma module is a complete rewrite, which 
landed in mainline kernel 5.14, and which also now supports the Intel E810 (iWARP 
& RoCE).

The Infiniband commands also work on the fake device, claiming that it runs 100 Gbit/s:

# ibstatus
Infiniband device 'irdma0' port 1 status:
        default gid:     3cec:ef38:d960:0000:0000:0000:0000:0000
        base lid:        0x1
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet

Infiniband device 'irdma1' port 1 status:
        default gid:     3cec:ef38:d961:0000:0000:0000:0000:0000
        base lid:        0x1
        sm lid:          0x0
        state:           1: DOWN
        phys state:      3: Disabled
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet

IMHO, this seems quite confusing.
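
One possible way to keep a /sys-based check from being fooled by the irdma devices, as a rough sketch only and not something taken from waitforib.sh: sysfs exposes a link_layer file next to each port's state file (the same value ibstatus prints above), so a script could look only at ports whose link_layer is InfiniBand. The function name ib_state_files and its optional root argument are made up for illustration.

```shell
#!/bin/bash
# Sketch: print the state files of ports whose link_layer is InfiniBand.
# ib_state_files and its optional root argument are hypothetical names.
ib_state_files() {
    local root=${1:-/sys/class/infiniband}
    local port
    for port in "$root"/*/ports/*; do
        # Skip unmatched globs and ports without a readable link_layer file
        [ -r "$port/link_layer" ] || continue
        # irdma ports report "Ethernet" here and are therefore skipped
        if [ "$(cat "$port/link_layer")" = "InfiniBand" ]; then
            echo "$port/state"
        fi
    done
}
```

On the X722 server above this should print nothing, since both irdma ports report link_layer Ethernet. A side effect is that RoCE/iWARP ports are ignored too, which may or may not be what a site wants.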

Regarding the slurmd service:

And we add it as a dependency for slurmd:

$ cat /etc/systemd/system/slurmd.service.d/wait.conf

[Service]
Environment="CUDA_DEVICE_ORDER=PCI_BUS_ID"
LimitMEMLOCK=infinity

[Unit]
After=waitforib.service
Requires=munge.service
Wants=waitforib.service

An alternative to this extra service would be something like Max's service file https://github.com/maxlxl/network.target_wait-for-interfaces/blob/main/wait-for-interfaces.service, which has:
Before=network-online.target

What do you think of these considerations?

Best regards,
Ole

On 2/11/2023 09:28, Ole Holm Nielsen wrote:
Hi Ward,

Thanks a lot for the feedback!  The method of probing /sys/class/infiniband/*/ports/*/state is also used in the NHC script lbnl_hw.nhc and has the advantage of not depending on the nmcli command from the NetworkManager package.

Can I ask you how you implement your script as a service in the Systemd booting process, perhaps similar to Max's solution in https://github.com/maxlxl/network.target_wait-for-interfaces ?

Thanks,
Ole

On 11/1/23 20:09, Ward Poelmans wrote:
We have a slightly different script to do the same. It only relies on /sys:

# Search for infiniband devices and wait until
# at least one reports that it is ACTIVE

if [[ ! -d /sys/class/infiniband ]]
then
     logger "No infiniband found"
     exit 0
fi

ports=$(ls /sys/class/infiniband/*/ports/*/state)

for (( count = 0; count < 300; count++ ))
do
     for port in ${ports}; do
         if grep -q ACTIVE "$port"; then
             logger "Infiniband online at $port"
             exit 0
         fi
     done
     sleep 1
done

logger "Failed to find an active infiniband interface"
exit 1
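
A variant of the loop above, as a sketch under stated assumptions rather than a replacement for the script: it re-expands the glob on every pass, so ports that only show up after the script has started are still seen, and it takes the sysfs root and the timeout in seconds as arguments. The function name wait_for_active and both parameters are hypothetical additions.

```shell
#!/bin/bash
# Sketch: wait until some port under the given sysfs root reports ACTIVE.
# wait_for_active, root and timeout are hypothetical names for illustration.
wait_for_active() {
    local root=${1:-/sys/class/infiniband}
    local timeout=${2:-300}
    local count=0
    local state
    while [ "$count" -lt "$timeout" ]; do
        # Re-expand the glob on every pass so late-appearing ports are seen
        for state in "$root"/*/ports/*/state; do
            if [ -r "$state" ] && grep -q ACTIVE "$state"; then
                echo "$state"
                return 0
            fi
        done
        sleep 1
        count=$((count + 1))
    done
    return 1
}
```

A caller could keep the early "exit 0" for a missing /sys/class/infiniband directory and the logger calls from the script above, for example: wait_for_active /sys/class/infiniband 300 || exit 1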


