Dear Beowulfers,

If your clusters use Infiniband, you know there are only two types of switches: managed or unmanaged. The former come with SSH, a web interface, SNMP and everything; the latter come with LEDs.

The only (and officially recommended) way to monitor unmanaged switches is to go take a physical look at their PSU and fan LEDs from time to time, which is obviously not ideal for remote administration, monitoring, or getting an alert when something's wrong.

To solve that problem, we made a little shell script that does the job remotely: it gets inventory data, status info, and metrics like fan speeds, temperatures or power usage from unmanaged Infiniband switches:

https://github.com/stanford-rc/ibswinfo

It took a little reverse-engineering and a good amount of guessing, but it seems to work, it fits the need, and well... it's free. So we're happy to share it with everyone, in case it could be useful to someone else.

Cheers,
--
Kilian

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf