pps. I guess I could clear the errors every time this runs, but have
decided to just do an initial clear of the errors and look at the
cumulative rate.
ppps. there is a better list for this chatter, isn't there...
On 19 June 2014 15:10, John Hearns wrote:
> If anyone is interested, here is my
If anyone is interested, here is my solution, which seems good enough.
Someone will no doubt say there is a neater way!
A shell script which runs ibqueryerrors and returns 1 if anything is found:
#!/bin/bash
# check for errors on the Infiniband fabric 0
# another script runs for port 1
errors=`/
Does anyone have good tips on monitoring a cluster for Infiniband errors?
Specifically Mellanox/OpenFabrics on an SGI cluster.
I am thinking of running ibcheckerrors or ibqueryerrors and parsing the
output.
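
A minimal sketch of the "run it and parse the output" idea, not a tested recipe. The function name check_fabric is hypothetical, and it assumes that a clean run prints nothing except blank lines or lines starting with "#" (summary lines), which may not hold for every version of the tools. The command to run is passed in as arguments, so no particular options are assumed:

```shell
#!/bin/bash
# check_fabric CMD [ARGS...]
# Runs CMD and treats any remaining output as "errors found".
# Assumption: blank lines and lines starting with '#' are not errors.
check_fabric() {
    local output
    # Capture stdout only. The tool's own exit status is ignored here,
    # because only the text it prints is inspected.
    output=$("$@" 2>/dev/null) || true
    # Drop blank lines and '#' lines; grep exits non-zero when nothing
    # is left, hence the '|| true'.
    output=$(printf '%s\n' "$output" | grep -v -e '^[[:space:]]*$' -e '^#' || true)
    if [ -n "$output" ]; then
        # Print what was found so it ends up in a log or alert mail.
        printf '%s\n' "$output"
        return 1
    fi
    return 0
}
```

Called as "check_fabric ibqueryerrors", it returns 1 when anything is reported and 0 otherwise, so a monitor can act on the exit status alone.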
I have Monit set up on the cluster head node
http://mmonit.com/monit/
which I find quite