Hi,

We've been using the sreport cluster utilization report to report on Down time 
and from that produce an uptime figure for the entire cluster, which we hope 
will be above 99%, or very close to it, for every month of the year.

Most of the time the figure that comes back matches our perception of the 
day-to-day running of the cluster.

We don't log node UP/DOWN in any way (beyond what Slurm does) and rely on 
sreport as explained above.

The December figure we have is lower than 99%, and there are 438 Slurm nodes in 
the cluster. In December we only remember having problems with 3 nodes, so at 
the moment, off the top of our heads, we don't understand this reported Down 
time.
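
As a rough sanity check on those numbers, here is a minimal sketch in Python. 
It assumes the uptime figure is simply one minus down node-time over total 
node-time, and that every node counts equally. As far as I understand, sreport 
works in CPU time, so nodes with more cores would weigh more, and the function 
names below are made up for illustration only.

```python
# Rough check: can 3 problem nodes out of 438 pull December below 99%?
# Assumption: uptime = 1 - (down node-hours / total node-hours), all nodes equal.

HOURS_IN_DECEMBER = 31 * 24  # 744 hours
NODES = 438


def uptime_percent(down_node_hours, nodes=NODES, hours=HOURS_IN_DECEMBER):
    """Uptime percentage for a given amount of down node-hours."""
    total_node_hours = nodes * hours
    return 100.0 * (1.0 - down_node_hours / total_node_hours)


def max_down_node_hours(target_percent, nodes=NODES, hours=HOURS_IN_DECEMBER):
    """Down node-hours that can be absorbed while still meeting the target."""
    return nodes * hours * (1.0 - target_percent / 100.0)


# Worst case for what we remember: 3 nodes down for the whole month.
worst_case = uptime_percent(3 * HOURS_IN_DECEMBER)

# Down-time budget that keeps the month at 99%.
budget = max_down_node_hours(99.0)

print(f"3 nodes down all month: {worst_case:.2f}% uptime")
print(f"99% budget: {budget:.0f} node-hours "
      f"(about {budget / HOURS_IN_DECEMBER:.2f} nodes down all month)")
```

Under those assumptions, even 3 nodes being down for the whole of December 
would still leave the cluster above 99%, so the reported Down time must include 
something beyond the 3 nodes we remember.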

Is anyone else relying on sreport for this metric? If so, have you encountered 
this sort of situation?

regards
David


-------------
David Simpson - Senior Systems Engineer
ARCCA, Redwood Building,
King Edward VII Avenue,
Cardiff, CF10 3NB

simpso...@cardiff.ac.uk
+44 29208 74657

COVID-19: Cardiff University is currently under remote work restrictions. Our 
staff are continuing normal work schedules, but responses may be slower than 
usual. We appreciate your patience during this unprecedented time.
