When you launch your clck jobs, do you launch them with Slurm, or do you use a nodefile? When I use a nodefile, I get an error saying that it can't call mpirun on one of the nodes, or something like that. I'd provide the exact error message, but I don't have access to it at the moment.

Prentice

On 4/30/20 11:49 AM, Black, Brady P wrote:
Hi - Intel Cluster Checker person chiming in.

To answer your question, Prentice, about the runtime of Cluster Checker (CLCK): 
it depends on which set of tests, or framework definition (FWD), you use and on 
the number of servers. The default FWD is health_base, which should run in a 
matter of seconds. It was designed to run quickly and serve as a sanity check 
before running jobs. Other FWDs are designed for cluster hand-off and validation, 
so they take much longer, because they run a multitude of different benchmarks on 
individual nodes (stream/dgemm/sgemm/...) and across the cluster 
(hpcg/hpl/pairwise imb/...) looking for outliers. These can take anywhere from 
90+ minutes to multiple hours, depending on the system configuration and size. 
Of course, there are also in-between FWDs, such as health_extended_user or 
mpi_prereq_user.

A couple of tips: clck -X list is a great way to see which framework definitions 
exist, and clck -X <name_of_fwd> will give you more details on what is being 
checked by that specific FWD.

Thanks for using Cluster Checker and providing feedback. Happy to help further.

-Brady

-----Original Message-----
From: Beowulf <beowulf-boun...@beowulf.org> On Behalf Of Michael Di Domenico
Sent: Thursday, April 30, 2020 10:23
Cc: Beowulf Mailing List <beowulf@beowulf.org>
Subject: Re: [Beowulf] Intel Cluster Checker

I played with it about a year ago, since I get it as part of the Intel compiler
bundle we pay for. It was overly complicated to install and run and didn't
seem worthwhile. It's kind of like getting a piece of IKEA furniture but then
trying to use a Phillips screwdriver to build it instead of the little wrench.
Otherwise, when I dug into what it was actually doing, it didn't seem to be
doing anything magical. It was just doing it 'the Intel way', which in my
experience is generally very strange.



On Wed, Apr 29, 2020 at 4:07 PM Prentice Bisbal via Beowulf
<beowulf@beowulf.org> wrote:
Beowulfers,

Have any of you used the Intel Cluster Checker? I've been tasked with
using it, and I think I have it running, but the documentation isn't
very good. I was wondering how long a typical run on some cluster
nodes should take.

Prentice

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
Computing To change your subscription (digest mode or unsubscribe)
visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf