Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

Mark Hahn Thu, 14 Aug 2008 22:43:58 -0700

Gus' numbers make sense to me. I assume his workload consists of multiple-sized jobs: serial, modest parallel, and parallel jobs using all resources. Without pre-emptive scheduling, the batch queue system has to starve the system in order to run the larger jobs.


unless backfill can utilize those temporarily idle cpus.

Obviously, before a job which consumes all resources starts, all resources have to be idle. Which means no jobs can be scheduled, even though the resources are idle.

true enough, but does depend on the size of large, high-prio jobs relative to the size of the cluster.
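
To put a rough number on the drain cost discussed above, here is a minimal sketch. It assumes no backfill at all, that the whole-cluster job cannot start until the longest running job ends, and that remaining runtimes are known exactly. The function name and the job mix are made up for illustration; only the 768-node cluster size comes from this thread.

```python
def drain_idle_cpu_hours(total_cpus, running):
    """Estimate CPU-hours left idle while draining for a whole-cluster job.

    running: list of (cpus, hours_remaining) for jobs already on the cluster.
    Assumes no backfill, so every CPU that frees up stays idle until the
    last running job has finished.
    """
    used = sum(cpus for cpus, _ in running)
    if used > total_cpus:
        raise ValueError("running jobs use more CPUs than the cluster has")
    # The drain window lasts until the longest-running job ends.
    window = max((hours for _, hours in running), default=0.0)
    # CPUs that were already free sit idle for the whole window.
    idle = (total_cpus - used) * window
    # CPUs released by a finishing job sit idle for the rest of the window.
    idle += sum(cpus * (window - hours) for cpus, hours in running)
    return window, idle


# Hypothetical job mix on a 768-CPU cluster.
window, idle = drain_idle_cpu_hours(768, [(256, 4.0), (256, 1.0), (128, 2.0)])
wasted_fraction = idle / (768 * window)
```

Backfill recovers whatever part of that idle area can be filled by jobs short and narrow enough to finish inside the window, which is why the cost depends so much on the job mix.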

Another interesting metric is of course how many of the jobs run to successful completion, i.e., are not killed due to resource limits, or crashes, or for other reasons. That's what I call net vs. gross utilization.


surely this survival rate is quite high, no?  again, it depends largely
on the design of the cluster (I see few node crashes, maybe 1 of 768 nodes
per week, and few resource crashes (perhaps a couple buggy jobs per week))
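
One way to make the net vs. gross distinction concrete is to weight jobs by CPU-hours rather than counting them. This is only a sketch under that assumption; the thread does not define the metric precisely, and the record layout and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    cpus: int          # CPUs allocated to the job
    hours: float       # wall-clock hours the job held them
    completed: bool    # True if the job ran to successful completion


def utilization(jobs, total_cpus, period_hours):
    """Return (gross, net) utilization as fractions of cluster capacity.

    gross counts every CPU-hour consumed; net counts only the CPU-hours
    of jobs that finished successfully (assumed definition).
    """
    capacity = total_cpus * period_hours
    gross = sum(j.cpus * j.hours for j in jobs)
    net = sum(j.cpus * j.hours for j in jobs if j.completed)
    return gross / capacity, net / capacity


# Hypothetical accounting data: 100 CPUs observed over 10 hours.
jobs = [
    JobRecord(cpus=50, hours=10.0, completed=True),
    JobRecord(cpus=20, hours=10.0, completed=False),  # killed or crashed
    JobRecord(cpus=10, hours=10.0, completed=True),
]
gross, net = utilization(jobs, total_cpus=100, period_hours=10.0)
```

With a high survival rate the two numbers end up close together; a large gap would point at jobs being killed or crashing after burning a lot of CPU time.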
