Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

Gerry Creager Thu, 14 Aug 2008 22:19:25 -0700

Alan Louis Scheinine wrote:

This thread has moved to the question of utilization,
discussed by Mark Hahn, Gus Correa and Håkon Bugge.
In my previous job most people developed code, though test runs
could run for days and use as many as 64 cores.  It was
convenient for most people to have immediate access due to
the excess computation capacity whereas some people in top
management wanted maximum utilization.


I was at a parallel computing workshop where other people
described the contrast between their needs and the goals of
their computer centers.  The computer centers wanted maximum
utilization whereas the spare capacity of the various clusters
in the labs was especially useful for the researchers.  They
could bring to bear the computational power of their informally
administered clusters for special tasks such as when a huge
block of data needed to be analyzed in nearly realtime to see
if an experiment of limited duration was going well.

When most work involves code development, waiting for jobs in
a batch queue means that the human resources are not being
used efficiently.  Of course, maximum utilization of computer
resources is necessary for production code; I just want to
emphasize the wide range of needs.

I would like to add that maximum utilization and fast turn-
around are contradictory goals, it would seem to me based
on the following reasoning.  Consider packing a truck with
boxes where the height of the boxes represents the number
of cores and the width of the boxes represents the time of
execution (leaving aside the third spatial dimension).  To most
efficiently solve the packing problem we would like to have
all boxes visible on the loading dock before we start packing.
On the other hand, if boxes arrive a few at a time and we must
put the boxes into the truck as they arrive (low queue wait time)
then the packing will not be efficient.  Moreover, as a very
rough estimate, the size of the box defines the scale of the
problem, specifically, if the average running time is 4 hours,
then to have efficient "packing" the time spent waiting in a
queue must be on the order of at least 4 and more likely 8 hours
in order to have enough requests visible to be able to find
an efficient solution to the scheduling problem.
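
The packing argument above can be made concrete with a small sketch. It is
only an illustration under stated assumptions: the 8-core machine, the four
jobs and the strict in-order placement rule are all made up here, and the
"all boxes visible" case is approximated by brute force over every ordering,
which only works for a handful of jobs.

```python
import heapq
from itertools import permutations

TOTAL_CORES = 8  # hypothetical machine size (assumption)

# (cores, hours) per job, in arrival order; the values are invented for illustration
ARRIVAL_ORDER = [(6, 4), (6, 4), (2, 4), (2, 4)]


def makespan(jobs, total_cores):
    """Place jobs strictly in the given order; return when the last one finishes."""
    running = []   # min-heap of (end_time, cores) for jobs still holding cores
    free = total_cores
    now = 0.0      # start time of the most recently placed job
    finish = 0.0
    for cores, hours in jobs:
        if cores > total_cores:
            raise ValueError("job needs more cores than the machine has")
        # Wait for running jobs to end until this job fits.
        while free < cores:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        heapq.heappush(running, (now + hours, cores))
        free -= cores
        finish = max(finish, now + hours)
    return finish


def utilization(jobs, total_cores, span):
    """Fraction of the core-hours inside the span that did useful work."""
    work = sum(cores * hours for cores, hours in jobs)
    return work / (total_cores * span)


# "Pack as they arrive": the order is fixed by arrival.
online = makespan(ARRIVAL_ORDER, TOTAL_CORES)

# "All boxes on the dock": try every order and keep the best one.
offline = min(makespan(order, TOTAL_CORES) for order in permutations(ARRIVAL_ORDER))

print("arrival order:", online, "h, utilization",
      round(utilization(ARRIVAL_ORDER, TOTAL_CORES, online), 2))
print("best order:   ", offline, "h, utilization",
      round(utilization(ARRIVAL_ORDER, TOTAL_CORES, offline), 2))
```

With these invented numbers the arrival order needs 12 hours at about 67%
utilization, while the best ordering finishes in 8 hours with the machine
full. A real scheduler with backfill would sit somewhere between the two.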

An interesting analogy, and further, the thread has been interesting.
However, it doesn't even begin to really address near-realtime
processing requirements. Examples of these are common in the weather
modeling I'm engaged in. In some cases, looking at severe weather and
predictive models, a model needs to initiate shortly after a watch or
warning is issued, something that's controlled by humans and is not
scheduled, hence somewhat difficult to model for job scheduling. These
models would likely be re-run with new data assimilated into the
forcings, and a new solution produced. Similarly, models of toxic
release plumes are unscheduled events with a high priority and low
queue-wait time requirement.

Other weather models are more predictable but have fairly hard
requirements for when output must be available.

Conventional batch scheduling handles these conditions pretty poorly. A
full queue with even reasonable matching of available cores to request
isn't likely to get these jobs out very quickly on a loaded system.
Preemption is the easy answer but unpopular with administrators who have
to answer the phone, users whose jobs are preempted (some never to see
their jobs return), and the guy who's the preemptor... who gets blamed
for all the problems. Worse, arbitrary preemption assignment means
someone made a value judgment that someone's science is more important
than someone else's, a sure plan for troubles when the parties all
gather somewhere... like a faculty meeting.
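
To put rough numbers on that trade-off, here is a minimal sketch, not a
description of any real scheduler. The job list, the 32-core urgent request
and the "preempt the job with the most time left" rule are all assumptions
made up for illustration; picking that rule is exactly the kind of value
judgment described above. Jobs already waiting in the queue are ignored, so
the no-preemption wait is optimistic.

```python
def wait_without_preemption(running, free_cores, needed):
    """Hours until `needed` cores are free if nothing is preempted.

    running: list of (remaining_hours, cores) for jobs now on the machine.
    """
    if needed <= free_cores:
        return 0.0
    free = free_cores
    # Jobs release their cores in order of remaining run time.
    for remaining, cores in sorted(running):
        free += cores
        if free >= needed:
            return float(remaining)
    raise ValueError("request is larger than the whole machine")


def victims_for_preemption(running, free_cores, needed):
    """Jobs to kill so the urgent job starts now (hypothetical policy:
    longest remaining run time first)."""
    victims = []
    free = free_cores
    for job in sorted(running, reverse=True):
        if free >= needed:
            break
        victims.append(job)
        free += job[1]
    if free < needed:
        raise ValueError("request is larger than the whole machine")
    return victims


# Invented snapshot of a 64-core machine: 56 cores busy, 8 free.
RUNNING = [(6, 32), (3, 16), (1, 8)]   # (remaining_hours, cores)
FREE = 8
URGENT_CORES = 32

print("wait, no preemption:", wait_without_preemption(RUNNING, FREE, URGENT_CORES), "h")
print("jobs preempted:     ", victims_for_preemption(RUNNING, FREE, URGENT_CORES))
```

In this made-up snapshot the urgent run waits 3 hours without preemption, or
starts at once at the cost of one 32-core job that still had 6 hours to go.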

OK, so I've laid out a piece of the problem. I've got some ideas on
solutions, and avenues for investigation to address these but I'd like
to see others' ideas. I don't want to influence the outcome any more
than I already have.

Oh, and, yeah, I'm aware of SPRUCE but I see a few potential problems
there, although that framework has some potential.


gc
--
Gerry Creager -- [EMAIL PROTECTED]
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
