Douglas Eadline wrote:
I get the desire for fault tolerance etc. and I like the idea
of migration. It is just that many HPC people have spent
careers getting applications/middleware as close to the bare
metal as possible. The whole VM concept seems orthogonal to
this goal. I'm curious how people are approaching this
problem.
Like many things, the devil is in the details. While I don't want to be as
prodigious as rgb, I want to mention a few things and ask some questions:
- With multi-core processors, to get the best performance you want to
assign a process to a core. But this can cause problems when moving
a process or creating a checkpoint. For example, VMware explicitly
tells you not to do this. While I can't state their position, in
general the idea is that restarting a checkpointed VM may have problems
when a process is pinned to a core (even more so if the CPU is different).
Also, moving a pinned process to another node may cause problems
if the nodes are different in pretty much any way (it may also be affected
by what's on the new node).
- As Ashley pointed out, the network aspect is still very problematic.
Getting good performance out of a NIC in a VM is not easy and, from
what I understand, difficult or impossible to do with multi-core nodes.
(I would love to hear if someone has gotten very good performance out
of a NIC in a VM when other VMs are also using the same NIC. Please
give as many details as possible.)
- As Meng mentioned, IO is still problematic (I think for the same reasons
that interconnects are).
- I haven't seen any benchmarks run in VMs using several nodes with
an interconnect. Does anyone know of any?
- Has anyone tried moving processes around to different nodes for an
MPI job? I'm curious what they found.
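
To make the pinning item above concrete, here is a minimal sketch of what
"assign a process to a core" looks like on Linux, and why a saved pin may
not carry over to a different node. It assumes Linux and Python's
os.sched_setaffinity / os.sched_getaffinity. The helper names pin_to_core
and remap_affinity are made up for illustration, and the fallback policy
in remap_affinity is only one possible choice. It is not how VMware or
any MPI runtime actually handles this.

```python
import os


def pin_to_core(core):
    """Pin the calling process to one core and return the resulting mask."""
    os.sched_setaffinity(0, {core})  # pid 0 means "this process"
    return os.sched_getaffinity(0)


def remap_affinity(saved, allowed):
    """Hypothetical policy for restoring a saved mask on another host.

    Keep the saved cores that exist on the new host; if none do,
    fall back to every core the new host allows.
    """
    usable = set(saved) & set(allowed)
    return usable if usable else set(allowed)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # cores we may run on right now
    core = min(allowed)
    print("pinned to:", pin_to_core(core))
    os.sched_setaffinity(0, allowed)    # undo the pin before exiting
```

A mask saved as {6, 7} on an 8-core node has no meaning on a 4-core node,
so whatever restores the process has to pick some policy like the one
above, and any such choice can change the performance the pin was meant
to protect.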
I would like to see virtualization take off in HPC, but I have to see a few
demos of things working and I need to see reasons why I should adopt
it. Right now I don't relish taking my "High" Performance Computing
system and turning it into "Kind-of-High" Performance Computing because
it would allow non-code-specific checkpointing or movement of processes.
Losing 10% in performance, for example, is a big deal in HPC, and I haven't
yet seen benefits of virtualization that justify giving up the 10% (I'm
dying to be shown to be wrong though).
The only aspect of virtualization that could make some sense in HPC is
what rgb mentioned - allowing the user to select an OS as part of their
job and installing or tearing down the OS as part of the job. I can see this
being very useful if the details could be worked out (I know there are
people working on it but I haven't seen any large demonstrations of it
yet and I would really like to see such a beastie).
Anyway, my 2 cents (and probably my last since this topic falls under
Landman's Rule of flammability).
Jeff
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org