Geoff Jacobs wrote:
Kyle Spaans wrote:
Wait, we can use openMOSIX and MPI at the same time? I thought they
were separate ideas? For example, MPI for multithreading and message
passing, and openMOSIX for just process migration. Can they be used at
the same time?


1) Spawn MPI processes on the head node. Must be using tcp/ip for
interconnect.
2) OpenMOSIX migrates processes to dormant computers.
3) Sit back and watch the data roll in.

Hello, Geoff.

Sounds nice, doesn't it? But it doesn't scale up very well with large numbers of MPI processors...

I run a small (64-node) openMosix (oM) Beowulf cluster on my own port of the Linux-2.4.26-om1 kernel, compiled under Ubuntu 6.06.1 LTS (the Ubuntu server edition is supported until 2011, for any Ubuntu/Debian detractors reading this):

        http://www.ubuntu.com/getubuntu/download

We've tried different MPI implementations, including MPICH and LAM. The problem is that, by default, the same nodes get hammered every time MPI programs are run, so those nodes have high system times: there is a significant overhead when processes are migrated to and from their 'home' node (the node where a process is started) to do i/o, because oM has to use the kernel on the 'home' node to do physical i/o. In fact, I run a patched version that prevents oM migrating a process back home just to call the time() function. This makes a BIG difference to programs that monitor their own progress!
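
As a rough illustration (not something from my setup, and 'myprog' is just a placeholder name), you can get an idea of how badly a code would be affected by counting how often it asks the kernel for the time:

        # count the time-related system calls a program makes; under an
        # unpatched oM kernel each of these has to go via the 'home' node
        strace -c -f -e trace=time,gettimeofday ./myprog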

However, if you use a large number of MPI processors to run a job, you would be better off using "Ganglia" to generate a load-balanced list of MPI nodes first:

        gstat -am
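
For example (just a sketch -- the machine-file format and mpirun options depend on which MPI you use, and 'myprog' is a placeholder):

        # write a load-balanced machine file, least-loaded hosts first
        gstat -am > machines

        # start the MPI job on those hosts (MPICH-style mpirun shown)
        mpirun -machinefile machines -np 64 ./myprog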

However, for small MPI jobs it's true that openMosix does automatically add load-balancing to MPI. If your jobs are CPU-intensive this works very well, but if they do a lot of i/o the cluster spends a lot of its time moving the active pages of processes between the kernels running on different CPUs. BTW, openMosix migrates the active pages of the user context, not the entire process. This is done very efficiently, but it has a finite cost because the COTS cluster interconnect is Gigabit Ethernet.

The strategy we're developing now is to use all three approaches, so that MPI jobs are started on a "gstat" load-balanced list of processors and oM will migrate MPI processes between nodes as the load changes, if the oM load-balancing algorithm sees a benefit. However, openMosix is best for 'embarrassingly' parallel tasks, which is the main reason we run it.

Re: the FC vs. Ubuntu/Debian comments here recently, I'm one of those people who ran RH7.3->RH9 for years longer than I should have because it was both 'stable' and 'supported'. If it had not been for the excellent FedoraLegacy project I would have migrated to Debian years ago. However, the "end of life" statement for RH9 at FL is what forced me into action. It was, indeed, an effort to upgrade to Ubuntu, but I'm glad that I did.

        Tony.
--
Dr. A.J.Travis,                     |  mailto:[EMAIL PROTECTED]
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
