Greg Lindahl wrote:
On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:

We've a user who has requested Hadoop's installation on one of our clusters, a high-throughput system.

You didn't say anything about what they wanted to do. Hadoop is
designed to store a lot of data, and then enable what we HPC people
would call nearly-embarrassingly-parallel computation with good
locality -- it schedules shards of mapreduce computation to run on the
same systems as the disk shards being processed.
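To make the locality point concrete, here is a toy sketch in Python of greedy data-local task placement. The function name, block IDs, and node names are hypothetical, and this is not Hadoop's actual scheduler. It only illustrates the idea: run each map task on a node that already holds a replica of its input block, and fall back to a remote node (input crosses the network) only when no local slot is free.

```python
def assign_map_tasks(block_locations, free_slots):
    """Toy locality-aware placement of map tasks (hypothetical, not Hadoop's code).

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of free map slots
    Returns dict of block id -> (node, is_data_local). Blocks that cannot be
    placed anywhere are left out and would wait for a later round.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    pending = []

    # Pass 1: put each task on a node that already stores its input block.
    for block, replicas in sorted(block_locations.items()):
        for node in replicas:
            if slots.get(node, 0) > 0:
                slots[node] -= 1
                assignment[block] = (node, True)
                break
        else:
            pending.append(block)  # no replica holder has a free slot

    # Pass 2: leftover tasks go to any node with spare capacity (remote read).
    for block in pending:
        candidates = [n for n in sorted(slots) if slots[n] > 0]
        if not candidates:
            break  # cluster is full; remaining tasks are not assigned
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = (node, False)

    return assignment


if __name__ == "__main__":
    blocks = {"b0": ["node1", "node2"], "b1": ["node2"],
              "b2": ["node2"], "b3": ["node3"]}
    slots = {"node1": 1, "node2": 1, "node3": 0, "node4": 1}
    for block, placement in sorted(assign_map_tasks(blocks, slots).items()):
        print(block, placement)
```

The sketch shows why the data nodes have to take part in the jobs: a task is only cheap when it lands where its block already lives.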

Ah, but there's the problem. We've divined what they intend... we think... but they didn't originally tell us.

The PI involved is a relatively new but reasonably experienced CS prof associated with our bioinformatics crowd. She and her students intend to sift through plant genomic data for patterns (we think, based on her known affiliations). *I* suspect she's also interested because she read about Hadoop and wants to play.

This means you'll have to dedicate systems over the long term to store
the data (much like PVFS), and all of these systems will have to be a
part of their mapreduce jobs. So if your queue system can run
whole-cluster jobs easily, no problem.

Can it? Yes. Is that the intent of the cluster? No. The cluster is configured as a high-throughput system with a gigabit non-blocking backplane: 8 cores/node, and all jobs are scheduled on a per-node basis. Each node DOES have local disk (this isn't an opportunity to reopen THAT religious war), so we theoretically could use the Hadoop file system, except that it'd likely break our cluster design. Instead, we're looking at Hadoop On Demand (http://hadoop.apache.org/core/docs/r0.17.2/hod.html).

If, instead, they're just looking for a simple way to do
embarrassingly parallel computations, without lots of persistent data,
then you can probably point them at something easier and more friendly
to your queue system.

Yeah, and I've been trying, but someone else promised them it'd be made available without talking to the guys who have to install and support it, because it "looks" like valuable computer science.

gerry
--
Gerry Creager -- gerry.crea...@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University        
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf