Greg Lindahl wrote:
On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:
We've a user who has requested its installation on one of our clusters,
a high-throughput system.
You didn't say anything about what they wanted to do. Hadoop is
designed to store a lot of data, and then enable what we HPC people
would call nearly-embarrassingly-parallel computation with good
locality -- it schedules shards of the mapreduce computation to run on
the same systems that hold the disk shards being processed.
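
The locality idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hadoop's API: the node names, the shard contents, and the helper functions (map_shard, run_local_maps, reduce_counts) are all hypothetical. Each map task is paired with the node whose local disk already holds its shard, and only the small per-shard results are combined in the reduce step.

```python
from collections import Counter

# Hypothetical layout: which node's local disk holds which data shards.
shards_by_node = {
    "node01": ["ACGT", "GGCA"],
    "node02": ["TTAC"],
}

def map_shard(shard):
    """Map step: count the bases within a single shard."""
    return Counter(shard)

def run_local_maps(layout):
    """Pair every map task with the node that already stores its shard.

    A real scheduler would dispatch the task to that node; here we only
    record the pairing and compute the result in-process.
    """
    results = []
    for node, shards in layout.items():
        for shard in shards:
            results.append((node, map_shard(shard)))
    return results

def reduce_counts(results):
    """Reduce step: merge the small per-shard counts into one total."""
    total = Counter()
    for _node, partial in results:
        total.update(partial)
    return total

partials = run_local_maps(shards_by_node)
totals = reduce_counts(partials)
```

The point of the pairing is that the bulky shard data never crosses the network; only the per-shard counts do.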
Ah, but there's the problem. We've divined what they intend... we
think... but they didn't originally tell us.
The PI involved is a relatively new, but reasonably experienced CS prof
associated with our bioinformatics crowd. She and her students intend
to sift through plant genomic data for patterns (we think, based on her
known affiliations). *I* suspect she's interested, as well, because she
read about Hadoop and wants to play.
This means you'll have to dedicate systems over the long term to store
the data (much like PVFS), and all of these systems will have to be a
part of their mapreduce jobs. So if your queue system can run
whole-cluster jobs easily, no problem.
Can it? Yes. Is that the intent of the cluster? No. The cluster is
configured as a high-throughput system with a gigabit non-blocking
backplane. There are 8 cores/node, and all jobs are scheduled on a per-node basis.
Each node DOES have local disk (this isn't an opportunity to reopen THAT
religious war), so we theoretically could use the Hadoop file system,
except that it'd likely break our cluster design. Instead, we're looking at
Hadoop On Demand (http://hadoop.apache.org/core/docs/r0.17.2/hod.html).
If, instead, they're just looking for a simple way to do
embarrassingly parallel computations, without lots of persistent data,
then you can probably point them at something easier and more friendly
to your queue system.
Yeah, and I've been trying, but someone else promised them it'd be made
available without talking to the guys who have to install and support
it, because it "looks" like valuable computer science.
gerry
--
Gerry Creager -- gerry.crea...@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org