On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:
> We've a user who has requested its installation on one of our clusters,
> a high-throughput system.
You didn't say anything about what they want to do with it. Hadoop is designed to store a lot of data and then enable what we HPC people would call nearly-embarrassingly-parallel computation with good locality: it schedules the shards of a mapreduce computation to run on the same systems that hold the disk shards being processed. This means you'll have to dedicate systems over the long term to storing the data (much like PVFS), and all of those systems will have to participate in the mapreduce jobs. So if your queue system can run whole-cluster jobs easily, no problem.

If, instead, they're just looking for a simple way to do embarrassingly parallel computations, without a lot of persistent data, then you can probably point them at something easier and friendlier to your queue system.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf