Re: [Beowulf] Hadoop

Gerry Creager Sat, 27 Dec 2008 08:10:53 -0800

Jeff,

I'm an old guy and don't mind top-posts!


Thanks for the insight!
gerry

Jeff Layton wrote:

Sorry for top-posting (I hate these on-line email tools...)
Did the person requesting Hadoop ever say why they wanted it? For example, do they have code written in MapReduce, or do they think that Hadoop will give them faster throughput than something else?
Hadoop is a project that really has 2 parts to it: an open-source MapReduce implementation and a file system. From the people I've talked to, the MapReduce part is used far more than the file system. But I've also talked to some of the developers of the file system, and there are people who use it.
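To make the MapReduce half concrete, here is a minimal single-process sketch of the programming model (word count): a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step combines each group. This illustrates the model only. It is not Hadoop's Java API, and the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases for this sketch, not part of any Hadoop API.
Mapper = Callable[[str], Iterable[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], Tuple[str, int]]


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    # Map step: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce step: combine all values that share one key.
    return word, sum(counts)


def run_mapreduce(records: Iterable[str], mapper: Mapper,
                  reducer: Reducer) -> Dict[str, int]:
    # Shuffle: group every emitted value by its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Each key is reduced independently; on a cluster these would run in parallel.
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

Run as a script, this prints `{'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}`. Because each map call and each reduce call touches only its own data, the work splits naturally across nodes, which is why codes written this way tolerate the non-uniform networks described below.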
In general the file system is basically a virtual file system à la PVFS, GlusterFS or any object-based storage (Panasas, Lustre). However, it understands the idea of locality, that is, where the useful storage is in relation to the compute part of the problem. The idea is that you can reduce the time to transmit the data because the storage is closer. But, in general, the improvement you get is due to the network topology, not necessarily the file system itself. That's because MapReduce systems generally have network topologies with bottlenecks all over the place, since they don't really need a full bisection-bandwidth network everywhere. So, for example, they may have good bandwidth to a switch within the rack, but outside the rack the bandwidth is not so hot. But again, these are generalizations, and the details are always in the implementation.
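The locality idea can be sketched as a placement preference. Assume a hypothetical cluster described by a node-to-rack map and, for one data block, the list of nodes holding a replica. A scheduler that knows this would prefer a free node that already holds the block, then a free node in the same rack as a replica, and only then any free node. That ordering is in the spirit of Hadoop's node-local / rack-local / off-rack task locality, but the function `pick_node` below is a hypothetical illustration, not Hadoop's code.

```python
from typing import Dict, List, Optional, Tuple


def pick_node(replicas: List[str], free_nodes: List[str],
              rack_of: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (node, locality) for one task that reads a block stored on `replicas`."""
    # 1. Node-local: a free node already holds a copy, so the read stays on local disk.
    for node in free_nodes:
        if node in replicas:
            return node, "node-local"
    # 2. Rack-local: the read crosses only the rack switch, not the inter-rack uplink.
    replica_racks = {rack_of[r] for r in replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: traffic must cross the (often oversubscribed) links between racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no free node"


if __name__ == "__main__":
    rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
    print(pick_node(["n1", "n3"], ["n2", "n4"], rack_of))  # ('n2', 'rack-local')
    print(pick_node(["n1"], ["n3", "n4"], rack_of))        # ('n3', 'off-rack')
```

The point of the ordering matches the paragraph above: the benefit comes from keeping reads off the slow inter-rack links, so it depends on the network topology more than on the file system itself.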
HadoopFS (for lack of a better phrase on my part) is really designed for MapReduce codes - transactional codes. So if the person's code(s) fit this model, then it might be an interesting experiment to try. Otherwise, there are much better file systems for HPC :)
BTW - I saw Karen's post about using Java with HadoopFS. Be sure to pay attention to that, since getting a good 64-bit Java implementation for Linux is not always easy. There are a few out there (Sun has an early access program for a 64-bit Java), but the reports I've heard are that it's still early.
Hope this helps.

Jeff


------------------------------------------------------------------------
*From:* Gerry Creager <gerry.crea...@tamu.edu>
*To:* Beowulf Mailing List <beowulf@beowulf.org>
*Sent:* Friday, December 26, 2008 6:16:04 PM
*Subject:* [Beowulf] Hadoop

The subject line says it all: Hadoop:  Anyone got any experience with it
on clusters (OK, so Google does, but that really wasn't the question,
was it?).

We've a user who has requested its installation on one of our clusters,
a high-throughput system.  I'm a bit concerned that it's not gonna be
real compatible with, say, Torque/Maui and Gluster, unless we were to
install Xen across the whole cluster and instantiate it within Xen VMs.

However, before I push all MY fears out into the discussion I'd prefer
to see if anyone else has experience and can shed light on compatibility.

Thanks, Gerry
--
Gerry Creager -- gerry.crea...@tamu.edu <mailto:gerry.crea...@tamu.edu>
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org <mailto:Beowulf@beowulf.org>
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
