On 07/21/11 18:07, Greg Lindahl wrote:
> On Thu, Jul 21, 2011 at 02:55:30PM -0400, Ellis H. Wilson III wrote:
>> My personal experience with getting large amounts of data from local
>> storage to HDFS has been suboptimal compared to something more raw,
> 
> If you're writing 3 copies of everything on 3 different nodes, then
> sure, it's a lot slower than writing 1 copy. The benefit you get from
> this extra up-front expense is resilience.

Used in a backup solution, triplication won't get you much more
resilience than RAID6, but you will pay a much greater performance
penalty simply to get your backup or checkpoint completed.  Additionally,
unless you have a ton of these boxes you won't get some of the important
benefits of Hadoop, such as rack-aware replica placement.  Perhaps
you could alter HDFS to handle triplication in the background once you
get the local copy on disk, but this isn't really what it was built for,
so again one is probably better off going with a more efficient, if less
complex, distributed file system.
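
To put rough numbers on the write penalty, here is a back-of-the-envelope
sketch.  The 8-disk array is a made-up example, the function names are
mine, and the model only counts bytes written for full-stripe writes.  It
ignores RAID6 read-modify-write on small writes, HDFS write pipelining,
and network bandwidth limits, so treat it as an illustration rather than
a benchmark.

```python
def replication_overhead(copies: int) -> float:
    """Bytes written to disk per byte of user data under n-way replication."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


def raid6_overhead(total_disks: int) -> float:
    """Bytes written per byte of user data for a full-stripe RAID6 write.

    RAID6 keeps two parity blocks per stripe, so only (total_disks - 2)
    disks in each stripe hold user data.
    """
    if total_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return total_disks / (total_disks - 2)


def network_copies(copies: int) -> int:
    """Replicas that must cross the network, assuming the first one is local."""
    return copies - 1


if __name__ == "__main__":
    # Hypothetical comparison: 3 replicas vs. an 8-disk RAID6 array.
    print(f"3x replication: {replication_overhead(3):.2f} bytes written per byte")
    print(f"8-disk RAID6:   {raid6_overhead(8):.2f} bytes written per byte")
    print(f"replicas sent over the network: {network_copies(3)}")
```

Under those assumptions triplication writes 3 bytes per byte of data, two
of them over the network, while the 8-disk RAID6 array writes about 1.33.
Both survive two lost devices, although RAID6 on one box does not survive
losing the box itself.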

ellis
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
