On 07/21/11 18:07, Greg Lindahl wrote:
> On Thu, Jul 21, 2011 at 02:55:30PM -0400, Ellis H. Wilson III wrote:
>> My personal experience with getting large amounts of data from local
>> storage to HDFS has been suboptimal compared to something more raw,
>
> If you're writing 3 copies of everything on 3 different nodes, then
> sure, it's a lot slower than writing 1 copy. The benefit you get from
> this extra up-front expense is resilience.

Used in a backup solution, triplication won't get you much more
resilience than RAID6, but you will pay a much greater performance
penalty simply to get your backup or checkpoint completed. Additionally,
unless you have a ton of these boxes you won't get some of the important
benefits of Hadoop, such as rack-aware replica placement. Perhaps you
could alter HDFS to handle triplication in the background once the local
copy is on disk, but this isn't really what it was built for, so again
one is probably better off going with a more efficient, if less complex,
distributed file system.

ellis
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf