On 11/27/2012 08:59 AM, Eugen Leitl wrote:
> On Tue, Nov 27, 2012 at 09:10:32AM +0100, Jonathan Aquilina wrote:
>> Hey guys I was looking at the hadoop page and it got me wondering. is it
>> possible to cluster together storage servers? If so how efficient would a
>> cluster of them be?
>
> An interesting problem would be to use reasonably powerful but
> cheap ARM SoCs in few GBytes onboard RAM and some flash
> for hybrid filesystems for each hard drive, and cluster
> them via GBit Ethernet on a very large scale.
>
> That would be a custom Beowulf for more storage-related
> tasks. E.g. an application I have in mind are volumetric
> datasets with e.g. 8 nm - voxels for biological systems,
> which are way too large to process in memory.

Are these problems embarrassingly parallel (EP), such that they could
be run entirely as Map tasks? If not, you are going to have a fairly
significant shuffle stage in your MapReduce application, which adds
overhead from moving the data over the network and in and out of
memory, disk, etc. Shuffling can be a real PITA, but it tends to be
present in most real-world applications I've run into.

Maybe you weren't referring to using Hadoop, in which case this looks
basically just like the FAWN project I had mentioned in the past that
came out of CMU (with the addition of tiered storage).

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf