On 2017-02-14 03:00, Douglas Eadline wrote:
>
>> Hi guys,
>>
>> So, we're running a small (as in a small number of nodes (10), not
>> storage (170TB)) hadoop cluster here. Right now we're on IBM Spectrum
>> Scale (GPFS) which works fine and has POSIX support. On top of GPFS we
>> have a GPFS transparency connector so that HDFS uses GPFS.
>>
>> Now, if I'd like to replace GPFS with something else, what should I use?
>> It needs to be a fault-tolerant DFS, with POSIX support (so that users
>> can move data to and from it with standard tools).
>
> HDFS does have an NFSv3 gateway which helps users move
> data around in a familiar fashion (without the -put -get commands).
> If you need to use HDFS for big block local streaming performance
> that feature can be useful. If you are doing Spark or MR where data
> locality is important, then HDFS is a low cost alternative
> to other file systems. Plus if you use something like
> Ambari/Hortonworks the management is somewhat integrated
> in the web-GUI. (Hortonworks is open source rpm based)
> If you don't care about locality, then another file system
> will work.
>
> As an aside, having done a handful of Hadoop/Spark workshops
> in the last year, I have found the single most difficult
> aspect of Hadoop/HDFS and Spark on Hadoop/HDFS is understanding
> the "remote" or non-local aspect of HDFS, i.e. the fact that
> a copy of the data must be loaded into HDFS before it
> can be used. The NFS gateway helps because files can
> be seen in a user's local file system. But I digress ...
>
> --
> Doug
>
>>
>> I've looked at MooseFS which seems to be able to do the trick, but are
>> there any others that might do?
>>
>> TIA
>>
>> --
>> Best regards,
>>
>> Tony Albers
>> Systems administrator, IT-development
>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>> Tel: +45 2566 2383 / +45 8946 2316
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
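
To make the point about "standard tools" concrete, here is a minimal Python sketch of what moving data through a POSIX-style mount looks like, as opposed to going through the -put/-get commands. It assumes the file system (an NFS-exported HDFS, GPFS, MooseFS, or anything else) is already mounted somewhere on the client; the mount point and the helper name copy_into_mount are hypothetical and only there for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_mount(src: Path, mount_root: Path, rel_dest: str) -> Path:
    """Copy src to rel_dest below mount_root using ordinary file I/O.

    mount_root is assumed to be a directory where the distributed file
    system is mounted (hypothetical example: /mnt/dfs). No Hadoop client
    commands are involved.
    """
    dest = mount_root / rel_dest
    # Create missing parent directories, as a user would with mkdir -p.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies only the file contents, written front to back;
    # it does not try to copy permissions or timestamps.
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    # A temporary directory stands in for the real mount point here.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        local_file = root / "results.csv"
        local_file.write_text("id,value\n1,42\n")
        fake_mount = root / "mnt"
        fake_mount.mkdir()
        copied = copy_into_mount(local_file, fake_mount, "user/demo/results.csv")
        print(copied.read_text())
```

The same call works unchanged against any file system that presents a POSIX-style directory tree, which is why the mount path is the only thing that would need to change.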
Some very good points there. No doubt the NFS gateway can be useful,
but on its own it is not enough for our purposes.

--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316