If I can be of help: I'm inside IBM. I'm the marketing lead for IBM Spectrum Scale (aka GPFS), but I have solid connections to the field tech support and development teams.
My corporate email is dougla...@us.ibm.com. IBM just announced that Hortonworks will be supported on IBM Spectrum Scale, and IBM has a lot of development focus on the Hadoop/Spark use case.

On Tue, Feb 14, 2017 at 12:00 PM, Jeffrey Layton <layto...@gmail.com> wrote:

> Of course there are tons of options, depending on what you want and the I/O patterns of your applications.
>
> Doug's comments about HDFS are great - he's a real expert in this area.
>
> Depending on your I/O patterns and workload, NFS may work well. I've found it works quite well unless you have a bunch of clients really hammering it. There are some tuning options you can use to improve this behavior (i.e., more clients beating on it before it collapses), and it's good to have lots of memory in the NFS server. Google for "Dell NSS" and you should find some documents on tuning options that Dell created that work VERY well.
>
> Another option for NFS is to consider using async mounts. This can definitely increase performance, but you have to be aware of the downside: if the server goes down, you could lose data from the clients (data in flight). But I've seen some massive performance gains when using async mounts.
>
> BTW - if you have IB, consider using NFS with IPoIB. This can boost performance as well, and recent kernels have RDMA capability for NFS.
>
> If you need encryption over the wire, then consider sshfs. It uses FUSE, so you can mount directories from any host you have SSH access to (be sure NOT to use password-less SSH :) ). There are some pretty good tuning options for it as well.
>
> For distributed file systems there are some good options: Lustre, BeeGFS, OrangeFS, Ceph, Gluster, Moose, OCFS2, etc. (my apologies to any open-source file systems that I've forgotten). I personally like all of them :) I've used Lustre, BeeGFS, and OrangeFS in current and past lives. I've found BeeGFS to be very easy to configure, and its performance seems to be on par with Lustre for the limited testing I did, but it's always best to test your own applications (that's true for any file system or storage solution).
>
> There are also commercial solutions that should not be ignored if you want to go that route. There are a bunch of them out there - GPFS, Panasas, Scality, and others.
>
> I hope some of these pointers help.
>
> Jeff
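For concreteness, a minimal sketch of the async NFS export, the IPoIB/RDMA client mounts, and the sshfs tuning Jeff mentions above. Every hostname, address, path, and option value here is a placeholder and would need testing and tuning for a real environment:

    # NFS server side: async export (fast, but data in flight is lost if the
    # server crashes before it reaches disk) - entry in /etc/exports:
    /export/scratch  10.10.0.0/16(rw,async,no_root_squash)

    # Client side: mount against the server's IPoIB address with large I/O sizes
    mount -t nfs -o vers=3,rsize=1048576,wsize=1048576 10.10.0.1:/export/scratch /mnt/scratch

    # Or, with a recent kernel and an RDMA-capable fabric, NFS over RDMA
    mount -t nfs -o vers=3,proto=rdma,port=20049 10.10.0.1:/export/scratch /mnt/scratch

    # sshfs: encrypted over the wire from any host you can SSH to; a cheaper
    # cipher and disabling SSH compression usually help throughput
    sshfs user@fileserver:/data /mnt/data -o reconnect -o Ciphers=aes128-ctr -o Compression=no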
> On Tue, Feb 14, 2017 at 5:47 AM, John Hanks <griz...@gmail.com> wrote:
>
>> Should have included this in my last message:
>>
>> https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS
>>
>> One other aspect of ZFS I overlooked in my earlier messages is the built-in compression. At one point I backed up 460 TB of data from our GPFS system onto ~300 TB of space on a ZFS system using gzip-9 compression on the target filesystem, thereby gaining compression that was transparent to the users. The benefits of ZFS are really too numerous to cover, and the flexibility it adds for managing storage opens up whole new solution spaces to explore. For me it is the go-to filesystem for the first layer on the disks.
>>
>> jbh
>>
>> On Tue, Feb 14, 2017 at 4:16 PM Tony Brian Albers <t...@kb.dk> wrote:
>>
>>> On 2017-02-14 11:44, Jörg Saßmannshausen wrote:
>>> > Hi John,
>>> >
>>> > thanks for the very interesting and informative post. I am looking into large storage space right now as well, so this came really timely for me! :-)
>>> >
>>> > One question: I have noticed you were using ZFS on Linux (CentOS 6.8). What are your experiences with this? Does it work reliably? How did you configure the file space?
>>> >
>>> > From what I have read, the best way of setting up ZFS is to give ZFS direct access to the discs and then build the ZFS 'raid5' or 'raid6' on top of that. Is that what you do as well?
>>> >
>>> > You can contact me offline if you like.
>>> >
>>> > All the best from London
>>> >
>>> > Jörg
>>> >
>>> > On Tuesday 14 Feb 2017 10:31:00 John Hanks wrote:
>>> >> I can't compare it to Lustre currently, but in the theme of general, we have 4 major chunks of storage:
>>> >>
>>> >> 1. (~500 TB) DDN SFA12K running GRIDScaler (GPFS), but without GPFS clients on the nodes; this is presented to the cluster through cNFS.
>>> >>
>>> >> 2. (~250 TB) SuperMicro 72-bay server running CentOS 6.8, ZFS presented via NFS.
>>> >>
>>> >> 3. (~460 TB) SuperMicro 90-bay JBOD fronted by a SuperMicro 2U server with 2 x LSI 3008 SAS/SATA cards, running CentOS 7.2, ZFS and BeeGFS 2015.xx, with BeeGFS clients on all nodes.
>>> >>
>>> >> 4. (~12 TB) SuperMicro 48-bay NVMe server running CentOS 7.2, ZFS presented via NFS.
>>> >>
>>> >> Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over wheezing under load. ZFS/NFS on a single server falls over wheezing under slightly less load. BeeGFS tends to fall over a bit more gracefully under load. Number 4, the NVMe box, doesn't care what you do; your load doesn't impress it at all - bring more.
>>> >>
>>> >> We move workloads around to whichever storage has free space and works best, and put anything metadata- or random-I/O-heavy that will fit onto the NVMe-based storage.
>>> >>
>>> >> Now, in the theme of specific, why are we using BeeGFS, and why are we currently planning to buy about 4 PB of SuperMicro to put behind it? When we asked about improving the performance of the DDN, one recommendation was to buy GPFS client licenses for all our nodes. The quoted price was about 100k more than we wound up spending on the 460 additional TB of SuperMicro storage and BeeGFS, which performs as well or better. I fail to see the inherent value of DDN/GPFS that makes it worth that much of a premium in our environment. My personal opinion is that I'll take hardware over licenses any day of the week. My general grumpiness towards vendors isn't improved by the DDN looking suspiciously like a SuperMicro system when I pull the shiny cover off. Of course, YMMV certainly applies here. But there's also that incident where we had to do an offline fsck to clean up some corrupted GPFS foo and the mmfsck tool had an assertion error - not a warm fuzzy moment...
>>> >>
>>> >> Last example: we recently stood up a small test cluster built out of workstations, threw some old 2 TB drives in every available slot, then used BeeGFS to glue them all together. Suddenly there is a 36 TB filesystem where before there was just old hardware. And as a bonus, it'll do sustained 2 GB/s for streaming large writes. It's worth a look.
>>> >>
>>> >> jbh
>>>
>>> That sounds very interesting, I'd like to hear more about that. How did you manage to use ZFS on CentOS?
>>>
>>> /tony
>>>
>>> --
>>> Best regards,
>>>
>>> Tony Albers
>>> Systems administrator, IT-development
>>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>>> Tel: +45 2566 2383 / +45 8946 2316
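For what it's worth, a rough sketch of the setup being discussed: ZFS on Linux installed from the zfsonlinux repository (see the RHEL/CentOS wiki page linked above for the exact release package for your CentOS version), whole disks handed to ZFS, a raidz2 vdev, gzip-9 compression, and an NFS export. The pool/dataset names and the disk paths are placeholders:

    # Install ZFS on Linux (after adding the zfs-release repo package per the
    # wiki page above), then load the module
    yum install zfs
    modprobe zfs

    # Hand ZFS the whole disks (stable by-id paths) and build a raidz2 vdev;
    # ashift=12 is the usual choice for 4K-sector drives
    zpool create -o ashift=12 tank raidz2 \
        /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
        /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4 \
        /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6

    # Transparent gzip-9 compression on the target dataset, as in the
    # 460 TB -> ~300 TB backup example above
    zfs create -o compression=gzip-9 tank/backup

    # Export it over NFS (or use /etc/exports in the usual way)
    zfs set sharenfs=on tank/backup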
>> --
>> ‘[A] talent for following the ways of yesterday, is not sufficient to improve the world of today.'
>> - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf