If I can help: I'm inside IBM as the marketing lead for IBM Spectrum
Scale (aka GPFS), and I have solid connections to the field tech support
and development teams.

My corporate email is dougla...@us.ibm.com

IBM just announced that Hortonworks will be supported on IBM Spectrum
Scale. IBM has a lot of development focus on the Hadoop/Spark use case.



On Tue, Feb 14, 2017 at 12:00 PM, Jeffrey Layton <layto...@gmail.com> wrote:

> Of course there are tons of options, depending on what you want and the
> I/O patterns of your applications.
>
> Doug's comments about HDFS are great - he's a real expert in this
> area.
>
> Depending on your I/O patterns and workload, NFS may work well. I've
> found it works quite well unless you have a bunch of clients really
> hammering it. There are some tuning options you can use to improve this
> behavior (i.e., it takes more clients beating on it before it
> collapses). It's also good to have lots of memory in the NFS server.
> Google for "Dell NSS" and you should find some documents on tuning
> options that Dell developed that work VERY well.
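>
> Just to give a flavor of the kinds of knobs involved (the values below
> are placeholders, not tuned numbers - the Dell docs have properly
> tested settings):
>
>   # /etc/sysconfig/nfs on a RHEL/CentOS server: run more nfsd threads
>   RPCNFSDCOUNT=64
>
>   # /etc/sysctl.conf: bigger socket buffers for lots of clients
>   net.core.rmem_max = 16777216
>   net.core.wmem_max = 16777216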
>
> Another option for NFS is to consider async mounts. This can definitely
> increase performance, but you have to be aware of the downside: if the
> server goes down, you can lose data in flight from the clients. That
> said, I've seen some massive performance gains when using async mounts.
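>
> For illustration, an async export and a matching client mount might
> look something like this (server name, network, and paths are made up):
>
>   # /etc/exports on the server - 'async' acks writes before they hit
>   # disk, which is where the data-in-flight risk comes from
>   /export/scratch  10.0.0.0/24(rw,async,no_root_squash)
>
>   # on a client
>   mount -t nfs -o rw,noatime nfs01:/export/scratch /mnt/scratch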
>
> BTW - if you have InfiniBand, consider running NFS over IPoIB. This can
> boost performance as well, and recent kernels also have RDMA capability
> for NFS.
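>
> Roughly, an NFS/RDMA mount looks like this on the client (host and
> paths made up; the exact module/setup steps depend on your kernel and
> distro, and the server needs svcrdma loaded with the RDMA port added to
> /proc/fs/nfsd/portlist):
>
>   modprobe xprtrdma
>   mount -t nfs -o rdma,port=20049 nfs01:/export/scratch /mnt/scratch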
>
> If you need encryption over the wire, then consider sshfs. It uses FUSE,
> so you can mount directories from any host to which you have SSH access
> (be sure NOT to use password-less SSH :) ). There are some pretty good
> tuning options for it as well.
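>
> A throughput-oriented sshfs mount might look something like this (host,
> path, and cipher choice are just examples to start from):
>
>   sshfs user@fileserver:/data /mnt/data \
>       -o Ciphers=aes128-ctr -o Compression=no -o big_writes -o reconnect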
>
> For distributed file systems there are some good options: Lustre,
> BeeGFS, OrangeFS, Ceph, Gluster, MooseFS, OCFS2, etc. (my apologies to
> any open-source file systems I've forgotten). I personally like all of
> them :)  I've used Lustre, BeeGFS, and OrangeFS in current and past
> lives. I've found BeeGFS to be very easy to configure. Its performance
> seems to be on par with Lustre in the limited testing I did, but it's
> always best to test your own applications (that's true for any file
> system or storage solution).
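>
> If you don't have a convenient application to test with, something like
> fio gives a quick first-order streaming number (directory and sizes are
> placeholders; it's no substitute for your real workload):
>
>   fio --name=seqwrite --rw=write --bs=1M --size=8G --numjobs=8 \
>       --directory=/mnt/testfs --group_reporting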
>
> There are also commercial solutions that should not be ignored if you
> want to go that route. There are a bunch of them out there - GPFS,
> Panasas, Scality, and others.
>
> I hope some of these pointers help.
>
> Jeff
>
>
> On Tue, Feb 14, 2017 at 5:47 AM, John Hanks <griz...@gmail.com> wrote:
>
>> Should have included this in my last message:
>>
>> https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS
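>>
>> In short, the install boils down to something like this (see the page
>> above for the current zfs-release RPM URL and the kmod vs. dkms
>> choice):
>>
>>   yum install epel-release
>>   yum install <zfs-release RPM from the page above>
>>   yum install kernel-devel zfs
>>   modprobe zfs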
>>
>> One other aspect of ZFS I overlooked in my earlier messages is the
>> built-in compression. At one point I backed up 460 TB of data from our
>> GPFS system onto ~300 TB of space on a ZFS system using gzip-9
>> compression on the target filesystem, gaining compression that was
>> completely transparent to the users. The benefits of ZFS are really too
>> numerous to cover here, and the flexibility it adds for managing
>> storage opens up whole new solution spaces to explore. For me it is the
>> go-to filesystem for the first layer on the disks.
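>>
>> For reference, turning that on is a one-liner (pool/dataset names made
>> up; lz4 is a much cheaper alternative if gzip-9 eats too much CPU):
>>
>>   zfs set compression=gzip-9 tank/backup
>>   zfs get compression,compressratio tank/backup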
>>
>> jbh
>>
>>
>>
>>
>> On Tue, Feb 14, 2017 at 4:16 PM Tony Brian Albers <t...@kb.dk> wrote:
>>
>>> On 2017-02-14 11:44, Jörg Saßmannshausen wrote:
>>> > Hi John,
>>> >
>>> > Thanks for the very interesting and informative post. I am looking
>>> > into large storage space right now as well, so this came at just the
>>> > right time for me! :-)
>>> >
>>> > One question: I noticed you were using ZFS on Linux (CentOS 6.8).
>>> > What are your experiences with it? Does it work reliably? How did
>>> > you configure the file space?
>>> > From what I have read, the best way of setting up ZFS is to give it
>>> > direct access to the disks and then build the ZFS 'raid5' or 'raid6'
>>> > (raidz/raidz2) on top of that. Is that what you do as well?
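>>> > I.e. hand ZFS the whole disks and build something along these lines
>>> > (device names made up):
>>> >
>>> >   zpool create tank raidz2 sdb sdc sdd sde sdf sdg   # raidz2 ~ 'raid6'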
>>> >
>>> > You can contact me offline if you like.
>>> >
>>> > All the best from London
>>> >
>>> > Jörg
>>> >
>>> > On Tuesday 14 Feb 2017 10:31:00 John Hanks wrote:
>>> >> I can't compare it to Lustre currently, but in the theme of
>>> >> general, we have 4 major chunks of storage:
>>> >>
>>> >> 1. (~500 TB) DDN SFA12K running GRIDScaler (GPFS) but without GPFS
>>> >> clients on the nodes; this is presented to the cluster through cNFS.
>>> >>
>>> >> 2. (~250 TB) SuperMicro 72 bay server. Running CentOS 6.8, ZFS
>>> >> presented via NFS.
>>> >>
>>> >> 3. (~460 TB) SuperMicro 90 bay JBOD fronted by a SuperMicro 2U
>>> >> server with 2 x LSI 3008 SAS/SATA cards. Running CentOS 7.2, ZFS and
>>> >> BeeGFS 2015.xx. BeeGFS clients on all nodes.
>>> >>
>>> >> 4. (~12 TB) SuperMicro 48 bay NVMe server, running CentOS 7.2, ZFS
>>> >> presented via NFS.
>>> >>
>>> >> Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls
>>> >> over wheezing under load. ZFS/NFS on a single server falls over
>>> >> wheezing under slightly less load. BeeGFS tends to fall over a bit
>>> >> more gracefully under load. Number 4, the NVMe box, doesn't care
>>> >> what you do; your load doesn't impress it at all, bring more.
>>> >>
>>> >> We move workloads around to whichever storage has free space and
>>> >> works best, and we put anything metadata-heavy or random-I/O-ish
>>> >> that will fit onto the NVMe-based storage.
>>> >>
>>> >> Now, in the theme of specific, why are we using BeeGFS, and why are
>>> >> we currently planning to buy about 4 PB of SuperMicro to put behind
>>> >> it? When we asked about improving the performance of the DDN, one
>>> >> recommendation was to buy GPFS client licenses for all our nodes.
>>> >> The quoted price was about 100k more than we wound up spending on
>>> >> the 460 additional TB of SuperMicro storage and BeeGFS, which
>>> >> performs as well or better. I fail to see the inherent value of
>>> >> DDN/GPFS that makes it worth that much of a premium in our
>>> >> environment. My personal opinion is that I'll take hardware over
>>> >> licenses any day of the week. My general grumpiness towards vendors
>>> >> isn't improved by the DDN looking suspiciously like a SuperMicro
>>> >> system when I pull the shiny cover off. Of course, YMMV certainly
>>> >> applies here. But there's also that incident where we had to do an
>>> >> offline fsck to clean up some corrupted GPFS foo and the mmfsck tool
>>> >> had an assertion error - not a warm fuzzy moment...
>>> >>
>>> >> Last example: we recently stood up a small test cluster built out
>>> >> of workstations, threw some old 2 TB drives into every available
>>> >> slot, and then used BeeGFS to glue them all together. Suddenly there
>>> >> is a 36 TB filesystem where before there was just old hardware. And
>>> >> as a bonus, it'll do a sustained 2 GB/s for streaming large writes.
>>> >> It's worth a look.
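>>> >>
>>> >> The setup is basically just pointing the BeeGFS setup scripts at a
>>> >> directory on each box and starting the services, roughly like this
>>> >> (paths, IDs, and the mgmt host name are made up; check the BeeGFS
>>> >> docs for the exact flags on your version):
>>> >>
>>> >>   /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd
>>> >>   /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 1 -m mgmt01
>>> >>   /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/stor01 -s 1 -i 101 -m mgmt01
>>> >>   /opt/beegfs/sbin/beegfs-setup-client -m mgmt01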
>>> >>
>>> >> jbh
>>>
>>> That sounds very interesting, and I'd like to hear more about it. How
>>> did you manage to use ZFS on CentOS?
>>>
>>> /tony
>>>
>>> --
>>> Best regards,
>>>
>>> Tony Albers
>>> Systems administrator, IT-development
>>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>>> Tel: +45 2566 2383 / +45 8946 2316
>> --
>> ‘[A] talent for following the ways of yesterday, is not sufficient to
>> improve the world of today.’
>>  - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
>>
>
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
