On Mon, Mar 18, 2019 at 8:52 AM Will Dennis <wden...@nec-labs.com> wrote:
>
> I am considering using BeeGFS for a parallel file system for one (and if
> successful, more) of our clusters here. Just wanted to get folks’ opinions on
> that, and if there is any “gotchas” or better-fit solutions out there... The
> first cluster I am considering it for has ~50TB storage off a single ZFS
> server serving the data over NFS currently; looking to increase not only
> storage capacity, but also I/O speed. The cluster nodes that are consuming
> the storage have 10GbaseT interconnects, as does the ZFS server. As we are a
> smaller shop, want to keep the solution simple. BeeGFS was recommended to me
> as a good solution off another list, and wanted to get people’s opinions off
> this list.
We're in the midst of migrating our cluster storage from a, err, network
appliance to BeeGFS. We currently have 4 storage servers (2 HA pairs) and 2
metadata servers (each running 4 metadata threads, mirrored between the
servers) serving 1.4PB of available space. As configured, we've seen the
system put out over 600,000 IOPS and aggregate read speeds of over
12,000MB/s. We're actually going to be adding 6 more storage servers and 2
more metadata servers in the near future. So, yeah, we're pretty happy with
it.

One rather nice feature is the ability to see, at any point, which users
and/or hosts are generating the most load (example commands in the P.S.
below).

That being said, there are currently a few gotchas/pain points:

1) We're using ZFS under BeeGFS, and the storage servers are rather cycle
hungry. If you go that route, get boxes with lots of fast cores.

2) In previous versions, you could mix and match point releases between
servers and clients -- as long as the major version was the same, you were
fine. As of v7, that's no longer the case. IOW, moving from 7.0 to 7.1
requires unmounting all the clients, shutting down all the daemons, updating
all the software, and then restarting everything (rough outline in the
P.P.S.). Painful.

3) Also as of v7, the mgmtd service is *critical*. Any communication
interruption to/from the mgmtd results in the clients immediately hanging.
And, unlike storage and metadata, there is currently no mirroring/HA
mechanism within BeeGFS for the mgmtd, so it's worth watching closely (see
the P.P.P.S.).

We do have a support contract, and the folks from ThinkParQ are responsive.
If you have more questions, please feel free to ask away.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
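P.S. The load monitoring lives in beegfs-ctl. Roughly like the below -- the
exact flags may differ on your version, so check beegfs-ctl --help:

    # top clients by request count, refreshed every 5s, as seen by the
    # storage servers (--names resolves IPs to hostnames)
    beegfs-ctl --clientstats --nodetype=storage --interval=5 --names

    # same idea, broken down per user rather than per host
    beegfs-ctl --userstats --nodetype=storage --interval=5 --names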
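P.P.S. A rough sketch of what the 7.0 -> 7.1 dance looks like, assuming
systemd and the stock BeeGFS service names (adjust for your distro and
package manager):

    # on every client node: stopping beegfs-client unmounts the filesystem
    systemctl stop beegfs-client beegfs-helperd

    # on the servers
    systemctl stop beegfs-storage    # storage servers
    systemctl stop beegfs-meta       # metadata servers
    systemctl stop beegfs-mgmtd      # management server last

    # update the beegfs packages everywhere (yum/apt/zypper as appropriate),
    # then start everything back up in reverse order:
    # mgmtd, then meta, then storage, then helperd/client on the nodes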
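P.P.P.S. Given the mgmtd situation, it's worth at least watching server
reachability with beegfs-check-servers (ships in beegfs-utils). Something
like the cron entry below -- the grep pattern and alert target are just
placeholders, so check what your version actually prints:

    # lists each registered mgmtd/meta/storage daemon and whether it's
    # reachable
    beegfs-check-servers

    # crude watchdog: mail root if anything shows up as unreachable
    */5 * * * * root beegfs-check-servers 2>&1 | grep -qi unreach && echo "BeeGFS server unreachable" | mail -s "beegfs-check-servers alert" root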