Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Hi there, Thanks for the advice. From the messages here I think I have grokked how to proceed: - Swap the HDDs for NVMe - Replace 1 Gb Ethernet with IB - Configure NFS to use IPoIB or RDMA - Tune NFS I will need to get my hands on Lustre eventually, but that can wait. Thanks for the help On Thu

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Michael DiDomenico
i would definitely look more at tuning nfs/backend disks rather than going down the rabbit hole of gluster/lustre/beegfs. you only have five nodes. nfs is a hog, but you're not likely to bottleneck the nfs protocol with only five nodes. but for anyone here to give you better advice you'd have to

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Renfro, Michael
As the definitely-not-proud owner of a 2016 purchase of a 60-bay disk shelf attached to a single server with an Infiniband connection back to 54 compute nodes, NFS on spinning disks can definitely handle 5 40-core jobs, but your particular setup really can’t. Mine has hit its limits at times as

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Awesome, thanks for the info! Best, leo On Thu, 10 Aug 2023 at 22:01, Jeff Johnson wrote: > Leo, > > Both BeeGFS and Lustre require a backend file system on the disks > themselves. Both Lustre and BeeGFS support ZFS backend. > > --Jeff > > > On Thu, Aug 10, 2023 at 1:00 PM leo camilo wrote: >

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Jeff Johnson
Leo, Both BeeGFS and Lustre require a backend file system on the disks themselves. Both Lustre and BeeGFS support a ZFS backend. --Jeff On Thu, Aug 10, 2023 at 1:00 PM leo camilo wrote: > Hi there, > > thanks for your response. > > BeeGFS indeed looks like a good option, though realistical

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Hi there, thanks for your response. BeeGFS indeed looks like a good option, though realistically I can only afford to use a single node/server for it. Would it be feasible to use ZFS as the volume manager coupled with BeeGFS for the shares, or should I write ZFS off altogether? thanks again,

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Hi there, I will have a look. thanks for the tip. best, leo On Thu, 10 Aug 2023 at 21:34, John Hearns wrote: > I would look at BeeGFS here > > On Thu, 10 Aug 2023, 20:19 leo camilo, wrote: > >> Hi everyone, >> >> I was hoping to seek some sage advice from you guys. >> >> At my departmen

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Hi Robert, Thanks for your reply. I am pretty sure the storage is going over Ethernet (Cat6 gigabit; 10 gig copper is coming soon, maybe). I was not aware I could use NFS over IB. I will try running the tests over the weekend. thanks for the tip. Best, leo On Thu, 10 Aug 2023 at 21:43, Rober
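For a rough sense of why the gigabit link matters, here is a back-of-the-envelope sketch. It assumes (this is not stated in the thread) that all 200 cores run one I/O-active task each and share a single server link, and it ignores protocol overhead and disk limits entirely. The function name is made up for illustration.

```python
def per_task_mb_per_s(link_gbit_per_s, tasks):
    """Ideal per-task share of one network link in MB/s, ignoring all overhead."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # 1 Gbit/s is 125 MB/s
    return link_mb_per_s / tasks


# Assumed worst case: 200 tasks all doing I/O through one server link.
for gbit in (1, 10):
    print(f"{gbit} Gbit/s link: {per_task_mb_per_s(gbit, 200)} MB/s per task")
```

Under these assumptions the shared gigabit link leaves well under 1 MB/s per task, so the link is worth checking alongside the disks.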

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Robert Taylor
Two 4 TB spinning drives are not going to have a lot of throughput, and with 40 tasks all working on different files, if it's random I/O, I think they will get crushed. What are the sequential read and write rates from any one node doing single-threaded I/O to the NFS server? Can you do a dd test? T
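If dd is not handy, a minimal Python sketch of the same kind of single-threaded sequential test might look like the following. It is not a replacement for dd or a proper benchmark tool: the function name and default sizes are assumptions, and the read-back figure will usually be inflated by the client page cache unless the file is much larger than RAM or caches are dropped first.

```python
import os
import tempfile
import time


def sequential_throughput(directory, size_mb=64, block_kb=1024):
    """Write one file sequentially, then read it back; return (write, read) in MB/s."""
    block = b"\0" * (block_kb * 1024)
    n_blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure data actually left the client
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:
            while f.read(len(block)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    total_mb = n_blocks * block_kb / 1024
    return total_mb / write_s, total_mb / read_s
```

It would be run from one compute node against the NFS-mounted directory, for example `sequential_throughput("/scratch", size_mb=4096)`, where `/scratch` is a hypothetical mount point.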

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Jeff Johnson
Leo, NFS can be a hindrance but if tuned and configured properly might not be as terrible. Some thoughts... - What interface are the nodes accessing NFS via? Ethernet or Infiniband? - Have you tuned the number of NFS server threads above defaults? - As a test, you could deploy a single L
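On the server-thread question, a small sketch for checking the current count, assuming a Linux kernel NFS server that exposes its statistics in `/proc/net/rpc/nfsd` with a line starting with `th` whose first field is the number of nfsd threads. The sample text below is made up for illustration, and the function name is hypothetical.

```python
def nfsd_thread_count(stats_text):
    """Return the thread count from the 'th' line of nfsd stats text, or None."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "th":
            return int(fields[1])
    return None


# Made-up sample in the assumed format; on a real server, read the proc file instead.
sample = "rc 0 0 0\nfh 0 0 0 0 0\nth 8 0 0.000 0.000\n"
print(nfsd_thread_count(sample))
```

On the file server itself this would be called as `nfsd_thread_count(open("/proc/net/rpc/nfsd").read())`. A low number relative to the number of concurrent clients is a hint that the thread count is worth raising.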

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread John Hearns
I would look at BeeGFS here On Thu, 10 Aug 2023, 20:19 leo camilo, wrote: > Hi everyone, > > I was hoping to seek some sage advice from you guys. > > At my department we have built this small prototyping cluster with 5 > compute nodes, 1 name node and 1 file server. > > Up until now, the nam

Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread Bernd Schubert
On 8/10/23 21:18, leo camilo wrote: Hi everyone, I was hoping to seek some sage advice from you guys. At my department we have built this small prototyping cluster with 5 compute nodes, 1 name node and 1 file server. Up until now, the name node contained the scratch partition, which c

[Beowulf] NFS alternative for 200 core compute (beowulf) cluster

2023-08-10 Thread leo camilo
Hi everyone, I was hoping to seek some sage advice from you guys. At my department we have built this small prototyping cluster with 5 compute nodes, 1 name node and 1 file server. Up until now, the name node contained the scratch partition, which consisted of 2x4TB HDD, which form an 8 TB s