Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster

Bernd Schubert Thu, 10 Aug 2023 12:29:30 -0700


On 8/10/23 21:18, leo camilo wrote:

Hi everyone,

I was hoping to seek some sage advice from you guys.
At my department we have built this small prototyping cluster with 5 compute nodes, 1 name node and 1 file server.
Up until now, the name node contained the scratch partition, which consisted of 2x4TB HDDs forming an 8 TB striped ZFS pool. The pool is shared to all the nodes using NFS. The name node and the compute nodes are connected with both Cat6 Ethernet cable and InfiniBand. Each compute node has 40 cores.
Recently I have attempted to launch computation from each node (40 tasks per node), so 1 computation per node. And the performance was abysmal. I reckon I might have reached the limits of NFS.
I then realised that this was due to very poor performance from NFS. I am not using stateless nodes, so each node has about 200 GB of SSD storage and running directly from there was a lot faster.
So, to solve the issue, I reckon I should replace NFS with something better. I have ordered 2x4TB NVMEs for the new scratch and I was thinking of:
  * using the 2x4TB NVME in a striped ZFS pool and use a single node
    GlusterFS to replace NFS
  * using the 2x4TB NVME with GlusterFS in a distributed arrangement
    (still single node)
Some people told me to use Lustre, but I reckon that might be overkill. And I would only use a single file server machine (1 node).
Could you guys give me some sage advice here?

So GlusterFS is using FUSE, which doesn't have the best performance reputation (although hopefully not for long - feel free to search for "fuse" + "uring").

If you want to avoid the complexity of Lustre, maybe look into BeeGFS. Well, I would recommend looking into it anyway (as a former developer I'm biased again ;) ).
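
Whichever file system ends up on the list, it can help to time the same write on each candidate mount before committing. Below is a minimal sketch in Python, not a proper benchmark: the function name write_throughput_mb_s and its parameters are made up for illustration, and it only measures a single sequential writer, so it will not reproduce the 40-tasks-per-node load described above.

```python
import os
import sys
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, block_kb=1024):
    """Write roughly size_mb of random data into `directory` and return MB/s.

    Hypothetical helper for illustration: single writer, sequential I/O only.
    """
    block = os.urandom(block_kb * 1024)
    n_blocks = max(1, (size_mb * 1024) // block_kb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            # fsync so the timing includes pushing data to the server or
            # disk, not just filling the local page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = n_blocks * len(block) / (1024 * 1024)
    return written_mb / elapsed


if __name__ == "__main__":
    # Pass one or more directories, e.g. an NFS mount and a local SSD path.
    for d in sys.argv[1:]:
        print(f"{d}: {write_throughput_mb_s(d):.1f} MB/s")
```

Running it with the NFS scratch mount and a directory on the local SSD as arguments gives a rough first comparison; running several copies at once from different nodes gets closer to the real workload.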



Cheers,
Bernd

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
