Leo, NFS can be a hindrance, but tuned and configured properly it might not be as terrible as what you're seeing. Some thoughts...
- What interface are the nodes accessing NFS via, ethernet or Infiniband? (If it's the ethernet path, see the NFS-over-RDMA sketch at the end of this mail.)
- Have you tuned the number of NFS server threads above the default? (A sketch for checking and raising the count is at the end of this mail.)
- As a test, you could deploy a single Lustre node that acts as MGS/MDS and OSS simultaneously, to test for performance gains via Infiniband (a rough sketch is at the end of this mail).
- Your scratch volume must really be scratch, because you are running with no parity protection (a two-disk HDD or SSD stripe).
- You're probably better off with tuned NFS as opposed to GlusterFS.

--Jeff

On Thu, Aug 10, 2023 at 12:19 PM leo camilo <lhcam...@gmail.com> wrote:

> Hi everyone,
>
> I was hoping to get some sage advice from you guys.
>
> At my department we have built this small prototyping cluster with 5
> compute nodes, 1 name node and 1 file server.
>
> Up until now, the name node contained the scratch partition, which
> consisted of 2x 4TB HDDs forming an 8 TB striped ZFS pool. The pool is
> shared to all the nodes using NFS. The compute nodes and the name node
> are connected with both Cat6 ethernet cable and Infiniband. Each
> compute node has 40 cores.
>
> Recently I attempted to launch computations from every node (40 tasks
> per node, i.e. one computation per node), and the performance was
> abysmal. I realised this was due to very poor NFS performance; I reckon
> I might have reached its limits. I am not using stateless nodes, so
> each node has about 200 GB of SSD storage, and running directly from
> there was a lot faster.
>
> So, to solve the issue, I reckon I should replace NFS with something
> better. I have ordered 2x 4TB NVMe drives for the new scratch and I was
> thinking of:
>
> - using the 2x 4TB NVMe drives in a striped ZFS pool and a single-node
>   GlusterFS to replace NFS
> - using the 2x 4TB NVMe drives with GlusterFS in a distributed
>   arrangement (still single node)
>
> Some people told me to use Lustre, but I reckon that might be overkill,
> and I would only use a single fileserver machine (1 node).
>
> Could you guys give me some sage advice here?
>
> Thanks in advance

--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage