I'm surprised no one here has mentioned tuning kernel/network
parameters. I would look at those first, because tuning them is free,
quick, and the least labor-intensive way to improve performance. The
website below is a good guide to which parameters are worth tweaking.
https://fasterdata.es.net/
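As a concrete starting point, the Linux host-tuning advice there boils
down to a handful of sysctl settings along these lines (the values below
are illustrative, roughly sized for a ~10G host; size them for your own
links rather than copying them blindly):

    # /etc/sysctl.d/90-net-tuning.conf -- illustrative values, not gospel
    net.core.rmem_max = 67108864              # max receive socket buffer (64 MB)
    net.core.wmem_max = 67108864              # max send socket buffer (64 MB)
    net.ipv4.tcp_rmem = 4096 87380 33554432   # min/default/max TCP receive buffer
    net.ipv4.tcp_wmem = 4096 65536 33554432   # min/default/max TCP send buffer
    net.core.default_qdisc = fq               # fair queueing, enables pacing
    # apply with: sysctl --system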
Prentice
On 8/10/23 3:35 PM, Jeff Johnson wrote:
Leo,
NFS can be a hindrance, but if tuned and configured properly it might
not be so terrible. Some thoughts...
* What interface are the nodes accessing NFS via, Ethernet or
InfiniBand?
* Have you tuned the number of NFS server threads above the defaults? (See the sketch after this list.)
* As a test, you could deploy a single Lustre node that acts as
MGS/MDS and OSS simultaneously, to test for performance gains via
InfiniBand. (A sketch follows this list as well.)
* Your scratch volume must really be scratch, because you are
running with no parity protection (a two-disk HDD or SSD stripe).
* You're probably better off with tuned NFS as opposed to GlusterFS
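On the server threads point: a minimal sketch, assuming a Linux kernel
NFS server with nfs-utils. The 32 below is just a starting point; scale
it with client count and watch nfsstat under load:

    cat /proc/fs/nfsd/threads     # how many nfsd threads are running now
    rpc.nfsd 32                   # raise the count at runtime (not persistent)
    nfsstat -s                    # check badcalls/retransmits under load
    # to persist across reboots, set in /etc/nfs.conf:
    #   [nfsd]
    #   threads=32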
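And for the single-node Lustre experiment, roughly this (device names
and the NID are hypothetical; a sketch, not a production layout):

    # combined MGS+MDT plus one OST on the same host
    mkfs.lustre --fsname=scratch --mgs --mdt --index=0 /dev/nvme0n1
    mkfs.lustre --fsname=scratch --ost --index=0 --mgsnode=10.0.0.1@o2ib /dev/nvme1n1
    mkdir -p /mnt/mdt /mnt/ost0
    mount -t lustre /dev/nvme0n1 /mnt/mdt
    mount -t lustre /dev/nvme1n1 /mnt/ost0
    # on each compute node, over IB:
    mount -t lustre 10.0.0.1@o2ib:/scratch /scratch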
--Jeff
On Thu, Aug 10, 2023 at 12:19 PM leo camilo <lhcam...@gmail.com> wrote:
Hi everyone,
I was hoping to get some sage advice from you guys.
At my department we have built this small prototyping cluster with
5 compute nodes, 1 name node and 1 file server.
Up until now, the name node contained the scratch partition, which
consisted of 2x4TB HDDs forming an 8 TB striped ZFS pool. The
pool is shared to all the nodes using NFS. The compute nodes and
the name node are connected with both Cat6 Ethernet cable and
InfiniBand. Each compute node has 40 cores.
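For reference, that setup amounts to roughly the following (the disk
names here are made up):

    zpool create scratch /dev/sdb /dev/sdc          # striped, no redundancy
    zfs set sharenfs="rw=@10.0.0.0/24" scratch      # export to the cluster subnet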
Recently I attempted to launch computations from every node at once
(40 tasks per node, i.e. 1 computation per node), and the performance
was abysmal. I reckon I might have reached the limits of NFS.
I then realised that this was due to very poor performance from
NFS. I am not using stateless nodes, so each node has about 200 GB
of local SSD storage, and running directly from there was a lot faster.
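(One way to confirm this kind of NFS bottleneck, using the tools that
ship with nfs-utils on a client while the jobs are running:

    nfsiostat 5 /scratch      # per-mount ops/s and RTT, sampled every 5 s
    nfsstat -c                # client-side per-op counters and retransmits
)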
So, to solve the issue, I reckon I should replace NFS with
something better. I have ordered 2x4TB NVMe drives for the new
scratch, and I was thinking of:
* using the 2x4TB NVMes in a striped ZFS pool, with a single-node
GlusterFS volume on top to replace NFS (see the sketch after this list)
* using the 2x4TB NVMes as GlusterFS bricks in a distributed
arrangement (still a single node)
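For the first option, I imagine the single-node Gluster side would look
roughly like this (host name and brick path are made up):

    gluster volume create scratch fileserver:/scratch-pool/brick0 force
    gluster volume start scratch
    # on the compute nodes:
    mount -t glusterfs fileserver:/scratch /scratch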
Some people told me to use Lustre, but I reckon that might be
overkill, and I would only use a single file server machine (1 node).
Could you guys give me some sage advice here?
Thanks in advance
--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001 f: 858-412-3845
m: 619-204-9061
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf