Hi Robert,

Thanks for your reply.
I am pretty sure the storage is going over ethernet (cat6 gigabit; 10gig copper is coming soon, maybe). I was not aware I could use NFS over IB. I will try running the tests over the weekend. Thanks for the tip.

Best,
leo

On Thu, 10 Aug 2023 at 21:43, Robert Taylor <r...@wi.mit.edu> wrote:

> Two 4tb spinning drives are not going to have a lot of throughput, and
> with 40 tasks all working on different files, if it's random IO, I think
> they will get crushed.
>
> What are the sequential read and write rates from any one node doing
> single-threaded IO to the NFS server?
>
> Can you do a dd test?
> This should write a 1gig file straight from memory on the node it is run
> on.
>
> dd if=/dev/zero of=/mnt/nfsshare bs=1M count=1000
>
> (make sure zfs compression is off, or that will give bogus numbers)
> You should get a time summary and a throughput speed.
> That is pure sequential IO that comes from memory, which is probably the
> best that one machine can do (unless the dd becomes cpu bound).
>
> We have some high-end NetApp and Isilon storage systems where I work, and
> I've gotten between 400MB/s and 1GB/s out of NFS, and the 1GB/s I believe
> was bottlenecked at the source node, because all it had was a 10gig
> connection to the network. Once I can get the nodes to 25g, I will test
> again, but I'm not there yet.
>
> Also, are you sure the storage is going over IB and not gige? (Is the cat6e
> 1gig ethernet, or do you have copper 10gig?)
>
> rgt
>
> On Thu, Aug 10, 2023 at 3:29 PM Bernd Schubert <bernd.schub...@fastmail.fm>
> wrote:
>
>> On 8/10/23 21:18, leo camilo wrote:
>> > Hi everyone,
>> >
>> > I was hoping to get some sage advice from you guys.
>> >
>> > At my department we have built this small prototyping cluster with 5
>> > compute nodes, 1 name node and 1 file server.
>> >
>> > Up until now, the name node contained the scratch partition, which
>> > consisted of 2x4TB HDDs forming an 8 TB striped zfs pool. The pool is
>> > shared to all the nodes using NFS. The compute nodes and the name node
>> > are connected with both cat6 ethernet cable and infiniband. Each
>> > compute node has 40 cores.
>> >
>> > Recently I have attempted to launch computation from each node (40
>> > tasks per node), so 1 computation per node, and the performance was
>> > abysmal. I reckon I might have reached the limits of NFS.
>> >
>> > I then realised that this was due to very poor performance from NFS. I
>> > am not using stateless nodes, so each node has about 200 GB of SSD
>> > storage, and running directly from there was a lot faster.
>> >
>> > So, to solve the issue, I reckon I should replace NFS with something
>> > better. I have ordered 2x4TB NVMEs for the new scratch and I was
>> > thinking of:
>> >
>> > * using the 2x4TB NVMEs in a striped ZFS pool and a single-node
>> > GlusterFS to replace NFS
>> > * using the 2x4TB NVMEs with GlusterFS in a distributed arrangement
>> > (still single node)
>> >
>> > Some people told me to use Lustre, but I reckon that might be overkill,
>> > and I would only use a single fileserver machine (1 node).
>> >
>> > Could you guys give me some sage advice here?
>> >
>>
>> So glusterfs is using fuse, which doesn't have the best performance
>> reputation (although hopefully not for long - feel free to search for
>> "fuse" + "uring").
>>
>> If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
>> Well, I would recommend looking into it anyway (as a former developer
>> I'm biased, of course ;) ).
>>
>> Cheers,
>> Bernd
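
For reference, a minimal sketch of the dd test Robert describes above, assuming the ZFS scratch dataset is called tank/scratch on the file server and the clients mount the export at /mnt/nfsshare (dataset name, mount path and test file name are placeholders, not taken from the thread):

    # on the file server: confirm compression is off on the scratch dataset,
    # otherwise the zeros from /dev/zero compress away and the numbers are bogus
    zfs get compression tank/scratch

    # on one compute node: sequential write of 1 GiB to a file under the NFS
    # mount (dd needs a file path here, not the mount-point directory itself);
    # conv=fdatasync makes dd report the rate only after the data has reached
    # the server, so the client page cache does not inflate the result
    dd if=/dev/zero of=/mnt/nfsshare/ddtest.bin bs=1M count=1000 conv=fdatasync

    # sequential read of the same file; drop the client cache first so the
    # data really comes back over the wire
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    dd if=/mnt/nfsshare/ddtest.bin of=/dev/null bs=1M

    rm /mnt/nfsshare/ddtest.bin

Comparing these numbers with the same dd run locally on the file server should show whether the bottleneck is the spinning disks or the gigabit link (which tops out at roughly 110-120 MB/s).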
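Similarly, a rough sketch of what "NFS over IB" could look like using IPoIB, assuming the server's IPoIB interface is ib0 with address 10.0.2.1 and the export is /scratch (interface name, address and export path are assumptions for illustration only):

    # on the file server: check that the IPoIB interface is up and note its address
    ip addr show ib0

    # on a compute node: mount the same export over the IPoIB address instead of
    # the 1GbE one; nconnect (kernel 5.3+) opens several TCP connections per
    # mount, which can help when many tasks hit the share at once
    sudo mount -t nfs -o vers=4.2,nconnect=8 10.0.2.1:/scratch /mnt/nfsshare

    # if RDMA support is configured on both ends, NFS over RDMA bypasses the
    # IPoIB TCP stack entirely:
    # sudo mount -t nfs -o vers=4.2,proto=rdma,port=20049 10.0.2.1:/scratch /mnt/nfsshare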