I'm a little wary about entering the conversation, but I've been spending way 
too much time with ZFS so perhaps this will help.

First, benchmark whatever you will really run - ZFS is odd enough that results 
from synthetic workloads may not match up to an actual application.  I guess 
I'd argue that is true of any technology.
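
For instance, to approximate the 16K random-I/O pattern Andrew describes below 
with IOzone (a rough sketch - the sizes here are made up, and the files need 
to be large enough that you are not just measuring the ARC):

    # lay the files out with a sequential write, then do random read/write:
    # 16K records, 8 threads, 8 GB per file (hypothetical sizes)
    iozone -i 0 -i 2 -r 16k -s 8g -t 8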

With database or database-like technology, performance can be dramatically 
influenced by the ZFS recordsize; the general principle is to set the 
recordsize to the same size as your writes.  On some workloads (which I have 
not yet seen myself, though Oracle is a frequently cited example, so it 
certainly applies to some DBs) logbias also needs to be set to throughput to 
see reasonable performance.  Here is what Oracle says for MySQL: 
https://blogs.oracle.com/realneel/entry/mysql_innodb_zfs_best_practices
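
For a MySQL/InnoDB dataset that works out to something like this (a sketch - 
"tank/mysql" is a hypothetical dataset name, and note that recordsize only 
affects files written after the property is set):

    # InnoDB data pages are 16K by default, so match the recordsize to them
    zfs set recordsize=16K tank/mysql
    # steer large synchronous writes around the ZIL, per Oracle's advice
    zfs set logbias=throughput tank/mysql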

I'd tend to follow their practices, as Nexenta is mostly silent on application 
tuning at the moment.

Network tuning is also an issue.  If you are using anything IP-based (which 
NFS over RDMA fortunately gets you out of), also look at things like turning 
off Nagle's algorithm.  The last time I used IB on Solaris the drivers were a 
little weird, but I will admit that was on a Niagara box about… four years ago?
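
On a Solaris-derived system like Nexenta, the blunt instrument for Nagle is 
the tcp_naglim_def tunable (a sketch - the cleaner fix is setting TCP_NODELAY 
on the socket in the application itself):

    # a Nagle limit of 1 byte effectively disables the algorithm system-wide
    ndd -set /dev/tcp tcp_naglim_def 1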

Remember too that the L2ARC only caches reads and the ZIL only logs writes; I 
think specifically synchronous writes smaller than 32k.  Under random IO, ZFS 
is limited to the performance of a single disk in each RAIDZ VDEV, which is 
drastically different from most other storage systems.
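
That 32k cutoff is, I believe, the zfs_immediate_write_sz tunable on 
Solaris-family kernels; if you want to inspect or change it, something along 
these lines:

    # show the current value on a live kernel (default is 32768 bytes)
    echo zfs_immediate_write_sz/D | mdb -k

    # persist a change across reboots by adding a line to /etc/system:
    #   set zfs:zfs_immediate_write_sz = 65536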

It is impressive technology but is also…  a bit complicated.

Will

On Sep 13, 2012, at 4:52 AM, hol...@th.physik.uni-frankfurt.de wrote:

> Hi,
> 
> I am a bit confused.
> 
> I have 4 top notch dual socket machines with 128GB ram each. I also have a
> Nexenta box which is my NFS / ZFS server. Everything is connected together
> with QDR infiniband.
> 
> I want to use this setup for MySQL databases, so I am testing 16K random
> and stride performance.
> 
> If I set up a single machine to hammer the fileserver with IOzone I see
> something like 50,000 IOPS, but if all four machines hammer the
> filesystem concurrently we can get it up to 180,000 IOPS.
> 
> Can anyone tell me what might be the bottleneck on a single machine?
> Why can I not get 180,000 IOPS when running on one machine alone?
> 
> If I test using IPoIB in connected mode, I see this: http://pastie.org/4708542
> 
> Some kind of buffer problem?
> 
> Thanks,
> 
> Andrew

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
