Loic Tortay wrote:
Joe Landman wrote:

[...]

We have seen the same issue on (non Sun) high density storage servers which performed correctly with RHEL5 & XFS but comparatively poorly with Solaris 10 & ZFS.

ZFS seems to be extremely sensitive to the quality/behaviour of the driver for the HBA or RAID/disk controller, especially with SATA disks (for NCQ support). Having a driver is not enough; a good one is required.

Another point is that ZFS requires a different configuration "mindset" than
"ordinary" RAID.

Hmmmm....

Here is what I like.  Setting up a RAID is painless.  Really painless.

Here is what I don't like: I can't tune that RAID. Well, I can, but only by tearing it down and starting again. I tried turning off checksum, compression, even the ZIL.

The thing I wanted to do was to put the intent log onto a separate device, but following the man pages on this resulted in errors; zpool would not have it.
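For reference, the separate-log syntax from the man page looks like the sketch below (device and pool names are hypothetical). One likely explanation for the errors: if I recall correctly, separate log devices require ZFS pool version 7 or later, which early Solaris 10 updates did not ship, so the command fails even though it is documented.

```shell
# Add a dedicated intent-log (ZIL) device to an existing pool.
# Needs ZFS pool version 7+; on older pools this errors out
# even though the man page documents the syntax.
zpool add tank log c4t0d0

# Or specify the log device at pool-creation time:
zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 log c4t0d0

# The per-dataset properties mentioned above:
zfs set checksum=off tank
zfs set compression=off tank

# Disabling the ZIL outright was (in that era) a kernel tunable,
# set in /etc/system and active after reboot:
#   set zfs:zil_disable=1
```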

Have you noticed the "small vdev" advice on the Solaris Internals Wiki?

Yeah, they mention 10 drives or fewer. I tried it with two 8-drive vdevs, one 16-drive vdev, and a few other layouts.
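For concreteness, the two layouts compared would be built roughly as below (disk names hypothetical). The performance difference comes from ZFS striping across vdevs: a single raidz vdev delivers roughly single-disk random-read IOPS, so two vdevs roughly double that, at the cost of an extra parity disk.

```shell
# Layout A: two 8-drive raidz vdevs in one pool.
# ZFS stripes across the vdevs, so random IOPS scale with vdev count.
zpool create tank \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

# Layout B: one wide 16-drive raidz vdev.
# More usable space, but random-read IOPS of a single vdev.
zpool create tank raidz \
    c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
    c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
```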

This is probably the single most important hint for ZFS configuration.
IOW, most of the time you can't just use the same underlying configuration with ZFS as the one you (would) use with Linux.
This means that you may need to trade usable space for performance,
sometimes in more drastic ways than with ordinary RAID.

I tried a few methods. Understand, we prefer to show the fastest possible speed on our units, so we want to figure out how to tune/tweak ZFS for these systems.


Finally, like it or not, ZFS is often happier/more efficient when it does the RAID itself (no "hardware" RAID controller or LVM involved).

Performance with pure ZFS software-only RAID was significantly lower than with the hardware RAID running Solaris. I tried several variations on this. That, and the crashing (driver related, I believe), concerns me. I would like to be able to get the performance that some imply I can get out of it.

I certainly would like to be able to tune it.

Loïc.

PS: regarding your other message in this thread (and your blog), you seem confused: the "open source" OS is OpenSolaris, not Solaris 10.

Hmmm .... we keep hearing that "Solaris is open source" without any distinction being drawn between Sun Solaris and OpenSolaris. Maybe it is marketing not being precise on this. Ask your Sun sales rep whether Solaris is open source, without specifying which one; the answer will be "yes." Ambiguous? Yes. On purpose? I don't know.

The benchmark publishing restriction only applies to Solaris 10 (see <http://www.opensolaris.com/licensing/opensolaris_license/>).

Yup.  Will eventually try OpenSolaris on this gear.

PPS: while I dislike Sun's policy, I specifically remember being told by someone from a DOE lab (who did actually evaluate your product about 18 months ago) that you didn't want their unfavorable benchmark results to be published. You can't have it both ways.

Owie ... no one is having it "both ways", Loïc. Everything we do in testing is in the open, and we invite both comment and criticism ... like "Hey buddy, turn up read-ahead" or "luser, turn off compression." Our tests and results are open. Others can run them and report back results. If they give me permission to publish their results, I will. If they publish them themselves, I may critique them (we reserve the right to respond).

As a note, you just dragged an external group into this discussion, and I am guessing they really didn't want to be. So I am going to tread carefully here.

We published a critique of the published "evaluation", pointing out its faults and analyzing them thoroughly. We did not deny them the right to publish their results. In return, we got a rather nasty email/blog-post trail. I still have it in my mail archives, and it is hidden in the blog archives. I won't rehash it, other than to point out that some on this list would take issue with the results.

I removed my critique after they asked me to, with them promising in return to amend and address my criticisms. As far as I can tell, they withdrew their report, and did not amend or address my criticisms.

More curious are reports that the group responsible for this report has since moved away from their formerly preferred platform toward a BlueArc platform. There was a quote to this effect (moving forward with BlueArc) from the report's principal author in HPCWire last year, regarding the workload for which they had been considering the other unit (Thumper).

This said, they were free to use the unit and publish benchmark results, which they did. We criticized the benchmark they did for its flaws in analysis, in execution, and setup, as we were free to do.

Nobody is having it "both ways", Loïc. We reserve the right to respond, and we did. We did not ask them to take down their report. They did ask us to take down our criticisms of it.

FWIW: I will not divulge the group's name in public or in private. I ask that anyone with knowledge of this group also keep their name/affiliation out of the discussion. Loïc dragged them in here, and I would like to accord them some measure of privacy, whether I agree with them or not.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
