Geoff Jacobs wrote:
Mark Hahn wrote:
it's interesting that SAS advertising has obscured the fact that SAS is
just a further development of SCSI, and not interchangeable
with SATA.  for instance, no SATA controller will support any SAS disk,
and any SAS setup uses a form of encapsulation to communicate with
the foreign SATA protocol.  SAS disks follow the traditional price
formula of SCSI disks (at least 4x more than non-boutique disks),
and I suspect the rest of SAS infrastructure will be in line with that.
Yes, SAS encapsulates SATA, but not vice versa. The ability to use a
hardware RAID SAS controller with large numbers of inexpensive SATA
drives is very attractive. I was also trying to be thorough.

and be mindful of reliability issues with desktop drives.
I would claim that this is basically irrelevant for beowulf.
for small clusters (say, < 100 nodes), you'll be hitting a negligible
number of failures per year.  for larger clusters, you can't afford
any non-ephemeral install on the disks anyway - reboot-with-reimage
should only take a couple minutes more than a "normal" reboot.
and if you take the no-install (NFS root) approach (which I strongly
recommend) the status of a node-local disk can be just a minor node
property to be handled by the scheduler.
PXE/NFS is absolutely the slickest way to go, but any service nodes
should have some guarantee of reliability. In my experience, disks and
power supplies are two of the most common points of failure.
Most of the clusters we configure for our customers use diskless compute nodes to minimize compute node failures, for precisely the reason you mentioned, unless either the application can benefit from additional local scratch space (e.g. software RAID0 over four SATA drives lets a 1U compute node read and write large data streams at 280 MB/s and gives it 3 TB of local disk), or the customer sometimes needs to run jobs requiring more virtual memory than they can afford to install physically, in which case local disks provide swap space.
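
For a rough sanity check of those RAID0 scratch numbers, here is a back-of-the-envelope sketch in Python; the per-drive throughput and capacity figures are my own assumptions for illustration, not specs from the configuration above:

  # Back-of-the-envelope check of the four-drive software RAID0 scratch example.
  # Per-drive figures are assumptions for illustration, not vendor specs.
  DRIVES = 4
  SEQ_MB_PER_S = 70     # assumed sustained sequential throughput per SATA drive
  CAPACITY_GB = 750     # assumed capacity per drive

  # RAID0 stripes data across all members, so streaming bandwidth and capacity
  # both scale roughly linearly with the number of spindles.
  aggregate_mb_per_s = DRIVES * SEQ_MB_PER_S
  aggregate_tb = DRIVES * CAPACITY_GB / 1000.0

  print(f"~{aggregate_mb_per_s} MB/s streaming, ~{aggregate_tb:.0f} TB scratch per node")
  # -> ~280 MB/s streaming, ~3 TB scratch per node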

We find that customers typically don't want to pay the premium for redundant power supplies, PDUs, and cabling on the compute nodes, though; that's something that is typically requested for head nodes and NFS servers.

Also, we find that NFS offloading on the NFS server with the rapidfile card helps avoid scalability issues where the NFS server bogs down under massively parallel requests from, say, 128 cores in a 32-node dual-CPU, dual-core cluster. The rapidfile card is a PCI-X card with two Fibre Channel ports, two GigE ports, and an NFS/CIFS offload processor on the same card. Since most bulk data transfer is redirected from Fibre Channel to the GigE NFS clients without passing through the NFS server's own CPU and RAM, the server's CPU load does not become the bottleneck; the limit we see instead is the number of spindles available before the two GigE ports saturate.
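
To put a rough number on that spindle count (again, the per-link and per-spindle figures here are my own assumptions for illustration, not measurements from that cluster):

  import math

  # Rough estimate of how many spindles it takes to saturate two GigE ports
  # on the NFS side. All figures are assumptions for illustration only.
  GIGE_PORTS = 2
  GIGE_PAYLOAD_MB_PER_S = 115   # ~1 Gbit/s minus Ethernet/IP/TCP/NFS overhead
  SPINDLE_MB_PER_S = 60         # assumed per-spindle throughput under a
                                # mixed, many-client NFS workload

  link_limit_mb_per_s = GIGE_PORTS * GIGE_PAYLOAD_MB_PER_S
  spindles_needed = math.ceil(link_limit_mb_per_s / SPINDLE_MB_PER_S)

  print(f"link limit ~{link_limit_mb_per_s} MB/s, "
        f"roughly {spindles_needed} spindles to saturate it")
  # -> link limit ~230 MB/s, roughly 4 spindles to saturate it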

We configure clusters for our customers with Scyld Beowulf, which, because of its particular lightweight compute node model (PXE booting into RAM), does not NFS-mount root but only NFS-mounts the home directories, and so does not run into the typical NFS-root scalability issues.

Michael

Michael Will
SE Technical Lead / Penguin Computing / www.penguincomputing.com
