Geoff Jacobs wrote:
Mark Hahn wrote:
it's interesting that SAS advertising has obscured the fact that SAS is
just a further development of SCSI, and not interchangeable
with SATA.  for instance, no SATA controller will support any SAS disk,
and any SAS setup uses a form of encapsulation to communicate with
the foreign SATA protocol.  SAS disks follow the traditional price
formula of SCSI disks (at least 4x more than non-boutique disks),
and I suspect the rest of SAS infrastructure will be in line with that.
Yes, SAS encapsulates SATA, but not vice versa. The ability to use a
hardware RAID SAS controller with large numbers of inexpensive SATA
drives is very attractive. I was also trying to be thorough.

and be mindful of reliability issues with desktop drives.
I would claim that this is basically irrelevant for beowulf.
for small clusters (say, < 100 nodes), you'll be hitting a negligible
number of failures per year.  for larger clusters, you can't afford
any non-ephemeral install on the disks anyway - reboot-with-reimage
should only take a couple minutes more than a "normal" reboot.
and if you take the no-install (NFS root) approach (which I strongly
recommend) the status of a node-local disk can be just a minor node
property to be handled by the scheduler.
PXE/NFS is absolutely the slickest way to go, but any service nodes
should have some guarantee of reliability. In my experience, disks and
power supplies are two of the most common points of failure.
Most of the clusters we configure for our customers use diskless compute nodes to minimize compute node failures, for precisely the reason you mentioned, unless either the application can benefit from additional local scratch space (e.g. software RAID0 over four SATA drives lets a 1U compute node read and write large data streams at 280 MB/s and gives it 3 TB of local disk), or the customer sometimes needs to run jobs requiring more virtual memory than they can afford to install physically, in which case local disks provide swap space.
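
For a rough sanity check of those RAID0 scratch numbers, here is a back-of-the-envelope sketch in Python; the per-drive throughput and capacity figures are my own assumptions for illustration, not specs from the configuration above:

  # Back-of-the-envelope check of the four-drive software RAID0 scratch example.
  # Per-drive figures are assumptions for illustration, not vendor specs.
  DRIVES = 4
  SEQ_MB_PER_S = 70     # assumed sustained sequential throughput per SATA drive
  CAPACITY_GB = 750     # assumed capacity per drive

  # RAID0 stripes data across all members, so streaming bandwidth and capacity
  # both scale roughly linearly with the number of spindles.
  aggregate_mb_per_s = DRIVES * SEQ_MB_PER_S
  aggregate_tb = DRIVES * CAPACITY_GB / 1000.0

  print(f"~{aggregate_mb_per_s} MB/s streaming, ~{aggregate_tb:.0f} TB scratch per node")
  # -> ~280 MB/s streaming, ~3 TB scratch per node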

We find that customers typically don't want to pay the premium for redundant power supplies, PDUs, and cabling on the compute nodes, though; that's something that is typically requested for head nodes and NFS servers.

Also, we find that NFS offloading on the NFS server with the rapidfile card helps avoid scalability issues where the NFS server bogs down under massively parallel requests from, say, 128 cores in a 32-node dual-CPU, dual-core cluster. The rapidfile card is a PCI-X card with two Fibre Channel ports, two GigE ports, and an NFS/CIFS offload processor on the same card. Since most bulk data transfer is redirected from Fibre Channel to the GigE NFS clients without passing through the NFS server's own CPU and RAM, the server's CPU load does not become the bottleneck; the limit we see instead is the number of spindles available before the two GigE ports saturate.
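
To put a rough number on that spindle count (again, the per-link and per-spindle figures here are my own assumptions for illustration, not measurements from that cluster):

  import math

  # Rough estimate of how many spindles it takes to saturate two GigE ports
  # on the NFS side. All figures are assumptions for illustration only.
  GIGE_PORTS = 2
  GIGE_PAYLOAD_MB_PER_S = 115   # ~1 Gbit/s minus Ethernet/IP/TCP/NFS overhead
  SPINDLE_MB_PER_S = 60         # assumed per-spindle throughput under a
                                # mixed, many-client NFS workload

  link_limit_mb_per_s = GIGE_PORTS * GIGE_PAYLOAD_MB_PER_S
  spindles_needed = math.ceil(link_limit_mb_per_s / SPINDLE_MB_PER_S)

  print(f"link limit ~{link_limit_mb_per_s} MB/s, "
        f"roughly {spindles_needed} spindles to saturate it")
  # -> link limit ~230 MB/s, roughly 4 spindles to saturate it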

We configure clusters for our customers with Scyld Beowulf, which, because of its particular lightweight compute node model (PXE booting into RAM), does not NFS-mount root but only NFS-mounts the home directories, and so does not run into the typical NFS-root scalability issues.

Michael

Michael Will
SE Technical Lead / Penguin Computing / www.penguincomputing.com
