Prentice Bisbal wrote:
Oops. E-mailed that to the wrong address. The cat's out of the bag now! No
big deal. I was 50/50 about CC-ing the list anyway. Just remove the
phrase "off-list" from the first sentence, and that last bit about not
posting to the list because...

Great. I'll never get a job that requires security clearance now! ;)

--
Prentice <---- still can't figure out how to use e-mail properly

Recently I proved unable to handle spreadsheets. That can be embarrassing when I claim to be able to manage and write numerical models...

Prentice Bisbal wrote:
Gerry,

I wanted to let you know off-list that I'm going through the same
problems right now. I thought you'd like to know you're not alone.  We
purchased a cluster from the *allegedly* same vendor. The PXE boot and
keyboard errors were the least of our problems.

First, our cluster was delayed two months due to shortages of the network
hardware we specified. It was not the vendor's standard hardware for
clustering, but it was still a brand they resold.

When it did arrive, the doors were damaged by the inadequately equipped
delivery company.

When the technician arrived to finish setting up the cluster, he
discovered that the IB cables provided were too short to be installed
within spec: the bend radius would have been too tight, and the cables
could not be supported from above the connectors.

And, the final problem I'm going to mention: the fiber network cables to
connect our Ethernet switches to each other (we have Ethernet and IB
networks in this cluster) were missing.

It's been over two weeks since our cluster arrived, and one week since
the technician noticed these shortages and reported them. We still
haven't had these problems rectified, and the technician will have to
fly to our site again in a couple of weeks to complete the installation.

I'm writing an article about this experience for Doug to publish. I
haven't posted this to the mailing list because I'm not sure what my
management will be happy with me sharing (they will review the article
before it's published).

I'll add that we paid for next-day service, but I continue to be amazed that this means Matt or I have to evaluate and troubleshoot the node before the vendor sends out service. Somehow, "next business day" can be dragged out a few more days.

Only some of our iSCSI cables were shipped, but we were told we'd gotten what they interpreted to be the right number; we bought more, and it only took a week or so to get them in. We also discovered that the RAID shelves we'd gotten, for which the RFQ specifically called out hardware RAID6 capability, didn't have it, so we're doing JBOD with software RAID6 (our experience has proven that we NEED RAID6). When we enquired about returning the RAID shelves, we were told that wasn't a possibility.
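
Since we're doing JBOD with software RAID6 anyway, the md setup is
roughly the sketch below; the device names, disk count, and chunk size
are made-up placeholders for whatever the shelves actually present:

  #!/usr/bin/env python3
  # Rough sketch: build a software RAID6 array across JBOD disks with mdadm.
  # Device names, disk count, and chunk size are hypothetical placeholders.
  import subprocess

  DISKS = ["/dev/sd%s" % c for c in "bcdefghi"]  # eight hypothetical member disks

  subprocess.run(
      ["mdadm", "--create", "/dev/md0",
       "--level=6",                         # RAID6: tolerates two failed disks
       "--raid-devices=%d" % len(DISKS),
       "--chunk=256"] + DISKS,
      check=True)

  # Record the array so it assembles at boot (CentOS keeps this in /etc/mdadm.conf).
  scan = subprocess.run(["mdadm", "--detail", "--scan"],
                        capture_output=True, text=True, check=True)
  with open("/etc/mdadm.conf", "a") as conf:
      conf.write(scan.stdout)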

My impression is that the vendor is well-suited to small- and medium-business clusters, but overall unfamiliar with how things work in the *nix world (I know there are exceptions). I'm concerned that each of our compute nodes is, to them, just another webserver, and that if it's mission-critical, we should have bought all sorts of additional services and a shelf-spare server. Or maybe we should just virtualize (yeah! that's the ticket! a virtual HPC cluster?).

We're starting to look again for HPC resources, but I doubt they'll be asked to bid.

gerry

We recently purchased a set of hardware for a cluster from a hardware vendor. We've encountered a couple of interesting issues with bringing the thing up that I'd like to get group comments on. Note that the RFP and negotiations specified this system was for a cluster installation, so there would be no misunderstanding...

1. We specified "No OS" in the purchase so that we could install CentOS as our base. We got a set of systems with a stub OS, plus an EULA for the diagnostics embedded on the disk. After clicking through the EULA, it tells us there is no OS on the disk, but it does not fail over to PXE.

2. The BIOS had a couple of interesting defaults, including "warn on keyboard error" (Keyboard? Not intentionally. This is a compute node, and should never require a keyboard. Ever.) We also find the BIOS is set to boot from hard disk THEN PXE, but due to item 1 above, we can never fail over to PXE unless we hook up a keyboard and monitor and hit F12 to drop to PXE.
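
(Aside: if the nodes' BMCs are reachable over IPMI, the boot-device
override can in principle be set remotely, so no keyboard, monitor, or
F12 is needed. A rough Python sketch follows; the BMC hostnames and
credentials are made-up placeholders:)

  #!/usr/bin/env python3
  # Rough sketch: ask each node's BMC to override the BIOS boot order and
  # PXE-boot, then power-cycle it. Hostnames and credentials are hypothetical.
  import subprocess

  NODES = ["node%02d-ipmi" % n for n in range(1, 33)]  # hypothetical BMC names
  USER, PASSWD = "admin", "changeme"                   # hypothetical credentials

  for bmc in NODES:
      base = ["ipmitool", "-I", "lanplus", "-H", bmc, "-U", USER, "-P", PASSWD]
      # Set PXE as the boot device override (persists across reboots).
      subprocess.run(base + ["chassis", "bootdev", "pxe", "options=persistent"],
                     check=True)
      # Power-cycle so the node picks up the override and hits the PXE server.
      subprocess.run(base + ["chassis", "power", "cycle"], check=True)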

In discussions with our sales rep, I'm told that we'd have had to pay extra to get a truly bare hard disk, and that, for a fee, they'd have been willing to custom-configure the BIOS. OK, with the BIOS this isn't too unreasonable: they have a standard BIOS for all systems, and if you want something special, paying for it is the norm... But, still, this is a CLUSTER installation we were quoted, not a desktop.

Also, I'm now told that "almost every customer" orders their cluster configuration service, at several kilobucks per rack. Since the team I'm working with has some degree of experience in configuring and installing hardware and software on computational clusters, now measured in at least 10 separate cluster installations, this seemed like an unnecessary expense. However, we're finding vendor gotchas that are annoying at the least, and that sometimes cost significant workaround time and effort.

Finally, our sales guy yesterday was somewhat baffled as to why we'd ordered without an OS, and further why we were using Linux over Windows for HPC. Without trying to revive the recent rant-fest about Windows HPC capabilities: can anyone cite real HPC applications generally run under Windows on significant clusters (I'll accept Cornell's work, although I remain personally convinced that the bulk of their Windows HPC work has been dedicated to maintaining grant funding rather than doing real work)?

No, I won't identify the vendor.
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

