Prentice Bisbal wrote:
Oops. e-mailed to the wrong address. The cat's out of the bag now! No
big deal. I was 50/50 about CC-ing the list, anyway. Just remove the
phrase "off-list" in the first sentence, and that last bit about not
posting to the list because...
Great. I'll never get a job that requires security clearance now! ;)
--
Prentice <---- still can't figure out how to use e-mail properly
Recently, I was proven to be unable to handle spreadsheets. That can be
embarrassing when I claim to be able to manage and write numerical models...
Prentice Bisbal wrote:
Gerry,
I wanted to let you know off-list that I'm going through the same
problems right now. I thought you'd like to know you're not alone. We
purchased a cluster from, allegedly, the same vendor. The PXE boot and
keyboard errors were the least of our problems.
First, our cluster was delayed two months due to shortages of the network
hardware we specified. It was not the vendor's standard for clustering,
but it was still a brand they resold.
When it did arrive, the doors were damaged by the inadequately equipped
delivery company.
When the technician arrived to finish setting up the cluster, he
discovered that the IB cables provided were too short to be within spec:
the bend radius would be too tight, and the cables could not be
supported from above the connectors.
And, the final problem I'm going to mention: the fiber network cables to
connect our ethernet switches to each other (we have Ethernet and IB
networks in this cluster) were missing.
It's been over two weeks since our cluster arrived, and a week since the
technician noticed these shortages and reported them. The problems still
haven't been rectified, and the technician will have to fly to our site
again in a couple of weeks to complete the installation.
I'm writing an article about this experience for Doug to publish. I
haven't posted this to the mailing list because I'm not sure what my
management will be happy with me sharing (the article will be reviewed
by them before publishing).
I'll add that we paid for next-day service, but I continue to be amazed
that this means Matt or I have to evaluate and troubleshoot the node
before the vendor sends out service. We can manage to drag "next
business day" out a few more days, somehow.
Our iSCSI cables were only partially shipped, but we were told we'd
received what they interpreted to be the right number; we bought more,
and it only took a week or so to get them in. We then discovered that
the RAID shelves we'd gotten, which the RFQ specifically called out as
hardware-RAID6-capable, weren't, so we're doing JBOD with software RAID6
(our experience has proven that we NEED RAID6). When we enquired about
returning the RAID shelves, we were told that wasn't a possibility.
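For anyone in the same boat, a minimal sketch of the JBOD/software-RAID6 fallback with mdadm might look like the following. The device names (/dev/md0, /dev/sd[b-i]) and the eight-disk count are illustrative placeholders, not our actual shelf layout:

```shell
# Build a software RAID6 array from the JBOD member disks.
# Device names and disk count below are placeholders.
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]

# Persist the array definition so it assembles at boot.
mdadm --detail --scan >> /etc/mdadm.conf

# Put a filesystem on the array (XFS here, but any would do).
mkfs.xfs /dev/md0
```

Software RAID6 costs some CPU for parity, but it at least gives the double-disk-failure tolerance the hardware was supposed to provide.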
My impression is that the vendor is well-suited for small-to-medium-
business clusters, but unfamiliar overall with how things work in the
*nix world (I know there are exceptions). I am concerned that each of
our compute nodes is, to them, just another webserver, and that if it's
mission-critical, we should have bought all sorts of additional services
and a shelf-spare server. Or maybe we should just virtualize (yeah!
that's the ticket! a virtual HPC cluster?).
We're starting to look again for HPC resources, but I doubt they'll be
asked to bid.
gerry
We recently purchased a set of hardware for a cluster from a hardware
vendor. We've encountered a couple of interesting issues with bringing
the thing up that I'd like to get group comments on. Note that the RFP
and negotiations specified this system was for a cluster installation,
so there would be no misunderstanding...
1. We specified "No OS" in the purchase so that we could install CentOS
as our base. We got a set of systems with a stub OS, and an EULA for
the diagnostics embedded on the disk. After clicking through the EULA, it
tells us we have no OS on the disk, but it does not fail over to PXE.
2. The BIOS had a couple of interesting defaults, including warn on
keyboard error (Keyboard? Not intentionally. This is a compute node,
and should never require a keyboard. Ever.) We also found the BIOS
set to boot from hard disk THEN PXE. But due to item 1 above, we can
never fail over to PXE unless we plug in a keyboard and monitor and hit
F12 to drop to PXE.
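One workaround for items 1 and 2, assuming you can get into each node once (e.g. via a rescue image): zero the boot sector so the "boot from hard disk" step fails and the BIOS falls through to PXE. The sketch below demonstrates the idea on a file-backed image for safety; on a real node the target would be the boot disk (e.g. /dev/sda), which is obviously destructive:

```shell
# Demonstrate clearing sector 0 so a disk-first boot order falls through
# to PXE. Uses a file-backed image as a stand-in for the boot disk.
IMG=disk.img
dd if=/dev/zero of="$IMG" bs=1M count=1 status=none       # stand-in "disk"
printf 'STUB-OS-BOOT-CODE' | dd of="$IMG" conv=notrunc status=none  # fake stub-OS boot code
dd if=/dev/zero of="$IMG" bs=512 count=1 conv=notrunc status=none   # zero sector 0 only
# Sector 0 is now all zeroes: nothing left for the BIOS to boot from disk.
cmp -s -n 512 "$IMG" /dev/zero && echo "boot sector cleared"
```

It's a one-node-at-a-time fix, so it doesn't scale well, but it beats carting a crash cart to every node just to hit F12.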
In discussions with our sales rep, I'm told that we'd have had to pay
extra to get a real bare hard disk, and that, for a fee, they'd have
been willing to custom-configure the BIOS. OK, with the BIOS this isn't
too unreasonable: They have a standard BIOS for all systems and if you
want something special, paying for it's the norm... But, still, this is
a CLUSTER installation we were quoted, not a desktop.
Also, I'm now told that "almost every customer" ordered their cluster
configuration service at several kilobucks per rack. Since the team I'm
working with has some degree of experience in configuring and installing
hardware and software on computational clusters, now measured in at
least 10 separate cluster installations, this seemed like an unnecessary
expense. However, we're finding vendor gotchas that are annoying at the
least, and sometimes cause significant work-around time/effort.
Finally, our sales guy yesterday was somewhat baffled as to why we'd
ordered without an OS, and further why we were using Linux over Windows
for HPC. Without trying to revive the recent rant-fest about Windows HPC
capabilities: can anyone cite real HPC applications generally run on
significant clusters (I'll accept Cornell's work, although I remain
personally convinced that the bulk of their Windows HPC work has been
dedicated to maintaining grant funding rather than doing real work)?
No, I won't identify the vendor.
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf