David Mathog wrote:
Joe Landman <land...@scalableinformatics.com> wrote:
So along comes a drive manufacturer, with some nice looking specs on 2TB
(and some 1.5 and 1 TB) drives. They look great on paper. We get them
into our labs, and play with them, and they seem to run really well.
Occasional hiccup on building RAIDs, but you get that in large batches
of drives.
So now they are out in the field for months, under various loads. Some
in our DeltaV's, some in our JackRabbits. The units in the DeltaV's
seem to have a ridiculously high failure rate. This is not something we
see in the lab. Even with constant stress, horrific sustained workloads
... they don't fail in our testing. But get these same drives out into
the users' hands ... and whammo.
Slightly different drives in our JackRabbit units, with a variety of
RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc.
This is not something we see in the lab in our testing. We try
emulating their environments, and we can't generate the failures.
Worse, we get the drives back after exchanging them at our cost with new
replacements, only to find out, upon running diagnostics, that the
drives haven't failed according to the test tool. This failing drive
vendor refuses to acknowledge firmware bugs, effectively refuses to
release patches/fixes.
While there is no doubting that these drives didn't work reliably in
your arrays, that doesn't necessarily mean they were "defective". Just
playing devil's advocate here, but it could be the array controller is
using some feature where there is a bit of wiggle room in the standard,
so that both the disk and the controller are "conforming", but they
still won't work together reliably. In a situation like that I would
expect the vendor to disclose the issue, so it would be clear why the
disks had to come from A and not B. As long as the vendor explained the
problem clearly most customers would be fine buying the preferred disks.
I agree that some devices work well with others. This is what we see.
Some do not. We have a few boxfuls of 1TB drives that don't play well
with others.
And yes, standards do leave wiggle room. Interop testing days are
critical. A connect-a-thon is very helpful.
But the point is, just because it says SATA, you shouldn't expect that
it will work with all SATA controllers. No ... seriously. Likewise
this is true with many other components.
Some stuff doesn't play well with others.
I didn't sanction the language used, I thought it wrong. But from a
support scenario, it can be (and often is) a nightmare. We take
ownership of as little or as much as our customers want us to.
If your name is on the box, no one appreciates a finger-pointing
exercise rather than a path to a solution.
It's when the vendor says "you have to use OUR disks" and doesn't tell
you why, and when, as far as you can tell, these are the same devices
that you could buy directly from the manufacturer without the 5X markup,
that things smell bad.
I agree with this paragraph. We won't name specific names in public, but
we do speak about our drive issues in private with our customers.
5X markup? We must be doing something wrong :/
Regards,
David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615