David Mathog wrote:
Joe Landman <land...@scalableinformatics.com> wrote:

So along comes a drive manufacturer with some nice-looking specs on 2TB (and some 1.5 and 1 TB) drives. They look great on paper. We get them into our labs, play with them, and they seem to run really well. There's an occasional hiccup when building RAIDs, but you get that with large batches of drives.

So now they have been out in the field for months, under various loads. Some are in our DeltaVs, some in our JackRabbits. The units in the DeltaVs seem to have a ridiculously high failure rate. This is not something we see in the lab. Even with constant stress and horrific sustained workloads ... they don't fail in our testing. But get these same drives out into the users' hands ... and whammo.

We have slightly different drives in our JackRabbit units, with a variety of RAID controllers, and we see the same types of issues: timeouts, RAID fall-outs, etc.

This is not something we see in the lab in our testing. We try emulating their environments, and we can't generate the failures.

Worse, after exchanging the drives for new replacements at our cost, we get the old ones back, only to find, upon running diagnostics, that they haven't failed according to the test tool. The vendor of these failing drives refuses to acknowledge firmware bugs and effectively refuses to release patches/fixes.

While there is no doubting that these drives didn't work reliably in
your arrays, that doesn't necessarily mean they were "defective".  Just
playing devil's advocate here, but it could be the array controller is
using some feature where there is a bit of wiggle room in the standard,
so that both the disk and the controller are "conforming", but they
still won't work together reliably.  In a situation like that I would
expect the vendor to disclose the issue, so it would be clear why the
disks had to come from A and not B.  As long as the vendor explained the
problem clearly most customers would be fine buying the preferred disks.

I agree that some devices work well with others. This is what we see. Some do not. We have a few boxfuls of 1TB drives that don't play well with others.

And yes, standards do leave wiggle room. Interop testing days are critical, and a connect-a-thon is very helpful.

But the point is, just because it says SATA, you shouldn't expect it to work with all SATA controllers. No ... seriously. The same is true of many other components.

Some stuff doesn't play well with others.

I didn't sanction the language used; I thought it was wrong. But from a support standpoint, it can be (and often is) a nightmare. We take ownership of as little or as much as our customers want us to. If your name is on the box, no one appreciates a finger-pointing exercise instead of a path to a solution.


It's when the vendor says "you have to use OUR disks" and doesn't tell
you why, and when, as far as you can tell, these are the same devices
that you could buy directly from the manufacturer without the 5X markup,
that things smell bad.

I agree with this paragraph. We won't name names in public, but we do discuss our drive issues privately with our customers.

5X markup?  We must be doing something wrong :/


Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615