On Mon, 15 Feb 2010 20:41:08 -0500 Joe Landman <land...@scalableinformatics.com> wrote:
> Rahul Nabar wrote:
> > This was the response from Dell, I especially like the analogy:
> >
> > [snip]
> >> There are a number of benefits for using Dell qualified drives in
> >> particular ensuring a ***positive experience*** and protecting
> >> ***our data***. While SAS and SATA are industry standards there are
> >> differences which occur in implementation. An analogy is that
> >> English is spoken in the UK, US and Australia. While the language
> >> is generally the same, there are subtle differences in word usage
> >> which can lead to confusion. This exists in storage subsystems as
> >> well. As these subsystems become more capable, faster and more
> >> complex, these differences in implementation can have greater
> >> impact.
> > [snip]
> >
> > I added the emphasis. I am in love with Dell-disks that get me "the
> > positive experience". :)
>
> Please indulge my taking a contrarian view based upon the products we
> sell/support/ship.
>
> I see significant derision heaped upon these decisions, which are called
> "marketing decisions" by Dell and others. It couldn't be possible, in
> most commenters' minds, that they might actually have a point ...
>
> ... I am not defending Dell's language (I wouldn't use this or allow
> this to be used in our outgoing marketing/customer communications).
>
> Let me share an anecdote. I have elided the disk manufacturer's name to
> protect the guilty. I will not give hints as to who they are, though
> some may be able to guess ... I will not confirm.
>
> We ship units with 2TB (and 1.5TB) drives among others. We burn in and
> test these drives. We work very hard to ensure compatibility, and to
> make sure that when users get the units, the things work. We
> aren't perfect, and we do occasionally mess up. When we do, we own up
> to it and fix it right away. It's a different style of support. The
> buck stops with us. Period.
>
> So along comes a drive manufacturer, with some nice looking specs on 2TB
> (and some 1.5 and 1 TB) drives. They look great on paper. We get them
> into our labs, and play with them, and they seem to run really well.
> Occasional hiccup on building RAIDs, but you get that in large batches
> of drives.
>
> So now they are out in the field for months, under various loads. Some
> in our DeltaV's, some in our JackRabbits. The units in the DeltaV's
> seem to have a ridiculously high failure rate. This is not something we
> see in the lab. Even with constant stress, horrific sustained workloads
> ... they don't fail in our testing. But get these same drives out into
> the users' hands ... and whammo.
>
> Slightly different drives in our JackRabbit units, with a variety of
> RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc.
>
> This is not something we see in the lab in our testing. We try
> emulating their environments, and we can't generate the failures.
>
> Worse, we get the drives back after exchanging them at our cost with new
> replacements, only to find out, upon running diagnostics, that the
> drives haven't failed according to the test tool. This failing drive
> vendor refuses to acknowledge firmware bugs, effectively refuses to
> release patches/fixes.
>
> Our other main drive vendor, while not currently with a 2TB drive unit,
> doesn't have anything like this manufacturer's failure rate in the field.
> When drives die in the field, they really ... really die in the field.
> And they do fix their firmware.
>
> So we are now moving off this failing manufacturer (it's a shame as they
> used to produce quality parts for RAID several years ago), and we are
> evaluating replacements for them. Firmware updates are a critical
> aspect of a replacement. If the vendor won't allow for a firmware
> update, we won't use them.
>
> So ...
> this anecdote complete, if someone called me up and said "Joe, I
> really want you to build us an siCluster for our storage, and I want you
> to use [insert failing manufacturer's name here] drives because we
> like them", what do you think my reaction should be? Should it be
> "sure, no problem, whatever you want" ... with the subsequent problems
> and pain, for which we would be blamed ... or should it be "no, these
> drives don't work well ... deep and painful experience at customer sites
> shows that they have bugs in their firmware which are problematic for
> RAID users ... we are attempting to get them to give us the updated
> firmware to help the existing users, but we would not consider shipping
> more units with these drives due to their issues."
>
> Is that latter answer, which is the correct answer, a marketing answer?
>

But what if the customer tells you, "Ship me your system without a
drive; I'll put whatever I want in there, so you are not my point of
contact for failing drives," but you say, "No, I won't allow them in my
system, and I won't even sell you a replacement of what I do allow in
the system"?

> Yeah, SATA and SAS are standards. Yeah, in theory, they all do work
> together. In reality, they really don't, and you have to test.
> Everyone does some aspect slightly different and usually in software, so
> they can fix it if they messed up. If there is a RAID timeout bug due
> to head settling timing, yeah, this is fixable. But if the disk
> manufacturer doesn't want to fix it ... it's your company's name on the
> outside of that box. You are going to take the heat for their problems.
>
> Note: This isn't just SATA/SAS drives, there are a whole mess of things
> that *should* work well together, but do not. We had some exciting
> times in the recent past with SAS backplanes that refused to work with
> SAS RAID cards. We've had some excitement from 10GbE cards, IB cards,
> etc. that we shouldn't have had.
>
> I can't and won't sanction their tone to you ...
> they should have
> explained things correctly. Given that PERC are rebadged LSI, yeah, I
> know perfectly well a whole mess of drives that *do not* work correctly
> with them.
>
> So please don't take Dell to task for trying to help you avoid making
> what they consider a bad decision on specific components. There could
> be a marketing aspect to it, but support is a cost, and they want to
> minimize costs. Look at failure rates, and toss the suppliers who have
> very high ones.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf