Martin, you can set a timeout with lsiutil, but I've found that it makes no difference: if a consumer-grade disk starts trying to read a failing sector, it can block a pool indefinitely.

Best regards.

Maurilio.

Martin Frost wrote:
> > From: Jason Matthews <[email protected]>
> > Date: Tue, 10 Jan 2012 08:26:08 -0800
> >
> > you can adjust the disk timeouts in solaris.
>
> Here's an article on how to do that, although it ends with the author
> adding this comment "However in testing with failing harddrives (on
> mpt_sas anyway), we see that the sd timeouts are completely ignored so
> my entire post above is moot!"
>
> http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>
> I haven't tested this, so does it work or not (in OpenIndiana)?
>
> Martin
>
> > there are two schools of thought here:
> >
> > 1) accommodate the extremely long timeouts of consumer drives and
> > let the drive decide whether to report an error back (fail itself
> > out)
> >
> > 2) set the timeouts very narrowly and be aggressive in letting zfs
> > fail out disks.
> >
> > i generally go with option 2.
> >
> > Sent from Jasons' hand held
> >
> > On Jan 10, 2012, at 7:13 AM, Maurilio Longo <[email protected]> wrote:
> >
> > > Geoff,
> > >
> > > I've hit this problem several times in the past, with OpenSolaris
> > > and then with OpenIndiana.
> > >
> > > There are, to my knowledge, no available solutions; it is so by
> > > design!
> > >
> > > If a disk stops responding the pool waits until after it responds
> > > again (sometimes pulling it out of its slot and then reinserting
> > > the disk causes a reset of the link and it starts working again).
> > >
> > > I was not able to assess what happens if I set failmode to continue.
> > >
> > > I think it could be no better since you still cannot write to the pool.
> > >
> > > This is IMHO the biggest problem of ZFS, in that I cannot
> > > instruct it to stop using a failed device if it has some level of
> > > redundancy still available.
> > >
> > > Wait is OK only if an entire vdev stops responding, not if a disk
> > > in a vdev with redundancy has problems either fatal or
> > > transitory.
> > >
> > > Best regards.
> > >
> > > Maurilio.
> > >
> > >
> > > PS. Using server grade disks (those with TLER) makes it possible
> > > to overcome this problem for transitory errors.
> > >
> > >
> > > Geoff Nordli wrote:
> > >
> > >> Part of my concern is why one disk would have completely brought
> > >> down the system. I have seen this come up on the list before,
> > >> but I don't remember any resolutions to fixing it.
> > >>
> > >> Anyone have any clues to try to prevent this from happening in
> > >> the future?
> > >>
> > >> thanks,
> > >>
> > >> Geoff
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> [email protected]
> http://openindiana.org/mailman/listinfo/openindiana-discuss
>

--
Maurilio Longo
farmaconsult s.r.l.
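
For anyone reading this thread later, the two schools of thought Jason lists, and the failure case Maurilio describes, can be sketched as a toy model. This is only an illustration, not how ZFS, sd or mpt_sas are implemented: the function name `stalled_read_outcome` is hypothetical, the 7 s and 60 s figures are assumed example values, and `None` is used to mean "this side never gives up" (a consumer disk that retries forever, or a host timeout that is ignored, as the linked article reports for mpt_sas).

```python
from typing import Optional


def stalled_read_outcome(drive_recovery_s: Optional[float],
                         host_timeout_s: Optional[float]) -> str:
    """Toy model: who ends a read that hits a failing sector?

    drive_recovery_s: seconds before the drive itself reports an error
                      (None = the drive keeps retrying indefinitely).
    host_timeout_s:   seconds before the host gives up on the command
                      (None = the host timeout is not enforced).
    Returns "drive", "host", or "blocked".
    """
    if drive_recovery_s is None and host_timeout_s is None:
        # Nobody gives up: the I/O never completes and the pool waits.
        return "blocked"
    if host_timeout_s is None:
        return "drive"
    if drive_recovery_s is None:
        return "host"
    # Both limits exist: whichever expires first ends the I/O.
    return "drive" if drive_recovery_s <= host_timeout_s else "host"


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    print(stalled_read_outcome(7.0, 60.0))    # TLER-style disk, long host timeout
    print(stalled_read_outcome(120.0, 10.0))  # option 2: narrow host timeout
    print(stalled_read_outcome(None, None))   # consumer disk, timeout ignored
```

In this model, option 1 corresponds to letting the drive's own limit expire first, option 2 to making the host limit the smaller of the two, and the indefinite hang reported above is the case where neither limit is effective.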
