zfs mirrors and high availability

Michael Boers Thu, 11 Nov 2010 10:36:00 -0800

I am running a 100% ZFS-based FreeBSD 8.0 system with four disks: two ZFS-mirrored boot drives and two ZFS-mirrored data drives. This morning the server went down with the following errors in the log file:

Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted

Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)

Why didn't ZFS stop talking to the disk that was clearly having issues? Are there sysctls or other variables I can set that will allow ZFS to mark a disk as failed more aggressively? Is there a way I could have prevented the crash?
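One way to frame the question: ZFS can only react once the disk driver gives up on a command and returns an error. The sketch below is a rough worst-case model, assuming the driver makes the initial attempt plus `retries` retries and that each attempt waits the full timeout. As far as I know, da(4) exposes these as `kern.cam.da.default_timeout` (documented default 60 seconds) and `kern.cam.da.retry_count` (documented default 4); treat those defaults as assumptions for this FreeBSD 8.0 box. The function name `worst_case_stall_s` and the "tuned" values are purely illustrative.

```python
def worst_case_stall_s(timeout_s: float, retries: int) -> float:
    """Rough upper bound, in seconds, on how long one command can hang
    before the disk driver hands an error back to ZFS.

    Simplifying assumption: the initial attempt plus `retries` retries,
    each waiting the full timeout. Real CAM error recovery (aborts, bus
    resets) can add time on top of this.
    """
    if timeout_s <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return float((retries + 1) * timeout_s)


if __name__ == "__main__":
    # Assumed da(4) defaults: default_timeout = 60 s, retry_count = 4.
    print("defaults:", worst_case_stall_s(60, 4), "s")
    # Hypothetical tighter settings, only to illustrate the trade-off.
    print("tuned:", worst_case_stall_s(10, 1), "s")
```

Under these assumptions the defaults allow roughly five minutes per command before ZFS sees an error. Lowering the timeout or retry count shortens that window, but it also raises the risk of failing a disk that is merely slow. This model says nothing about whether ZFS on 8.0 then faults the device.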

The system was "up" and pingable, but not accessible via ssh. My guess is that all disk-related requests were queued or stuck.
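A quick way to check the "stuck requests" guess is to group the mpt0 timeout messages quoted above by controller and timestamp. This is a minimal sketch with the log pasted in as a string; the regex and the function name `timed_out_requests` are my own, not part of any FreeBSD tool.

```python
import re

# The mpt0 lines from the log above (abort line included; it is skipped).
LOG = """\
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
"""

# Captures timestamp, controller name and request sequence number.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ kernel: (?P<ctl>\w+): "
    r"request 0x[0-9a-f]+:(?P<seq>\d+) timed out"
)


def timed_out_requests(log: str) -> dict:
    """Map (controller, timestamp) to the sorted request numbers that timed out."""
    groups = {}
    for line in log.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            groups.setdefault((m["ctl"], m["ts"]), []).append(int(m["seq"]))
    return {key: sorted(seqs) for key, seqs in groups.items()}


if __name__ == "__main__":
    for (ctl, ts), seqs in timed_out_requests(LOG).items():
        print(ctl, ts, f"{len(seqs)} requests: {seqs[0]}-{seqs[-1]}")
```

Run against the lines above, it reports ten consecutive requests (2838 through 2847) expiring in the same second on mpt0, which is consistent with I/O backing up behind the failing disk rather than failing one command at a time.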


A few more notes on my setup:

Hardware: Dell PowerEdge 2970, 1 CPU, 16 GB RAM

  pool: Storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        Storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors

  pool: zboot
 state: ONLINE
 scrub: scrub in progress for 0h22m, 72.03% done, 0h8m to go
config:

        NAME           STATE     READ WRITE CKSUM
        zboot          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

--
Thanks!

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[email protected]"