On 03/11/17 11:25 PM, Jeff Woolsey wrote:
# uname -a
SunOS bombast 5.11 illumos-2816291 i86pc i386 i86pc
# cat /etc/release
              OpenIndiana Hipster 2016.10 (powered by illumos)
         OpenIndiana Project, part of The Illumos Foundation (C) 2010-2016
                         Use is subject to license terms.
                            Assembled 30 October 2016
# zpool status cloaking
   pool: cloaking
  state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
         continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
   scan: resilver in progress since Sat Mar 11 11:42:37 2017
     8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
     5.31M resilvered, 0.00% done
config:

         NAME                     STATE     READ WRITE CKSUM
         cloaking                 ONLINE      97     0     0
           mirror-0               ONLINE     582     0     0
             c5d0s6               ONLINE       0     0   582  (resilvering)
             8647373200783277078  UNAVAIL      0     0     0  was /dev/dsk/c5d0s3

errors: 120 data errors, use '-v' for a list

I am not an expert (you should really ask the OpenZFS people what to do next), but it looks like one disk has died or become unavailable, and even after a reboot the pool would carry on with the resilver it had started. "120 data errors" suggests you DO have some data errors from the disk itself, and the status output also shows 582 checksum errors on data transferred to or from the disk.
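To make the READ/WRITE/CKSUM columns easier to read at a glance, here is a rough sketch of a filter that keeps only the rows of `zpool status` output with a nonzero counter. The function name `zpool_error_rows` is made up for illustration, and the list of state keywords is an assumption that may not cover every state your system prints.

```shell
# Print "name read write cksum" for every config row of `zpool status`
# output (read from stdin) that has at least one nonzero error counter.
# zpool_error_rows is a hypothetical helper name, not a ZFS command.
zpool_error_rows() {
    awk '
        # Config rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # The state list below is an assumption and may be incomplete.
        $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage (not run here):
#   zpool status cloaking | zpool_error_rows
```

Fed the status output quoted above, this should list the pool, mirror-0 and c5d0s6, and skip the UNAVAIL device, whose counters are all zero.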

I hope you have backups elsewhere. I hope you are not using SATA disks behind SAS-to-SATA expanders (unreliable) or on a SAS controller (not recommended), and I hope you are using ECC RAM (a must if you value your data). It also looks like you have done something odd: putting two slices of the SAME disk into a mirror. What is the point of that, when you can always run 'zfs set copies=2' on any dataset to get duplicate copies of the data in the same pool anyway?
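For reference, setting the copies property might look roughly like the sketch below. It only prints the commands rather than running them; `plan_copies` and the dataset name `cloaking/important` are placeholders I made up. Keep in mind that copies=2 applies only to data written after the property is set, and it does not help if the whole disk is lost.

```shell
# Print (do not run) the commands that keep two copies of every block
# written to one dataset from now on, and then show the setting.
# plan_copies is a hypothetical helper; the dataset is the caller's choice.
plan_copies() {
    dataset=$1
    echo "zfs set copies=2 $dataset"
    echo "zfs get copies $dataset"
}

# Usage, with a made-up dataset name:
#   plan_copies cloaking/important
```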

8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)

When I start zpool scrub, it starts slowly but later it does speed up.
I would recommend turning the machine off, booting from live USB/DVD media, and using dd (disk dump) to copy _everything_ on that disk (or at least the working partition/slice) somewhere else, to an image file or another device, for safekeeping in case the other disk dies too. Then I would wait for the resilver to finish and attach another device/slice for it to resilver onto (ZFS copies only the used space, so that is faster). That way you can keep working, then remove the failed devices from the pool and add one more new device so that you again have a healthy two-disk mirror as a minimum. To survive a failure you need at least two working disks/slices, each on a separate physical disk.
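Roughly, the sequence I have in mind could be sketched like this. The function only prints the commands so nothing is touched by accident; `plan_rescue`, the new slice `c6d0s6` and the image path are all assumptions, and the zpool steps can only work once the pool is no longer suspended.

```shell
# Print (do not run) a rescue sequence: image the surviving slice,
# attach a new mirror side, check progress, then drop the stale device.
# plan_rescue is a hypothetical helper; all names are caller-supplied.
plan_rescue() {
    pool=$1    # pool name
    good=$2    # surviving slice, e.g. c5d0s6
    new=$3     # new slice on a different physical disk
    image=$4   # where to store the raw image
    stale=$5   # name or GUID of the dead mirror side

    # From live media: raw copy that continues past read errors,
    # padding unreadable blocks with zeros.
    echo "dd if=/dev/rdsk/$good of=$image bs=64k conv=noerror,sync"
    # Mirror the surviving slice onto the new device.
    echo "zpool attach $pool $good $new"
    # Watch the resilver until it completes.
    echo "zpool status $pool"
    # Only after the new side is fully resilvered.
    echo "zpool detach $pool $stale"
}

# Usage, with made-up device and path names:
#   plan_rescue cloaking c5d0s6 c6d0s6 /mnt/usb/c5d0s6.img 8647373200783277078
```

With conv=noerror,sync a failed read zero-fills the whole block, so a smaller bs loses less data around a bad sector at the cost of a slower copy.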

For crucial data, you could also consider a 3-disk mirror in the future (or raidz2 for better use of disk space), which would keep you afloat even if 2 of the 3 disks die. You should also check the machine's hardware (ECC, SAS/SATA, expanders, using whole disks instead of slices, etc.) and definitely replicate the data elsewhere (zfs send), but don't mistake replication for offline backups.
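As a rough idea of what the replication could look like, the sketch below prints a recursive snapshot followed by a full send to another machine. `plan_replicate`, the snapshot name, the host `backuphost` and the target dataset `backup/cloaking` are all made up, and it assumes ssh access to a host that has a pool to receive into.

```shell
# Print (do not run) a recursive snapshot plus a full replication
# stream to another host. plan_replicate is a hypothetical helper.
plan_replicate() {
    pool=$1     # source pool
    snap=$2     # snapshot name
    host=$3     # receiving host, reachable over ssh
    target=$4   # dataset on the receiving side

    # Snapshot every dataset in the pool at the same moment.
    echo "zfs snapshot -r $pool@$snap"
    # -R sends the whole hierarchy with its properties;
    # -F lets the receiver roll back the target to accept it.
    echo "zfs send -R $pool@$snap | ssh $host zfs receive -F $target"
}

# Usage, with made-up names:
#   plan_replicate cloaking 2017-03-12 backuphost backup/cloaking
```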

# zpool reopen cloaking
cannot reopen 'cloaking': pool I/O is currently suspended
# zpool detach cloaking /dev/dsk/c5d0s3
cannot detach /dev/dsk/c5d0s3: pool I/O is currently suspended
# zpool detach cloaking 8647373200783277078
cannot detach 8647373200783277078: pool I/O is currently suspended
# zpool detach cloaking randomtrash
cannot detach randomtrash: no such device in pool
#

How can I get rid of the UNAVAIL disk slice so that this pool doesn't
try to resilver (From what, pray tell?) all the time.  I don't know
where that ugly number came from--this system only has SATA disks.  I
have a new mirror slice just waiting for it as soon as it stops doing
this.  zpool clear  just hangs. Meanwhile, despite its assertions of
ONLINE,

# zfs list -r cloaking
cannot open 'cloaking': pool I/O is currently suspended
# zpool remove cloaking 8647373200783277078
cannot remove 8647373200783277078: only inactive hot spares, cache, top-level, or log devices can be removed
# zpool offline cloaking 8647373200783277078
cannot offline 8647373200783277078: pool I/O is currently suspended
#

I'm of the opinion that the data is mostly intact (unless zpool has been
tricked into resilvering a data disk from a blank one (horrors)).

# zpool export cloaking

hangs.



_______________________________________________
openindiana-discuss mailing list
[email protected]
https://openindiana.org/mailman/listinfo/openindiana-discuss
