[OpenIndiana-discuss] Huge ZFS root pool slowdown - diagnose root cause?

Lou Picciano Mon, 10 Dec 2018 08:11:39 -0800

Really need some feedback from The Experts here…

We have a root pool which has started to run very slowly…


Evidence? 
- Originally, the only indication was nearly continuous drive controller
traffic (the pool is nowhere near full…).
- A scrub of the pool has taken about 5 days to cover only a few hundred GB of
this 2TB pool; at that rate it would take at least another week to finish.
(The good news is that no errors have been found. Can this be trusted?)
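The scrub figures can be turned into a rough throughput with simple arithmetic. A minimal sketch, assuming "a few hundred GB" means 300 GB (counting 1 GB as 1024 MB) over exactly 5 days; substitute the figures `zpool status rpool` actually reports. The function name is hypothetical:

```shell
#!/bin/sh
# Rough scrub throughput in MB/s.
# ASSUMPTION: 300 GB stands in for "a few hundred GB"; 1 GB = 1024 MB.
scrub_rate_mb_s() {
    gb=$1     # data scrubbed so far, in GB
    days=$2   # elapsed time, in days
    awk -v gb="$gb" -v days="$days" \
        'BEGIN { printf "%.2f\n", (gb * 1024) / (days * 86400) }'
}

scrub_rate_mb_s 300 5
```

Under that assumption this comes to roughly 0.71 MB/s. A healthy SATA disk can typically stream on the order of 100 MB/s, so the amount of data alone seems unlikely to explain the slowness.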

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0

The boot process has become excruciatingly slow, and worrisome. Immediately
after the Loading OS message, we get hundreds of messages like:

        disk0: Read 8 sector(s) from <really_long address?> to 0xffffe000 (0x8000): 0x1

Is this evidence of failing attempts to read the boot blocks/loader on disk0?

Given how the machine's BIOS identifies drives, I don't know that I can be
absolutely certain disk0 refers to ‘one’ disk. Or is the entire rpool seen as
disk0 once the OS is loading?

The machine does eventually boot, but it takes about 20 minutes! The recent
Hipster updates (2018-11-27) have been applied. Otherwise the system runs quite
well. Most client data is on datapool, so users remain oblivious. (To be
honest, they were oblivious before this, too… (!))

$ iostat -D 1 5
   backup       datapool       rpool          sd1       
rps wps util  rps wps util  rps wps util  rps wps util  
  2   3  1.3    8 131  9.3   87  89 99.6   44  45 73.0  
  0   0  0.0    0 645 18.3  105   0 99.9   52   0 71.2  
  0   0  0.0    0  16  0.1   98 107 100.0   52  54 91.0  
  0   0  0.0    0   0  0.0   31 373 100.0   13 190 95.0  
  0   0  0.0    0   0  0.0   54  14 100.0   24  12 48.8
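The `iostat -D` figures above appear to be mostly pool-level totals plus a single disk (sd1), so they cannot say which mirror member is keeping rpool at 99.6-100% busy. One way to compare the members directly (a sketch, not a definitive test) is the per-device `iostat -xn` view: both halves of a mirror should see broadly similar load, so one device with a much higher average active service time (`asvc_t`) is the natural suspect. `iostat -En` also prints per-device soft/hard/transport error counters worth checking. The helper below only averages `asvc_t` per device; its name is hypothetical:

```shell
#!/bin/sh
# Average active service time (asvc_t, in ms) per device across all
# samples of `iostat -xn` output read from stdin. Expects the 11-column
# layout: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device.
avg_service_times() {
    awk '
        # Data rows have 11 fields; skip the repeated column-label row.
        NF == 11 && $11 != "device" {
            dev = $11
            if (!(dev in count)) order[++k] = dev   # keep first-seen order
            sum[dev] += $8                          # $8 is asvc_t
            count[dev]++
        }
        END {
            for (j = 1; j <= k; j++) {
                d = order[j]
                printf "%s %.1f\n", d, sum[d] / count[d]
            }
        }
    '
}

# Live usage on the affected host. The first sample is a since-boot
# average, so take several:
#   iostat -xn 5 6 | avg_service_times
```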

Some specific questions:

1) How can I definitively diagnose which of the pool disks is the bad one?
It seems obvious, but is it?

2) Is this a case of corrupted boot blocks on one drive being compensated for
by ‘good blocks’ on the other?

3) These are SATA disks; I am about to try the in situ ‘hot swap’ approach. Is
it safe to do this with questionable boot blocks?

# zpool offline rpool c2t0d0s0
# cfgadm -c unconfigure sata0/0
 <swap disk>
# cfgadm -c configure sata0/0
# zpool online rpool c2t0d0s0
# zpool replace rpool c2t0d0s0
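A dry-run sketch of that sequence, with one step a root pool typically needs on top of a data pool: on illumos, `zpool replace` does not, as far as I know, put a boot loader on the new disk, so one has to be installed after the resilver. It assumes the illumos `bootadm install-bootloader -P` subcommand and leaves out giving the new disk the same fdisk/slice 0 layout as c2t1d0. `run`, `pause` and `replace_plan` are hypothetical helpers, and this only orders the commands; it does not settle whether a hot swap is safe with suspect boot blocks:

```shell
#!/bin/sh
# Dry-run plan for replacing one side of the rpool mirror. Commands are
# only printed unless DRYRUN=0 is set in the environment.
# ASSUMPTIONS: sata0/0 is the attachment point for c2t0d0 (confirm with
# `cfgadm -al`); the new disk already has a matching slice 0 layout.
POOL=rpool
SLICE=c2t0d0s0
APID=sata0/0
DRYRUN=${DRYRUN:-1}

# Print a command, and execute it only when DRYRUN=0.
run() {
    echo "+ $*"
    if [ "$DRYRUN" = 0 ]; then
        "$@" || exit 1
    fi
}

# Print a manual step, and wait for Enter only when DRYRUN=0.
pause() {
    echo "# $*"
    if [ "$DRYRUN" = 0 ]; then
        printf 'Press Enter to continue... '
        read -r _
    fi
}

replace_plan() {
    run zpool offline "$POOL" "$SLICE"
    run cfgadm -c unconfigure "$APID"
    pause "physically swap the disk"
    run cfgadm -c configure "$APID"
    run zpool online "$POOL" "$SLICE"
    run zpool replace "$POOL" "$SLICE"
    pause "wait for the resilver to finish (zpool status $POOL)"
    # Reinstall the boot loader on the pool's disks; assumes the illumos
    # bootadm install-bootloader subcommand (see bootadm(1M)).
    run bootadm install-bootloader -P "$POOL"
}

replace_plan
```

Run as-is, it only prints the commands; with DRYRUN=0 it executes them and stops at the two manual steps.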

Tks for any insights,

Lou Picciano
_______________________________________________
openindiana-discuss mailing list
[email protected]
https://openindiana.org/mailman/listinfo/openindiana-discuss