Re: [OpenIndiana-discuss] Sudden ZFS performance issue

Lucas Van Tol Fri, 05 Jul 2013 11:29:45 -0700

Have you tried looking with 'latencytop'? Install and run it with
pkg install diagnostic/latencytop
latencytop
and then arrow over to the zpool process for your data pool.


That may help you track down what specifically is slowing you down. For
example, if you see something about space map loading, you need
echo 'metaslab_debug/W1' | mdb -kw
to keep the space maps in memory; a high "ZFS ZIL writer" figure means more
SSD write log devices would be good, etc.

If you are using any 'dedup', you may also have just hit the limit for the
in-memory dedup tables (although I dunno what that looks like in
latencytop...).


Running 'iostat -xnz 1' and looking for specific drives with notably longer
asvc_t and/or %w than the others is another easy check for sudden
performance drops (it suggests the system is slowing down for a single
bad/dying disk).
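
If eyeballing the iostat output gets tedious on a big pool, the comparison can be scripted. The following is only a rough sketch, not a tested tool: it assumes the usual 'iostat -xn' column header (ending in "device"), the helper names parse_iostat and slow_devices are made up for illustration, the 3x-median threshold is an arbitrary starting point, and the sample numbers are invented rather than taken from the servers in this thread.

```python
from statistics import median


def parse_iostat(text):
    """Parse one 'iostat -xn' style sample into {device: {column: value}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header line ends with the word "device".
        if fields[-1] == "device":
            header = fields[:-1]
            continue
        # Skip banner lines and anything that does not match the header width.
        if header is None or len(fields) != len(header) + 1:
            continue
        try:
            values = [float(f) for f in fields[:-1]]
        except ValueError:
            continue
        rows[fields[-1]] = dict(zip(header, values))
    return rows


def slow_devices(rows, factor=3.0, min_ms=5.0):
    """Return devices whose asvc_t is well above the median of all devices.

    factor and min_ms are arbitrary illustrative thresholds.
    """
    times = [r["asvc_t"] for r in rows.values()]
    if not times:
        return []
    threshold = max(min_ms, factor * median(times))
    return sorted(dev for dev, r in rows.items() if r["asvc_t"] > threshold)


# Invented sample data in the usual 'iostat -xn' layout.
SAMPLE = """
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   50.0   10.0 3200.0  640.0  0.0  0.5    0.0    8.1   0  40 c1t0d0
   48.0   11.0 3100.0  650.0  0.0  0.5    0.0    7.9   0  39 c1t1d0
   12.0    3.0  700.0  150.0  0.0  1.0    0.0  210.4   0  99 c1t2d0
"""

if __name__ == "__main__":
    print(slow_devices(parse_iostat(SAMPLE)))
```

With the invented sample above, only c1t2d0 is flagged. On a real system you would feed it a single interval captured from 'iostat -xnz 1' (skipping the first, since-boot sample) and adjust the thresholds to taste; the same approach would work for %w.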

-Lucas Van Tol

> Date: Fri, 5 Jul 2013 20:09:45 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [OpenIndiana-discuss] Sudden ZFS performance issue
> 
> On Fri, Jul 5, 2013 at 8:00 PM, Saso Kiselkov <[email protected]> wrote:
> > On 05/07/2013 17:08, [email protected] wrote:
> >> Good morning,
> >>
> >> I have a weird problem with two of the 15+ OpenSolaris storage servers in
> >> our environment. All the Nearline servers are essentially the same.
> >> Supermicro X9DR3-F based server, Dual E5-2609's, 64GB memory, Dual 10Gb
> >> SFP+ NICs, LSI 9200-8e HBA, Supermicro CSE-826E26-R1200LPB storage arrays
> >> and Seagate enterprise 2TB SATA or SAS drives (not mixed within a server).
> >> Root, L2ARC and ZIL are all on Intel SSD (SLC series 313 for ZIL, MLC 520
> >> for L2ARC and MLC 330 for boot)
> >>
> >> The volumes are built out of 9 drive Z1 groups, ashift is set to 9 (which
> >> is supposed to be appropriate for the enterprise Seagates). The pools are
> >> large (120-130TB) but are only between 27 and 32% full. Each server serves
> >> an iSCSI (Comstar) and a CIFS (in-kernel server) volume of the same pool.
> >> I realize this is not optimal from a recovery/resilver/rebuild standpoint
> >> but the servers are replicated and the data is easily rebuildable.
> >>
> >> Initially these servers did great for several months, while certainly no
> >> speed demons, 300+ MB/sec for sequential read/writes was not a problem.
> >> Several weeks ago, literally overnight, replication times went through the
> >> roof for one server. Simple testing showed that reading from the pool
> >> would no longer go over 25MB/s. Even a scrub that used to run at 400+
> >> MB/sec is now crawling along at below 40MB/s.
> >>
> >> Sometime yesterday the second server started to exhibit the exact same
> >> behaviour. This one is used even less (it's our D2D2T server) and data is
> >> written to it at night and read during the day to be written to tape.
> >>
> >> I've exhausted all I know and I'm at a loss. Does anyone have any ideas of
> >> what to look at, or do any obvious reasons for this behaviour jump out
> >> from the configuration above?
> >
> > Isn't iostat -Exn reporting some transport errors? Smells like a drive
> > gone bad and forcing retries, which would cause about a 10x decrease in
> > performance. Just a guess, though.
> 
> Why should a retry require a 10x decrease in performance? A proper
> design would surely do retries in parallel to other operations
> (Reiser4 and btrfs do it) up to a certain amount of
> failures-in-flight.
> 
> Irek
> 
> _______________________________________________
> OpenIndiana-discuss mailing list
> [email protected]
> http://openindiana.org/mailman/listinfo/openindiana-discuss
                                          