We hit the same problem with Cyrus/IMP, including the DBERROR "lockers" messages, four days ago. We worked around it by restarting Cyrus and haven't seen it since. I actually suspect the problem lies in lmtpd, because our sessions continued to hang with the locking error until we stopped and restarted Cyrus.
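For reference, the restart-with-recovery procedure that cleared the stuck lockers for us looked roughly like the sketch below. This is a sketch under assumptions, not a definitive fix: the init script path, the Cyrus install prefix, and the DB environment directory (/var/imap/db) all depend on your installation, so adjust them before use. ctl_cyrusdb -r (Cyrus 2.x) and BerkeleyDB's db_recover are the relevant tools.

```sh
# Stop the Cyrus master so no process holds the Berkeley DB environment open.
# (Path to the init script is an assumption -- use however you stop master.)
/etc/init.d/cyrus stop

# Run recovery on the cyrus databases as the cyrus user.
# /usr/cyrus/bin is an assumed install prefix.
su cyrus -c "/usr/cyrus/bin/ctl_cyrusdb -r"

# Alternatively, BerkeleyDB's own recovery tool against the db environment
# (directory is an assumption; it is wherever Cyrus keeps its __db.* files):
# /usr/local/BerkeleyDB.3.3/bin/db_recover -h /var/imap/db

/etc/init.d/cyrus start
```

Recovery discards the stale lockers and rebuilds the shared memory regions, which is presumably why a plain restart cleared the symptom for us.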
We have not found the root cause, and have not seen the problem since the restart. If someone does find the cause (or a real fix), we would be interested. We are running Linux 2.4.14-xfs, cyrus-2.1.0, and IMP 3.0 / Horde 2.0 / Apache 1.3.22. Hope this helps.

Chris Peck wrote:
>
> We have noticed a serious load-average issue after upgrading to cyrus-2.0.16.
>
> We upgraded to cyrus-2.0.16 over the weekend, going from
> Simeon/Messaging Direct (a cyrus-1.x version) under Solaris 2.6 to
> cyrus-2.0.16 under Solaris 8 (kernel patch 108258-10). We tested it as
> much as possible for about two months before transferring all of our
> mailstore over (around 18,000 users). We were able to test all of our
> e-mail clients except our webmail client (IMP 2.x). I must say that
> the new version/system seemed great (and other than the following
> issue, I still think it's great, especially the sieve capability).
> Anyway, here's our issue:
>
> We brought the system back up Sunday afternoon, and everything ran
> just fine until Monday afternoon (when we "fixed" IMP). We had an
> issue with IMP not logging in correctly, but fixed that by flushing
> the session info IMP stores in its MySQL database. Once IMP was up and
> being used, we started to notice the following:
>
> Our normal load average is between 0.7 and 2 (around 2 when batches of
> email come in). We noticed the load average rising at a tremendous
> rate; soon it was at 20, 30, 40, all the way up to 65! What was
> interesting was that response time for me while logged in to ksh was
> fast, as were connects to POP and IMAP. Sendmail, of course, turned
> off receives (set to stop at a load of 8). After about an hour we
> decided to disable IMP (it runs on a separate box), and the load
> dropped immediately to under 1. Here's some info I managed to collect
> during this scary period:
>
> System info:
> cyrus-2.0.16
> cyrus-sasl-1.5.24
> db-3.3.11
> sendmail-8.12.1
>
> # cat config.status   (for cyrus-imapd)
> #!/bin/sh
> # Generated automatically by configure.
> # Run this file to recreate the current configuration.
> # This directory was configured as follows,
> # on host mail1:
> #
> # ./configure --with-dbdir=/usr/local/BerkeleyDB.3.3
> #   --with-perl=/usr/local/bin/perl --with-openssl=/usr/local/ssl
> #   --with-tcl=/usr/local --with-libwrap=/usr/local
> #   --with-sasl=/usr/local/lib --with-auth=unix --with-idle=poll
> #
>
> Here's some info from the running system during the high-load period:
>
> top showed:
> last pid: 933;  load averages: 34.75, 30.96, 29.70        15:17:59
> 185 processes: 131 sleeping, 50 running, 4 on cpu
> CPU states: 4.7% idle, 83.6% user, 11.2% kernel, 0.5% iowait, 0.0% swap
> Memory: 4096M real, 3234M free, 180M swap in use, 3974M swap free
>
>   PID USERNAME THR PRI NICE SIZE   RES   STATE  TIME  CPU   COMMAND
>   785 cyrus      1  55    0 24M   3824K  run    0:19  1.86% imapd
>   734 cyrus      1  56    0 24M   3768K  run    0:18  1.84% imapd
> (many more imapd's and pop3d's running...)
>
> root@mail1 # truss -p 785
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351F38)   = 0
> lwp_mutex_wakeup(0xFF351F38) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351F38)   = 0
> lwp_mutex_wakeup(0xFF351F38) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
>
> This went on "forever", until I stopped "master".
>
> We also noticed the following a little earlier (during the high-load
> period) in local6.error:
>
> Jan  7 14:54:33 mail1 imapd[6]: [ID 866726 local6.error] DBERROR db3: 74 lockers
> Jan  7 14:54:34 mail1 imapd[30]: [ID 866726 local6.error] DBERROR db3: 76 lockers
> Jan  7 14:54:39 mail1 imapd[4]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:39 mail1 imapd[4]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:40 mail1 imapd[29918]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:40 mail1 imapd[29918]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:40 mail1 imapd[153]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:40 mail1 imapd[153]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:51 mail1 imapd[151]: [ID 866726 local6.error] DBERROR db3: 86 lockers
>
> ...lots more of the above "DBERROR db3: xx lockers" lines...
>
> Jan  7 15:07:13 mail1 imapd[454]: [ID 898233 local6.error] DBERROR: error closing: DB_INCOMPLETE: Cache flush
> Jan  7 15:07:13 mail1 imapd[454]: [ID 179994 local6.error] DBERROR: error closing mailboxes: cyrusdb error
>
> It seems IMP was able to tickle something on the system. I realize
> that IMP does a login/logout for every IMAP transaction, so it is an
> expensive webmail client. Still, it shouldn't be able to put Cyrus
> into such a weird state (it is just a client).
>
> Any suggestions would be helpful...
>
> Thanks,
> chris
>
> Chris Peck
> Manager of Unix Services
> Information Technology
> The College of William and Mary
> Williamsburg, VA 23187

-- 
Robert Scussel
1024D/BAF70959/0036 B19E 86CE 181D 0912 5FCC 92D8 1EA1 BAF7 0959
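For anyone chasing the same symptom: the "DBERROR db3: NN lockers" messages in the log above make it easy to watch the locker count climb before the mpool region fills up. The following is a small illustrative sketch (not part of either poster's setup; the function name `locker_counts` is made up here) that pulls those counts out of syslog lines so a cron job or monitor could alert when they grow past a threshold.

```python
import re

# Matches Berkeley DB locker-count messages such as:
#   "Jan  7 14:54:51 mail1 imapd[151]: ... DBERROR db3: 86 lockers"
LOCKER_RE = re.compile(r"DBERROR\s+db3:\s+(\d+)\s+lockers")

def locker_counts(lines):
    """Return the locker counts found in an iterable of syslog lines."""
    counts = []
    for line in lines:
        m = LOCKER_RE.search(line)
        if m:
            counts.append(int(m.group(1)))
    return counts

# Sample lines taken from the log excerpt above; the middle line has no
# locker count and is correctly skipped.
sample = [
    "Jan  7 14:54:33 mail1 imapd[6]: [ID 866726 local6.error] DBERROR db3: 74 lockers",
    "Jan  7 14:54:39 mail1 imapd[4]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes",
    "Jan  7 14:54:51 mail1 imapd[151]: [ID 866726 local6.error] DBERROR db3: 86 lockers",
]
print(locker_counts(sample))       # [74, 86]
print(max(locker_counts(sample)))  # 86
```

A steadily rising count (rather than one that rises and falls with load) suggests lockers are being leaked, which matches the observation that only a Cyrus restart cleared the condition.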