Chris,

You'll probably get this twice, since I also replied to the list.
Sorry, but I didn't want you to miss it.

We had to switch back to flat files for our mailboxes db--we got so
many DBERRORs that our entire mail server effectively shut down every
couple of hours, and I had to kill the master process and restart.
That is not an option here. I am at a great disadvantage on my HP-UX
system: I could not get the mmap support working, so I am not using it
in cyrus. The incredible CPU load was evident even before we brought
up our webmail client, so I suspect the missing mmap has something to
do with it.

Unfortunately, even though we've gone back to a flat file for
mailboxes (we have well over 20,000 users), we are still seeing what
we consider unusually high CPU load at all times, with spikes to 60%
or higher very often. And we are still getting some DBERRORs from the
duplicate database. The spikes seem to be mostly CPU, though there are
some I/O spikes too. Our whole mail system is on fibre channel
partitions--three mail partitions, plus the /var/imap and
/var/spool/mqueue ones. Those are the only things on our fibre channel
drives.

In addition, I cannot delete mail accounts! I am very disturbed by
this and cannot figure out exactly why (see the backtrace below from a
hung imapd process I caught).

Attaching to program: /usr/cyrus/bin/imapd, process 27494
Reading symbols from /opt/cyrus-sasl-1.5.24/lib/libsasl.sl.8...done.
Reading symbols from /usr/local2/db-3.3.11/lib/libdb-3.3.sl...done.
Reading symbols from /usr/lib/libpthread.1...done.
Reading symbols from /usr/lib/libc.2...done.
Reading symbols from /usr/lib/libdld.2...done.
Reading symbols from /usr/lib/sasl/libanonymous.sl...done.
Reading symbols from /usr/lib/sasl/libplain.sl...done.
Reading symbols from /usr/lib/libnss_dns.1...done.
Reading symbols from /usr/lib/libnss_files.1...done.
warning: reading `r3' register: No data
0xc01f4c40 in _read_sys () from /usr/lib/libc.2
(gdb) bt
#0  0xc01f4c40 in _read_sys () from /usr/lib/libc.2
#1  0xc0200bf4 in read () from /usr/lib/libc.2
#2  0x428f0 in map_refresh () at map_nommap.c:96
#3  0x438bc in starttxn_or_refetch () at cyrusdb_flat.c:252
#4  0x439bc in myfetch () at cyrusdb_flat.c:270
#5  0x43ab0 in fetch () at cyrusdb_flat.c:291
#6  0x25394 in mboxlist_mylookup () at mboxlist.c:176
#7  0x25554 in mboxlist_lookup () at mboxlist.c:252
#8  0x31c40 in user_deleteacl () at user.c:88
#9  0x277d8 in find_cb () at mboxlist.c:1360
#10 0x43c6c in foreach () at cyrusdb_flat.c:348
#11 0x27b00 in mboxlist_findall () at mboxlist.c:1488
#12 0x31d90 in user_delete () at user.c:127
#13 0xf7f0 in cmd_delete () at imapd.c:3382
#14 0x83e4 in cmdloop () at imapd.c:830
#15 0x7234 in service_main () at imapd.c:546
#16 0x5288 in main () at service.c:285

So, I have not ironed out our cyrus yet, and I appreciate your
interest. Hopefully we can help each other. (Two notes follow the
quoted message below: a sketch of the no-mmap read path, and a
possible db3 cache-size tweak.)

c*

----- Original Message -----
From: "Chris Peck" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, January 08, 2002 10:27 AM
Subject: upgrade to cyrus 2.0.16 causes high load average


> We have noticed a serious load-avg issue after upgrading to
> cyrus-2.0.16.
>
> We just upgraded to cyrus-2.0.16 over the weekend; we went from
> Simeon/Messaging Direct (a cyrus-1.x version) under Solaris 2.6 to
> cyrus-2.0.16 under Solaris 8 (kernel patch 108258-10). We tested it
> as much as possible for about 2 months before transferring all of our
> mailstore over (around 18000 users). We were able to test all of our
> e-mail clients except for our webmail client (IMP-2.x). I must say
> that the new version/system seemed to be great (and other than the
> following issue, I still think it's great, especially the sieve
> capability). Anyway, here's our issue:
>
> We brought the system back up Sunday afternoon, and everything ran
> just fine until Monday afternoon (when we "fixed" IMP). We had an
> issue with IMP not logging in correctly, but fixed that by flushing
> the session info IMP uses from the mysql database it was stored in.
> Once IMP was up and being used, we started to notice the following:
>
> Our normal load average is between 0.7 & 2 (around 2 when we get
> batches of email in). We noticed the loadavg rising at a tremendous
> rate; soon it was at 20, 30, 40, all the way up to 65! What was
> interesting was that response time for me while logged in to ksh was
> fast, as were connects to pop & imap. Sendmail, of course, turned off
> receives (it is set to stop at a load of 8). After about an hour we
> decided to disable IMP (it runs on a separate box), and the load
> dropped immediately to under 1. Here's some info I managed to collect
> during this scary period:
>
> System info:
> cyrus-2.0.16
> cyrus-sasl-1.5.24
> db-3.3.11
> sendmail-8.12.1
>
> # cat config.status  (for cyrus-imapd)
> #! /bin/sh
> # Generated automatically by configure.
> # Run this file to recreate the current configuration.
> # This directory was configured as follows,
> # on host mail1:
> #
> # ./configure --with-dbdir=/usr/local/BerkeleyDB.3.3 --with-perl=/usr/local/bin/perl --with-openssl=/usr/local/ssl --with-tcl=/usr/local --with-libwrap=/usr/local --with-sasl=/usr/local/lib --with-auth=unix --with-idle=poll
> #
>
> Here's some info from the running system during the high-load period:
>
> top showed:
> last pid:  933;  load averages: 34.75, 30.96, 29.70      15:17:59
> 185 processes: 131 sleeping, 50 running, 4 on cpu
> CPU states: 4.7% idle, 83.6% user, 11.2% kernel, 0.5% iowait, 0.0% swap
> Memory: 4096M real, 3234M free, 180M swap in use, 3974M swap free
>
>   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME   CPU COMMAND
>   785 cyrus      1  55    0   24M 3824K run     0:19 1.86% imapd
>   734 cyrus      1  56    0   24M 3768K run     0:18 1.84% imapd
> (many more imapd's and pop3d's running...)
>
> root@mail1 # truss -p 785
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351F38)   = 0
> lwp_mutex_wakeup(0xFF351F38) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351E68)   = 0
> lwp_mutex_wakeup(0xFF351E68) = 0
> lwp_mutex_lock(0xFF351F38)   = 0
> lwp_mutex_wakeup(0xFF351F38) = 0
> pread64(10, "\0\0\001\0 z10 D\0\0\0BF".., 8192, 1564672) = 8192
>
> This went on "forever", until I stopped "master".
>
> We also noticed the following a little earlier (during the high-load
> period) in local6.error:
>
> Jan  7 14:54:33 mail1 imapd[6]: [ID 866726 local6.error] DBERROR db3: 74 lockers
> Jan  7 14:54:34 mail1 imapd[30]: [ID 866726 local6.error] DBERROR db3: 76 lockers
> Jan  7 14:54:39 mail1 imapd[4]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:39 mail1 imapd[4]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:40 mail1 imapd[29918]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:40 mail1 imapd[29918]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:40 mail1 imapd[153]: [ID 866726 local6.error] DBERROR db3: Unable to allocate 8287 bytes from mpool shared region: Not enough space
> Jan  7 14:54:40 mail1 imapd[153]: [ID 598274 local6.error] DBERROR: error advancing: Not enough space
> Jan  7 14:54:51 mail1 imapd[151]: [ID 866726 local6.error] DBERROR db3: 86 lockers
>
> (lots more of the "DBERROR db3: xx lockers" lines)
>
> Jan  7 15:07:13 mail1 imapd[454]: [ID 898233 local6.error] DBERROR: error closing: DB_INCOMPLETE: Cache flush
> Jan  7 15:07:13 mail1 imapd[454]: [ID 179994 local6.error] DBERROR: error closing mailboxes: cyrusdb error
>
> It seems that IMP was able to tickle something on the system. Keep in
> mind that IMP does a login/logout for every IMAP transaction, so it
> is an expensive webmail client. However, it shouldn't be able to put
> cyrus into such a weird state (it is just a client).
>
> Any suggestions would be helpful...
>
> Thanks,
> chris
>
>
> Chris Peck
> Manager of Unix Services
> Information Technology
> The College of William and Mary
> Williamsburg, VA 23187
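
[Note 1] The backtrace above is sitting in map_refresh() in
map_nommap.c, the fallback cyrus uses when it is built without mmap:
the flat mailboxes file is pulled into a private buffer with read(),
and from starttxn_or_refetch() it appears to be re-read when the
database is consulted. A minimal sketch of the difference, assuming
that is roughly what the fallback does (hypothetical helper names,
not the actual cyrus code):

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* mmap path: map the file once; the kernel pages it in on demand,
     * and refreshing after the file grows is cheap. */
    static const char *load_with_mmap(int fd, size_t len)
    {
        void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        return (base == MAP_FAILED) ? NULL : (const char *)base;
    }

    /* no-mmap path (roughly what map_nommap.c has to do): allocate a
     * buffer and read() the whole file; every refresh re-reads it
     * from offset 0. */
    static char *load_with_read(int fd, size_t len)
    {
        char *buf = malloc(len);
        if (buf == NULL)
            return NULL;
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
            read(fd, buf, len) != (ssize_t)len) {
            free(buf);
            return NULL;
        }
        return buf;
    }

With 20,000+ mailboxes, and user_delete() walking the whole list via
foreach() (frames #10-#12), that is a lot of repeated read() traffic,
which might account for both the constant CPU load and the hung
DELETE here.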
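
[Note 2] On the db3 side, "Unable to allocate 8287 bytes from mpool
shared region: Not enough space" means Berkeley DB's shared memory
pool (its page cache) is full. Assuming a stock db-3.3 environment
under /var/imap/db (the usual spot for the cyrus database environment;
adjust the path if yours differs), one generic way to enlarge it is a
DB_CONFIG file in that directory. This is plain Berkeley DB
configuration, not a cyrus-specific knob, and the size below is only
a placeholder:

    # /var/imap/db/DB_CONFIG -- read by db3 when the environment is opened
    # 0 GB + 8 MB of cache in a single contiguous region (placeholder)
    set_cachesize 0 8388608 1

The cache size is fixed when the environment regions are created, so
the change generally only takes effect after the environment is
re-created (e.g. run recovery with the server down). If that makes
the mpool errors go away, it would suggest the default cache was
simply too small for a burst of IMP logins.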