Hi,
sorry for this long mail ... and my poor English
Since the announcement of CAN-2004-1015, we have been (slowly/cautiously) upgrading our cyrus-imap servers: everything went fine when we upgraded cyrus 2.1.15 and 2.2.8 to 2.2.10, but we have problems upgrading our main server, currently running cyrus imapd 2.1.12.
** What we did :
This server runs Cyrus 2.1.12 + SASL 2.1.12 + db3 and hosts ~3000 users and 140 GB (RAID 5) of e-mail (about 20,000,000 messages). To be prudent, we decided to migrate its content to another server running cyrus 2.2.10; the "old" 2.1.12 server is *still* untouched.
We copied its mailspool, mailboxes.db (flat), user.seen & user.sub files to a freshly installed FreeBSD 4 stable + db4 (db41-4.1.25) + sasl 2.1.18 + cyrus 2.2.10 machine.
For its backend DBs we use the following settings:
annotation_db: skiplist            (unused on 2.1.12)
duplicate_db: berkeley-nosync      (DB3 on 2.1.12)
mboxlist_db: flat                  (unchanged)
ptscache_db: berkeley              (unused on 2.1.12)
quota_db: flat                     (unused - no quotas)
seenstate_db: flat                 (unchanged)
subscription_db: flat              (unchanged)
tlscache_db: berkeley-nosync       (unused - no tls)
(we use the *same* DB backends on our old 2.1.12 server)
We reconstructed its mailspool twice (su cyrus -c /usr/cyrus/bin/reconstruct -rf user) and ran chk_cyrus: both completed flawlessly.
** What then happened :
So far we have had no visible problems: users can happily use their mailboxes, and neither seen states nor ACLs have been lost.
*But* when a single user wants to modify their subscriptions (using Mozilla/Thunderbird: "File" -> "Subscribe...") the imapd process takes "ages" (~20 s) and, worse, this imapd eats ~80% CPU on a dual Xeon 2.8 GHz / 1 GB !!
About 20 s later (even if the user has only 20 mailboxes) it gives the right list, but we are really frightened of what will happen with ~500 simultaneous users :/
... Furthermore, we run exactly the same cyrus imapd binaries on the same hardware (Dell PE2650) and OS (FreeBSD 4 stable) on another server (successfully upgraded from 2.2.8 to 2.2.10) without any problem: there the subscribe dialog appears without any delay or CPU "plateau".
We ktraced the imapd process on both servers and found no differences (apart from the delays).
** So far our conclusions :
. it's not an I/O issue - no activity on the dedicated raid/AHC 39160/partition - quite dead iostat stats
. neither ctl_cyrusdb -r nor chk_cyrus complain - no "suspicious" log
. it's not likely a user.sub DB problem - we tried converting user.sub to skiplist, DB and even recreated them without any success
---------------- sample dialog (mbox names obscured/removed) :
27 lsub "" "INBOX.*"
(snip)
27 OK Completed (0.000 secs 26 calls)

28 list "" "INBOX.%"
(snip)
28 OK Completed (0.008 secs 31 calls)

29 list "" "INBOX.%.%"
(snip)
29 OK Completed (0.016 secs 18 calls)

30 lsub "" "user.*"
(snip)
30 OK Completed (0.000 secs 1 calls)

31 list "" "user.%"
(snip)
31 OK Completed (6.227 secs 1 calls)   <- ###### THIS ONE

32 list "" "user.%.%"
(snip)
32 OK Completed (6.375 secs 1 calls)   <- ###### THIS ONE

33 lsub "" "*"
(snip)
33 OK Completed (0.008 secs 28 calls)

34 list "" "%"
(snip)
* LIST (\HasChildren) "." "INBOX"
* LIST (\HasChildren) "." "XXX"   (shared mb)
34 OK Completed (6.305 secs 34 calls)   <- ###### THIS ONE

35 list "" "%.%"
* LIST (\HasChildren) "." "INBOX.XXX"
* LIST (\HasNoChildren) "." "INBOX.YYY"
(snip)
* LIST (\HasNoChildren) "." "crip-visio.gdfgdfg"   (shared mb)
35 OK Completed (6.492 secs 32 calls)   <- ###### THIS ONE

36 IDLE
36 OK Completed

37 close
38 logout
----------------
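To compare a longer trace from the two servers, the slow responses can be picked out mechanically. The following is only a minimal sketch in Python: it assumes the "OK Completed (N secs M calls)" trailer format shown in the dialog above, and the function name and the 1-second threshold are arbitrary choices for illustration.

```python
import re

# Matches the trailer seen on tagged responses in the dialog above,
# e.g. '31 OK Completed (6.227 secs 1 calls)'. A leading '"' left over
# from trace quoting is tolerated.
COMPLETED = re.compile(r'^"?(\d+) OK Completed \((\d+\.\d+) secs (\d+) calls\)')

def slow_commands(lines, threshold=1.0):
    """Return (tag, seconds) for each tagged response slower than threshold."""
    slow = []
    for line in lines:
        m = COMPLETED.match(line.strip())
        if m and float(m.group(2)) > threshold:
            slow.append((m.group(1), float(m.group(2))))
    return slow

# Completion lines copied from the sample dialog.
sample = [
    '27 OK Completed (0.000 secs 26 calls)',
    '31 OK Completed (6.227 secs 1 calls)',
    '32 OK Completed (6.375 secs 1 calls)',
    '33 OK Completed (0.008 secs 28 calls)',
    '34 OK Completed (6.305 secs 34 calls)',
    '35 OK Completed (6.492 secs 32 calls)',
]

print(slow_commands(sample))
# [('31', 6.227), ('32', 6.375), ('34', 6.305), ('35', 6.492)]
```

On the lines above it flags exactly the four LIST commands marked "THIS ONE"; every LSUB stays under the threshold.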
. the ktrace shows (of course) many calls to the mailboxes.db file. Apparently the LIST "" "user.%" commands take ages to complete (we have 127000 lines in our flat mailboxes.db file), but when we run a single "listmailboxes %" or "listmailboxes %.%" under cyradm, it completes at normal speed...
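To reproduce the slow commands outside the mail client, the same LIST patterns can be timed with a small script. This is a hedged sketch using Python's standard imaplib: the host, user name and password come from the command line and are placeholders, a plaintext login on port 143 is assumed, and `time_list` is a helper name invented here.

```python
import imaplib
import sys
import time

# LIST patterns taken from the sample dialog.
PATTERNS = ["INBOX.%", "INBOX.%.%", "user.%", "user.%.%", "%", "%.%"]

def time_list(conn, pattern):
    """Run LIST "" "<pattern>" and return (status, entry count, seconds)."""
    start = time.monotonic()
    status, data = conn.list('""', '"%s"' % pattern)
    elapsed = time.monotonic() - start
    # imaplib returns [None] when the server sends no LIST lines.
    entries = [d for d in data if d]
    return status, len(entries), elapsed

def main(host, user, password):
    conn = imaplib.IMAP4(host)  # plaintext IMAP, default port 143 (assumption)
    try:
        conn.login(user, password)
        for pattern in PATTERNS:
            status, count, elapsed = time_list(conn, pattern)
            print("%-12s %s %4d entries %7.3f s" % (pattern, status, count, elapsed))
    finally:
        conn.logout()

if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: script.py HOST USER PASSWORD  (all three are placeholders)
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

Running it as an ordinary user against both the problem server and the healthy 2.2.10 server would show whether the "user.%" patterns are slow on their own or only inside the subscribe dialog.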
=-=-=-=
Could any "gurus" out there enlighten us? We're running out of candles for our voodoo cults... and of course (thanks, Mr. Murphy) we have to migrate quickly: our off-campus IMAP access has been blocked since Wed. 25/11.
If it is mailboxes.db related (???) would a single reconstruct -rf from an empty mbox.db file help (but we ran it twice - there were no diffs) ??
Thanks for your patience for this long mail,
best regards
Gilles BRUNO
System Admin
University Joseph Fourier - France
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html