Okay, so I have just attempted for a second time to upgrade our Cyrus server on the production box from version 2.2.1 to version 2.2.8. The upgrade was a failure and I had to back out and go through a not-quite-painless recovery process on the user accounts affected by the upgrade.
So, the first time I did this we ran into two obvious bugs on the Tru64 5.x platform. The first bug was that, after a while, the processes leaked all their file descriptors and eventually started crapping out. In a previous e-mail, I described this as a problem with getnameinfo() on Tru64, and we will likely have to work this problem through HP's tech support.
The second bug was tied to some users' sieve scripts failing. In another previous e-mail, I described that bug and supplied a fix.
What was most problematic about that upgrade was that quite a number of users ended up with corrupted cyrus.header and cyrus.index files, and their mailboxes had to be reconstructed. The users reported various problems, such as opening their mailboxes and finding no messages in them even though they knew there were some, or finding messages but getting a blank message on opening one they knew was not blank. Reconstructing the accounts fixed the problems.
So, after fixing the above bugs, we attempted another upgrade tonight. I started seeing a lot of log messages from LMTP to the following effect:
verify_user(user.cyrus) failed: Mailbox has an invalid format
The server ran for nearly 20 minutes. In that time, there were 1717 of these errors affecting 136 different users. I believe that this is related in some way to the mailbox corruption during the previous upgrade.
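For anyone wanting to pull the same kind of numbers out of their own logs, here is a rough Python sketch of the tally. It assumes the log lines contain the message exactly as quoted above, that "." is the hierarchy separator, and that user names themselves contain no dots. The function name and the script layout are illustrative only, not part of Cyrus.

```python
import re
import sys
from collections import Counter

# Matches the LMTP message quoted above and captures the mailbox name.
VERIFY_FAILED = re.compile(
    r"verify_user\((?P<mailbox>[^)]+)\) failed: Mailbox has an invalid format"
)


def count_failures(lines):
    """Return a Counter mapping each top-level user mailbox to its error count."""
    counts = Counter()
    for line in lines:
        match = VERIFY_FAILED.search(line)
        if match is None:
            continue  # not one of the errors we are tallying
        # Reduce user.name.sub.folder to user.name.
        # Assumption: "." separator and no dots inside user names.
        owner = ".".join(match.group("mailbox").split(".")[:2])
        counts[owner] += 1
    return counts


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        counts = count_failures(log)
    print(f"{sum(counts.values())} errors affecting {len(counts)} users")
    for owner, n in counts.most_common():
        print(f"{n:6d}  {owner}")
```

Run it with the path to the relevant log file as the only argument. The per-user listing is sorted by error count, which makes it easy to see whether a few accounts account for most of the failures.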
The archives suggest reconstructing accounts that show this, and maybe we would be okay by continuing to run on the new version and reconstructing all existing accounts, but we have over 93,000 accounts and over 376,000 mailboxes... so, reconstructing everything is not the way to go.
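One middle ground would be to reconstruct only the accounts that actually show the error. The sketch below just prints one command per affected account for review and does not run anything. It assumes the stock Cyrus `reconstruct` utility with its `-r` (recursive) option is on the path and is run as the cyrus user; check the man page for your version before using the output. The helper name is made up for illustration.

```python
import shlex


def reconstruct_commands(mailboxes):
    """Return one 'reconstruct -r <mailbox>' command string per distinct mailbox.

    Assumption: 'reconstruct -r' recursively rebuilds the named mailbox and
    its sub-mailboxes, as described in the Cyrus man page.
    """
    commands = []
    for mailbox in sorted(set(mailboxes)):  # de-duplicate, stable order
        # Quote the name in case it contains shell metacharacters.
        commands.append("reconstruct -r " + shlex.quote(mailbox))
    return commands


if __name__ == "__main__":
    # Example input: the mailbox name from the log message above.
    for command in reconstruct_commands(["user.cyrus"]):
        print(command)
```

Printing the commands instead of executing them keeps the step reviewable, which seems prudent on a production box with this many accounts.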
I read the upgrade instructions, and the only things mentioned for going from 2.2.1 to 2.2.8 are the bytecode format change (i.e. recompiling all the user sieve scripts) and the fact that the backend databases are now specified in the imapd.conf file (which I did, double-checking that what is supplied matches what I had compiled into the 2.2.1 version of the server). However, the cyrus.header and cyrus.index files are not related to any of the backend database stuff, so I don't understand what is going on here.
Does anyone have any ideas on how to troubleshoot this? I guess the first thing I need to do is to duplicate it in our test environment... All the testing I have done never once triggered this problem.
Anyways, I hope somebody can help me on this.
Thanks,
  Scott
--
+-----------------------------------------------------------------------+
  Scott W. Adkins              http://www.cns.ohiou.edu/~sadkins/
  UNIX Systems Engineer        mailto:[EMAIL PROTECTED]
  ICQ 7626282                  Work (740)593-9478  Fax (740)593-1944
+-----------------------------------------------------------------------+
  PGP Public Key available at http://www.cns.ohiou.edu/~sadkins/pgp/