Greetings all,

With the help of this list, I've successfully upgraded our lab 2.3.7 
(RHEL/CentOS packaged) server to 2.3.16-8 and tested rolling replication, 
manual replication by user, and manual replication by mailbox.  Everything was 
going better than expected until I shut down cyrus-imapd and /var/log/maillog 
started filling up with DB errors.

If I shut down cyrus-imapd with rolling replication enabled and have not run 
sync_client manually, both Cyrus and sync_client shut down cleanly.

However, if I have run sync_client manually while rolling replication is 
enabled, the rolling replication instance will not exit.  Instead, it appears to 
start spawning subprocesses and throwing database errors.  The change in 
database errors (below), from "reference count went negative" to "PANIC: fatal 
region error detected", appears to coincide with the completion of "Exporting 
cyrus-imapd databases".  The critical DB error messages continue until 
sync_client is killed.

I've run "ctl_cyrusdb -r" as suggested by the "run recovery" message.

Below are the steps that reproduce the problem, /var/log/maillog, the most 
relevant portions of imapd.conf and cyrus.conf, and the packages installed on 
the system.  cyrus-imapd-2.3.16-8 was built with "rpmbuild -ba" on CentOS 5.4 
64-bit using 
http://www.invoca.ch/pub/packages/cyrus-imapd/cyrus-imapd-2.3.16-8.src.rpm.  
The cyrus-sasl and db4 packages are from CentOS.  Please let me know if any 
other information would be useful.

Thank you for your help.

Best regards,

John


Steps to reproduce:
# /usr/lib/cyrus-imapd/sync_client -v -u testu...@testdomain.net
USER testu...@testdomain.net
ADDSUB testu...@testdomain.net INBOX
# date ; service cyrus-imapd stop
Fri Oct 15 14:51:34 EDT 2010
Shutting down cyrus-imapd:                                 [  OK  ]
Exporting cyrus-imapd databases:                           [  OK  ]

From /var/log/maillog:
Oct 15 14:50:58 eml-store04 sync_client[23742]: USER received NO response: 
IMAP_MAILBOX_NONEXISTENT Failed to access inbox for testu...@testdomain.net: 
Mailbox does not exist

  NOTE: Despite this message, the user appears identical on the 
  master and replica when checked with ctl_mboxlist -d.
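
  For anyone who wants to repeat that comparison, here is a rough sketch of 
  the check. It assumes the output of "ctl_mboxlist -d" has been saved to a 
  file on each host and both files copied to one machine. The function name 
  compare_mboxlist and the file arguments are made up for illustration, and 
  fields such as the partition name may legitimately differ between hosts.

```shell
# Hypothetical helper (not part of Cyrus): compare one user's entries in two
# saved mailbox-list dumps. Each dump file is assumed to contain the output
# of "ctl_mboxlist -d" captured on one host.
compare_mboxlist() {
    user_pattern="$1"   # fixed string identifying the user's mailboxes
    master_dump="$2"    # dump saved on the master
    replica_dump="$3"   # dump saved on the replica

    # Keep only the lines for this user and sort them so order is irrelevant.
    m=$(grep -F -- "$user_pattern" "$master_dump" | sort)
    r=$(grep -F -- "$user_pattern" "$replica_dump" | sort)

    if [ "$m" = "$r" ]; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

  Example call, with placeholder file names: 
  compare_mboxlist 'testuser' master.dump replica.dump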

Oct 15 14:51:35 eml-store04 master[22922]: attempting clean shutdown on SIGQUIT
Oct 15 14:51:35 eml-store04 master[22922]: process 22950 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22949 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22948 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22947 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22946 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22945 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22944 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22943 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22939 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22938 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22937 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22936 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22935 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22934 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22933 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22932 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: process 22931 exited, status 75
Oct 15 14:51:35 eml-store04 master[22922]: All children have exited, closing 
down
Oct 15 14:51:35 eml-store04 sync_client[23914]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23916]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23919]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23925]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:35 eml-store04 sync_client[23929]: DBERROR db4: region 1 
(environment): reference count went negative
... many more ...
Oct 15 14:51:41 eml-store04 sync_client[25331]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25332]: DBERROR db4: region 1 
(environment): reference count went negative
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR db4: PANIC: fatal 
region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25333]: DBERROR: critical database 
situation
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR db4: PANIC: fatal 
region error detected; run recovery
Oct 15 14:51:41 eml-store04 sync_client[25353]: DBERROR: critical database 
situation
... continue until sync_client is killed ...
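
To see how many of each message were logged, the sync_client DBERROR lines 
can be tallied by type. This is only a sketch: the function name 
summarize_dberrors is made up, and it assumes each syslog entry sits on a 
single line in the log file (the excerpts above are wrapped by the mail client).

```shell
# Hypothetical helper: count sync_client DBERROR messages by message text.
# Assumes syslog lines of the form
#   "Oct 15 14:51:35 host sync_client[23914]: DBERROR db4: ..."
summarize_dberrors() {
    logfile="${1:-/var/log/maillog}"

    # Select sync_client DBERROR lines, strip the timestamp/host/PID prefix,
    # then count identical messages, most frequent first.
    grep 'sync_client\[[0-9]*]: DBERROR' "$logfile" |
        sed 's/^.*sync_client\[[0-9]*]: //' |
        sort | uniq -c | sort -rn
}
```

Run with no argument it reads /var/log/maillog; pass another path to read a 
rotated or copied log instead.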


From /etc/cyrus.conf:
START {
  # do not delete this entry!
  recover       cmd="ctl_cyrusdb -r"
  # this is only necessary if using idled for IMAP IDLE
  idled         cmd="idled"
  syncclient    cmd="/usr/lib/cyrus-imapd/sync_client -r -o" listen="csync"
}

From /etc/imapd.conf:
  ## Added for replication -- Master
  sync_host: eml-replica04.asddev.reyrey.com
  sync_authname: xyz
  sync_password: abc
  sync_compress: 0
  sync_log: 1
  guid_mode: sha1

Packages installed:
  cyrus-imapd-2.3.16-8
  cyrus-imapd-utils-2.3.16-8
  cyrus-sasl-2.1.22-5.el5_4.3
  cyrus-sasl-lib-2.1.22-5.el5_4.3
  cyrus-sasl-lib-2.1.22-5.el5_4.3
  cyrus-sasl-plain-2.1.22-5.el5_4.3
  cyrus-sasl-plain-2.1.22-5.el5_4.3
  db4-4.3.29-10.el5_5.2
  db4-4.3.29-10.el5_5.2
  db4-utils-4.3.29-10.el5_5.2
  postfix-2.3.3-2.1.el5_2


John Simpson 
Senior Software Engineer, I. T. Engineering and Operations

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
