Hi, we are running the following setup under Red Hat Linux Advanced Server 3:
name       : Cyrus IMAPD
version    : v2.2.12-Invoca-RPM-2.2.12-1.ZAIK 2005/02/14 16:43:51
vendor     : Project Cyrus
support-url: http://asg.web.cmu.edu/cyrus
os         : Linux
os-version : 2.4.21-27.0.2.ELsmp
environment: Built w/Cyrus SASL 2.1.20
             Running w/Cyrus SASL 2.1.20
             Built w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
             Running w/Sleepycat Software: Berkeley DB 4.1.25: (August 21, 2003)
             Built w/OpenSSL 0.9.7a Feb 19 2003
             Running w/OpenSSL 0.9.7a Feb 19 2003
             CMU Sieve 2.2
             TCP Wrappers
             mmap = shared
             lock = fcntl
             nonblock = fcntl
             auth = unix
             idle = idled

In earlier versions of Cyrus we experienced problems where processes got stuck and caused subsequent connections to mailboxes to fail due to lock contention. Some work was done to solve this, but I wonder if the success is only cosmetic. It seems to me that processes still get stuck; it just no longer keeps new connections from working.
I noticed that our server has an ever-increasing number of processes. I'm attaching a screenshot of the relevant Ganglia graph for the last month. I see that there are many imapd and pop3d processes that have been running for a long time, i.e. since the middle of May:
[EMAIL PROTECTED] root]# ps -aef|grep pop3
cyrus     1588 22788  0 May13 ?        00:00:03 pop3d -s
cyrus     2810 22788  0 May13 ?        00:00:01 pop3d -s
cyrus    32464 22788  0 May13 ?        00:00:02 pop3d -s
cyrus     7941 22788  0 May13 ?        00:00:00 pop3d -s
cyrus     5331 22788  0 May14 ?        00:00:02 pop3d -s
cyrus     4319 22788  0 May14 ?        00:00:02 pop3d -s
cyrus     9054 22788  0 May14 ?        00:00:00 pop3d -s
cyrus    25309 22788  0 May14 ?        00:00:00 pop3d -s
cyrus     8176 22788  0 May14 ?        00:00:02 pop3d -s
cyrus    21482 22788  0 May14 ?        00:00:00 pop3d
...

All of them seem to be stuck somewhere in SSL, but ultimately in __read_nocancel (). I'll give two examples.
PID 1588:

(gdb) where
#0  0x006d1f0e in __read_nocancel () from /lib/tls/libc.so.6
#1  0x00c16427 in BIO_new_socket () from /lib/libcrypto.so.4
#2  0x00c143e2 in BIO_read () from /lib/libcrypto.so.4
#3  0x007b4c30 in ssl3_alert_code () from /lib/libssl.so.4
#4  0x007b4dcc in ssl3_alert_code () from /lib/libssl.so.4
#5  0x007b60cf in ssl3_read_bytes () from /lib/libssl.so.4
#6  0x007b6ffc in ssl3_get_message () from /lib/libssl.so.4
#7  0x007accab in ssl3_accept () from /lib/libssl.so.4
#8  0x007ac944 in ssl3_accept () from /lib/libssl.so.4
#9  0x007bbcaa in SSL_accept () from /lib/libssl.so.4
#10 0x007b780d in ssl23_get_client_hello () from /lib/libssl.so.4
#11 0x007b7712 in ssl23_accept () from /lib/libssl.so.4
#12 0x007bbcaa in SSL_accept () from /lib/libssl.so.4
#13 0x08051bc3 in shut_down ()
#14 0x0804dda3 in shut_down ()
#15 0x0804ce9d in ?? ()
#16 0x00000001 in ?? ()
#17 0x098eab90 in ?? ()
#18 0x00000000 in ?? ()
(gdb)

PID 21482:

(gdb) where
#0  0x006f4f0e in __read_nocancel () from /lib/tls/libc.so.6
#1  0x00355427 in BIO_new_socket () from /lib/libcrypto.so.4
#2  0x003533e2 in BIO_read () from /lib/libcrypto.so.4
#3  0x0047ae23 in ssl23_read_bytes () from /lib/libssl.so.4
#4  0x00479c61 in ssl23_get_client_hello () from /lib/libssl.so.4
#5  0x00479712 in ssl23_accept () from /lib/libssl.so.4
#6  0x0047dcaa in SSL_accept () from /lib/libssl.so.4
#7  0x08051bc3 in shut_down ()
#8  0x0804dda3 in shut_down ()
#9  0x0804dba8 in shut_down ()
#10 0x0804cde9 in ?? ()
#11 0x095f74d0 in ?? ()
#12 0x0807e79c in config_need_data ()
#13 0x095a5978 in ?? ()
#14 0x0807fff6 in config_need_data ()
#15 0x0807e778 in config_need_data ()
#16 0x08101c40 in ?? ()
#17 0x00000000 in ?? ()
(gdb)

Fortunately these stuck processes don't hold any locks anymore! I understand that I can probably just kill them, but I wonder what the underlying cause of this problem is. Is it likely something in Cyrus or something in the libraries?
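For illustration only: both backtraces end in a plain read() underneath SSL_accept, i.e. the process is waiting for handshake data from the client. I have not confirmed that this is what happens on our server, but a client that connects and then never sends anything would produce the same picture if no read timeout is in effect. The sketch below shows that situation with an ordinary socket in Python (no SSL and no Cyrus code involved; the function name read_with_timeout and the silent client are made up for the example). With a timeout the read gives up; with timeout_seconds=None it would wait forever.

```python
import socket
import threading


def read_with_timeout(timeout_seconds):
    """Accept one connection from a silent client and try to read from it.

    Returns "timed out" if no data arrives within timeout_seconds.
    With timeout_seconds=None the recv() call would block indefinitely.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    release = threading.Event()

    def silent_client():
        # Connect, send nothing, and keep the connection open until released.
        client = socket.create_connection(("127.0.0.1", port))
        release.wait()
        client.close()

    thread = threading.Thread(target=silent_client)
    thread.start()

    conn, _ = listener.accept()
    conn.settimeout(timeout_seconds)  # None would mean "block forever"
    try:
        data = conn.recv(1024)
        result = "got %d bytes" % len(data)
    except socket.timeout:
        result = "timed out"
    finally:
        conn.close()
        release.set()   # let the client thread close its side and exit
        thread.join()
        listener.close()
    return result


if __name__ == "__main__":
    print(read_with_timeout(0.5))
```

Whether pop3d or the SSL library is supposed to apply such a timeout during the handshake is exactly what I don't know.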
Thanks,
Sebastian
--
Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
processes.pdf
Description: Adobe PDF document