On Mar 23, 2006, at 8:27 AM, Ken Murchison wrote:
Patrick Radtke wrote:
Every day we get a number of 'terminated abnormally' messages on
our frontends.
When they happen just occasionally then everything still seems to
run fine.
However sometimes, there will be a large number of them in a short
time span and this is a sure sign that people are having trouble
logging in or are being disconnected.
here is examples from the log
Mar 22 10:50:07 tofu master[19506]: service pop3s pid 18311 in
READY state: terminated abnormally
Mar 22 10:50:10 tofu master[19506]: service imaps pid 18128 in
READY state: terminated abnormally
Mar 22 10:50:20 tofu master[19506]: service imaps pid 18348 in
READY state: terminated abnormally
Mar 22 10:50:23 tofu master[19506]: service imaps pid 13063 in
READY state: terminated abnormally
Mar 22 10:50:32 tofu master[19506]: service pop3s pid 18404 in
READY state: terminated abnormally
Mar 22 10:50:33 tofu master[19506]: service pop3s pid 18218 in
READY state: terminated abnormally
Mar 22 11:01:42 valoney master[2581]: service imaps pid 25617 in
READY state: terminated abnormally
Mar 22 11:01:46 valoney master[2581]: service pop3s pid 25682 in
READY state: terminated abnormally
Mar 22 11:01:47 valoney master[2581]: service imaps pid 25903 in
READY state: terminated abnormally
searching on just a fail process id shows
Mar 22 11:01:38 valoney imaps[25903]: imaps failed: [local]
Mar 22 11:01:47 valoney master[2581]: service imaps pid 25903 in
READY state: terminated abnormally
Mar 22 11:01:51 valoney master[2581]: service imaps pid 25903:
while trying to process message 0x2: not registered yet
Mar 22 11:01:52 valoney master[2581]: service imaps pid 25903 in
UNKNOWN state: processing message 0x2
ar 22 11:01:04 valoney pop3s[25440]: starttls: TLSv1 with cipher
RC4-MD5 (128/128 bits new) no authentication
Mar 22 11:01:20 valoney pop3s[25440]: login:
pool-141-155-147-109.ny5030.east.verizon.net [141.155.147.109]
jg2378 plaintext+TLS User logged in
Mar 22 11:01:21 valoney pop3s[25440]: failed to bind to address
128.59.48.36: Cannot assign requested address
Mar 22 11:01:21 valoney pop3s[25440]: Doing a peer verify
Mar 22 11:01:21 valoney pop3s[25440]: Doing a peer verify
Mar 22 11:01:21 valoney pop3s[25440]: starttls: TLSv1 with cipher
AES256-SHA (256/256 bits new) no authentication
Mar 22 11:01:36 valoney pop3s[25440]: pop3s failed: [local]
Mar 22 11:01:37 valoney pop3s[25440]: Fatal error:
tls_start_servertls() failed
Mar 22 11:01:41 valoney master[2581]: service pop3s pid 25440 in
READY state: terminated abnormally
Mar 22 11:01:52 valoney master[2581]: service pop3s pid 25440:
while trying to process message 0x2: not registered yet
Mar 22 11:01:52 valoney master[2581]: service pop3s pid 25440 in
UNKNOWN state: processing message 0x2
I've never seen behavior like this before. You have a pop3d which
looks like it successfully does TLS+PLAIN, but then attempts TLS
again. What I don't know is if the client has disconnected in
between, or if there is some kind of race condition.
Ken, is the second TLS just proxyd connecting to a backend?
Syslog on debug (with today's CVS cyrus 2.3 )shows:
Mar 30 13:15:05 hotdog imap[18602]: starttls: TLSv1 with cipher
AES256-SHA (256/256 bits new) no authentication
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(phr2101)(mark 1143741894, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified phr2101 -> phr2101
Mar 30 13:15:07 hotdog imap[18602]: login: asiago.cc.columbia.edu
[128.59.59.74] phr2101 PLAIN+TLS User logged in
Mar 30 13:15:07 hotdog imap[18602]: Doing a peer verify
Mar 30 13:15:07 hotdog imap[18602]: Doing a peer verify
Mar 30 13:15:07 hotdog imap[18602]: received server certificate
Mar 30 13:15:07 hotdog imap[18602]: starttls: TLSv1 with cipher
AES256-SHA (256/256 bits new) no authentication
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(phr2101)(mark 1143741894, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified phr2101 -> phr2101
Mar 30 13:15:07 hotdog imap[18602]: ptload(): fetched cache record
(murder)(mark 1143741895, current 1143742507, limit 1143738907)
Mar 30 13:15:07 hotdog imap[18602]: ptload returning data
Mar 30 13:15:07 hotdog imap[18602]: canonified murder -> murder
Mar 30 13:15:07 hotdog imap[18602]: open: user phr2101 opened INBOX
on bratwurst.cc.columbia.edu
Proxyd is getting a server certificate which makes me think the 2nd
tls is just encrypting the frontend to backend communication.
Additionally it also looking up the 'murder' user which is the proxy
user we use when frontends log into backends
Increasing the log level to debug shows a lot of
Mar 30 13:20:09 mockmeat master[23360]: service imaps pid 26574 in
BUSY state: terminated abnormally
is that normal? (These ones are being terminated while BUSY instead
of when READY). These logs are from 2.2.12
The BUSY ones look like
Mar 30 13:20:05 mockmeat imaps[26574]: accepted connection
Mar 30 13:20:05 mockmeat imaps[26574]: imaps failed: [local]
Mar 30 13:20:08 mockmeat master[23360]: process 26574 exited, status 75
Mar 30 13:20:09 mockmeat master[23360]: service imaps pid 26574 in
BUSY state: terminated abnormally
here is a different (different from my initial email) looking READY
error
Mar 30 13:19:01 mockmeat master[1494]: about to exec /usr/bin/proxyd
Mar 30 13:19:02 mockmeat imaps[1494]: executed
Mar 30 13:19:55 mockmeat imaps[1494]: accepted connection
Mar 30 13:19:56 mockmeat imaps[1494]: imaps failed: [local]
Mar 30 13:19:59 mockmeat master[23360]: process 1494 exited, status 75
Mar 30 13:20:01 mockmeat master[23360]: service imaps pid 1494 in
READY state: terminated abnormally
In these one is looks like something is failing in
'tls_start_servertls'.
Does [local] just mean the client's host couldn't be determined?
Other strangeness I saw was
Mar 30 13:19:39 mockmeat master[23360]: service sieve pid 1496:
changing from service pop3 due to received message
Mar 30 13:19:01 mockmeat master[1496]: about to exec /usr/bin/timsieved
Mar 30 13:19:02 mockmeat sieve[1496]: executed
Mar 30 13:19:03 mockmeat sieve[1496]: warning: can't get client
address: Connection reset by peer
Mar 30 13:19:04 mockmeat sieve[1496]: accepted connection
Mar 30 13:19:05 mockmeat master[23360]: process 1496 exited, status 0
Mar 30 13:19:39 mockmeat master[23360]: service sieve pid 1496:
changing from service pop3 due to received message
What's going on here? Master looks up the pid in ctable and finds
that the service id doesn't match what it is expecting?
and finally
Mar 30 13:19:36 mockmeat master[23360]: imaps has -12 workers?!?
Looking through the code these last 2 sections are marked ' /*
paranoia */'
I imagine this signifies non-normal behavior... but I'm not sure
about causes.
Our load balancer (CSM) probes the frontends every 6 seconds or so to
confirm that they are responding. The CSM makes a connection and then
disconnects right away. Could this be triggering something in Cyrus?
Its hard to think so since the errors generally are all during peak
usage, and much less in the evening even though the CSM is always
probing at the same rate.
-Patrick
----
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html