Hi All,

A while back I emailed the list about a .seen file locking problem we were having with Cyrus 2.0.16 on RedHat 7.0 (kernel 2.2.19-7.0.8). I spent a lot of time exploring blocked imapd processes with gdb and finally concluded that this appeared to be a kernel or glibc problem, not a Cyrus one, because processes were blocking while trying to get a lock on .seen files that they already had locked. I even added code to unlock the files before trying to lock them, and that did not resolve the problem. Finally, I came up with the hack below. This week someone asked for my patch, so I thought I would post it to the list in case anyone else was interested.

The revised source is available at http://servercc.oakton.edu/~jwade/cyrus/lock_flock.c

This should replace cyrus-imapd-2.0.16/lib/lock_flock.c. A diff is also included below.

Basically, this modifies the lock_reopen function to call flock() in a non-blocking manner with a timeout. It will try once, try again immediately, then sleep with a quadratically increasing delay until the delay reaches the maximum #defined by MAXTIME. With MAXTIME set to 99 seconds, this gives a total delay of 1+4+9+16+25+36+49+64+81 = 285 seconds (4.75 minutes).

So that you can see what it is doing, if you have syslogging turned on at the debug level, it will log a "lock: reopen-blocked sleeping ..." message to syslog. If 285 seconds elapse and it still can't get a lock, it returns an error, and the imapd process that is hung up exits and puts an error in the syslog. This happens a couple of times a week for us, and it has ceased to be an issue since we put this in place in early January (we only know about it from the logs).

That being said, it is still something of a kluge. (And I haven't documented the code at all.)

Hope this helps someone,
John Wade

-----------------------------------------------------------
#diff lock_flock.c lock_flock.c.original
51d50
< #include <syslog.h>
58,59d56
< #define MAXTIME 99
<
83d79
<     int delay=0, i=0;
87,88c83,84
<     for(i=0,delay=0;;) {
<         r = flock(fd, LOCK_EX|LOCK_NB);
---
>     for (;;) {
>         r = flock(fd, LOCK_EX);
90,103c86,87
<             if (errno == EINTR) {
<                 continue;
<             }
<             else if ((errno == EWOULDBLOCK) && (delay < MAXTIME)) {
<                 syslog(LOG_DEBUG, "lock: reopen-blocked sleeping for %d on interval %d (%d, %s)" , delay, i, fd, filename);
<                 sleep(delay);
<                 i++;
<                 delay = i*i;
<                 continue;
<             }
<             if (failaction) {
<                 if (delay >= MAXTIME) *failaction = "locking_timeout";
<                 else *failaction = "locking";
<             }
---
>             if (errno == EINTR) continue;
>             if (failaction) *failaction = "locking";
105a90
>
112a98
>
113a100
>
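
P.S. For anyone who would rather read the retry schedule as standalone code than as a diff, here is a minimal sketch of the same idea. It is not the patched lock_reopen itself: lock_with_backoff and total_backoff_seconds are hypothetical names made up for illustration, the syslog call and the failaction handling are left out, and it assumes a platform that provides flock() with LOCK_NB.

```c
#include <assert.h>
#include <errno.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_NB */
#include <unistd.h>     /* sleep() */

#define MAXTIME 99      /* same cap as in the patch */

/* Total seconds slept before giving up: 0 + 1 + 4 + ... for every
 * i with i*i < max_delay. (Hypothetical helper, illustration only.) */
static int total_backoff_seconds(int max_delay)
{
    int total = 0;
    for (int i = 0; i * i < max_delay; i++)
        total += i * i;
    return total;
}

/* Hypothetical helper, not a Cyrus function: take an exclusive lock
 * without blocking, sleeping 0, 1, 4, 9, ... seconds between attempts.
 * Returns 0 on success, or -1 with errno set on error or timeout
 * (errno is EWOULDBLOCK when the retries ran out). */
static int lock_with_backoff(int fd, int max_delay)
{
    int i = 0, delay = 0;

    assert(max_delay >= 0);
    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return 0;               /* got the lock */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry now */
        if (errno != EWOULDBLOCK || delay >= max_delay)
            return -1;              /* real error, or out of retries */
        sleep(delay);               /* first pass sleeps 0, i.e. retries at once */
        i++;
        delay = i * i;              /* quadratic backoff */
    }
}
```

Calling lock_with_backoff(fd, MAXTIME) on an open descriptor would give the same 285-second worst case as described above, and total_backoff_seconds(MAXTIME) is just a way to check that arithmetic for other values of MAXTIME.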