Hi All,

A while back I emailed the list about a .seen file locking problem we were
having with Cyrus 2.0.16 on RedHat 7.0 (kernel 2.2.19-7.0.8).  I spent a lot of
time exploring blocked imapd processes with gdb and finally came to the
conclusion that this appeared to be a kernel or glibc problem, not a Cyrus one,
because processes were blocking while trying to get a lock on .seen files that
they already had locked.  I even added code to unlock the files before trying
to lock them, and it did not resolve the problem.  Finally, I came up with the
hack below.  This week someone asked for my patch, so I thought I would post it
to the list in case anyone else was interested.

The revised source is available at
http://servercc.oakton.edu/~jwade/cyrus/lock_flock.c

This should replace cyrus-imapd-2.0.16/lib/lock_flock.c

A diff is also included below.  Note that it was taken from the patched file
to the original, so the lines marked "<" are the new code.  Basically, this
modifies the lock_reopen function to call flock() in a non-blocking manner
(LOCK_NB) and retry until a timeout.  It will try once, try again immediately,
then sleep with a quadratically increasing delay (1, 4, 9, ... seconds) until
the delay reaches the maximum #defined by MAXTIME.  With MAXTIME set to 99
seconds, this gives a total delay of 1+4+9+16+25+36+49+64+81 = 285 seconds
(4.75 minutes).  So that you can see what it is doing, if you have syslogging
turned on at the debug level, it will log a "lock: reopen-blocked sleeping"
message to syslog before each sleep.  If 285 seconds elapse and it still can't
get a lock, it returns an error (with failaction set to "locking_timeout"), and
the imapd process that is hung up exits and puts an error in the syslog.
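
For anyone who would rather read the retry logic outside of diff form, here is
a minimal standalone sketch of the same idea.  It is not the actual Cyrus code:
the function names lock_with_backoff and total_backoff_seconds are made up for
illustration, and the real change lives inside lock_reopen and also handles
failaction.

```c
#include <assert.h>
#include <errno.h>
#include <sys/file.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: total seconds slept before giving up, for a given
 * cap.  It mirrors the delay sequence of the loop below (1, 4, 9, ...)
 * without touching any file. */
int total_backoff_seconds(int max_delay)
{
    int i, total = 0;

    for (i = 1; i * i < max_delay; i++)
        total += i * i;
    return total;
}

/* Hypothetical wrapper: try to take an exclusive flock() on fd without
 * blocking, sleeping 0, 1, 4, 9, ... seconds between attempts.
 * Returns 0 on success, -1 on error or timeout.  On timeout errno is
 * still EWOULDBLOCK from the last flock() call. */
int lock_with_backoff(int fd, int max_delay)
{
    int i = 0, delay = 0;

    assert(max_delay >= 0);
    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return 0;               /* got the lock */
        if (errno == EINTR)
            continue;               /* interrupted, retry at once */
        if (errno != EWOULDBLOCK || delay >= max_delay)
            return -1;              /* real error, or we gave up */
        syslog(LOG_DEBUG, "lock blocked: sleeping for %d on interval %d (fd %d)",
               delay, i, fd);
        sleep(delay);
        i++;
        delay = i * i;              /* quadratic growth */
    }
}
```

With a cap of 99, total_backoff_seconds(99) adds up the same 1+4+...+81 series
as above, so a caller of lock_with_backoff(fd, 99) waits at most 285 seconds
(plus the time spent in the flock() calls themselves) before it gets -1 back.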

This happens a couple of times a week for us, and it has ceased to be an issue
since we put this in place in early January (we only know about it from the
logs.)   That being said, it is still something of a kluge.   (And I haven't
documented the code at all.)

Hope this helps someone,

John Wade


-----------------------------------------------------------
#diff lock_flock.c lock_flock.c.original
51d50
< #include <syslog.h>
58,59d56
< #define MAXTIME 99
<
83d79
<     int delay=0, i=0;
87,88c83,84
<     for(i=0,delay=0;;) {
<       r = flock(fd, LOCK_EX|LOCK_NB);
---
>     for (;;) {
>       r = flock(fd, LOCK_EX);
90,103c86,87
<           if (errno == EINTR) {
<                  continue;
<             }
<             else if ((errno == EWOULDBLOCK) && (delay < MAXTIME)) {
<                 syslog(LOG_DEBUG, "lock: reopen-blocked sleeping for %d on interval %d (%d, %s)" , delay, i, fd, filename);
<                 sleep(delay);
<                 i++;
<                 delay = i*i;
<                 continue;
<             }
<           if (failaction) {
<                 if (delay >= MAXTIME) *failaction = "locking_timeout";
<                 else *failaction = "locking";
<             }
---
>           if (errno == EINTR) continue;
>           if (failaction) *failaction = "locking";
105a90
>
112a98
>
113a100
>


