followup: stuck lmtpd processes and sieve errors

2003-11-20 Thread Andrew Morgan
I also reported a problem with lmtpd processes hung waiting for a lock on the user's quota file. Two days ago, I had one of these stuck lmtpd processes again. I found the imapd that was holding the lock (a dial-up user that had disconnected), and killed the offendin

Re: followup: stuck lmtpd processes

2003-09-29 Thread Etienne Goyer
Sorry, I will have to wait til 2.1.16 to test it. I can't just plug the fud.c from CVS and compile it, and I am really too busy these days to make a full checkout from CVS and test it. I'll report my experience with 2.1.16, if and when it comes out. On Wed, Sep 24, 2003 at 01:24:11PM -0400, Etien

[PATCH] lockbreak using sigalarm (was followup: stuck lmtpd processes)

2003-09-26 Thread Henrique de Moraes Holschuh
There is now a preliminary patch available. http://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1177 Please test the second patch, and report back using the bug tracking system. -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the

Re: followup: stuck lmtpd processes

2003-09-26 Thread Henrique de Moraes Holschuh
On Fri, 26 Sep 2003, Tom wrote: > On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote: > > With SYSV you will get the interrupted system call, unless you tell it > > somehow not to do it (the SA_RESTART stuff). If we are to accommodate the > > BSDs, we can: > > 1. Let them have the short end o

Re: followup: stuck lmtpd processes

2003-09-26 Thread Tom
On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote: > On Wed, 24 Sep 2003, Etienne Goyer wrote: > > On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote: > > > However, I have looked into this and to my surprise, Linux is indeed > > > restarting the system calls instead of returning

Re: followup: stuck lmtpd processes

2003-09-25 Thread Rob Siemborski
On Thu, 25 Sep 2003, Etienne Goyer wrote: > However, the man page is wrong about EINTR at least as far as RedHat 7.x > is concerned. In a murder environment, when following a referral: No, it isn't wrong. The problem is signals that are configured via signal() instead of sigaction(). On Linu

Re: followup: stuck lmtpd processes

2003-09-25 Thread Etienne Goyer
On Wed, Sep 24, 2003 at 08:01:11PM +0100, Patrick Welche wrote: > I don't understand. The only alarm() business I can see in imap/fud.c > is around recvfrom which at least according to its man page says > > [EINTR]The receive was interrupted by delivery of a signal >

Re: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
On Wed, 24 Sep 2003, Rob Siemborski wrote: > On Wed, 24 Sep 2003, Andrew Morgan wrote: > > > > /dev/urandom for its entropy source, rather than /dev/random? > > > > I've already compiled cyrus-sasl to use /dev/urandom. I'm not sure where > > else I can change that, assuming this is the problem.

Re: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Andrew Morgan wrote: > So it doesn't have /dev/(u)random open. But it does have a user's message > open. And the connection is one of our dial-up hosts, so it seems like > the user's modem connection got abruptly dropped. [snip] > It looks like somewhere along the line

Re: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
On Wed, 24 Sep 2003, Jonathan Marsden wrote: > On 24 Sep 2003, Andrew Morgan writes: > > > I've just run into this problem again. This time I have the gdb > > backtrace of both the lmtpd process trying to get the lock and the > > imap process holding the lock. There is nothing new in the lmtpd

Re: stuck lmtpd processes

2003-09-24 Thread Jonathan Marsden
On 24 Sep 2003, Andrew Morgan writes: > I've just run into this problem again. This time I have the gdb > backtrace of both the lmtpd process trying to get the lock and the > imap process holding the lock. There is nothing new in the lmtpd > backtrace. Here is the imapd backtrace: > 0x402ae3c4

Re: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Andrew Morgan wrote: > > /dev/urandom for its entropy source, rather than /dev/random? > > I've already compiled cyrus-sasl to use /dev/urandom. I'm not sure where > else I can change that, assuming this is the problem. If the IMAP process is trying to read for periods on th

Re: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
Rob, I've just run into this problem again. This time I have the gdb backtrace of both the lmtpd process trying to get the lock and the imap process holding the lock. There is nothing new in the lmtpd backtrace. Here is the imapd backtrace: 0x402ae3c4 in read () from /lib/libc.so.6 (gdb) bt #

Re: followup: stuck lmtpd processes

2003-09-24 Thread Scott Adkins
Well, that could definitely be a problem... Next time we see a lock problem occur, I will look based on the information below to see if it is really a lock problem on the quota file. Thanks, Scott --On Wednesday, September 24, 2003 12:32 PM -0700 Andrew Morgan <[EMAIL PROTECTED]> wrote: On Wed,

Re: followup: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
On Wed, 24 Sep 2003, Scott Adkins wrote: > When looking at what file the processes are all waiting to get a lock on, > it usually turns out to be the cyrus.header file and not the quota file. > Is this still the same bug described by Rob on bugzilla? Does it have to > be the quota file? > > Als

Re: followup: stuck lmtpd processes

2003-09-24 Thread John C. Amodeo
...until your system runs out of available open files... Then the real fun begins... :-) -John Andrew Morgan wrote: > On Wed, 24 Sep 2003, John Wade wrote: > > > The patch I wrote still might help you since it would prevent an > > individual user's problem from taking down the mail system. Th

Re: followup: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
On Wed, 24 Sep 2003, John C. Amodeo wrote: > ...until your system runs out of available open files... > > Then the real fun begins... :-) > > -John [EMAIL PROTECTED] tools]# cat /proc/sys/fs/file-max 209708 I'm in a lot of trouble if I've got 209708 files open. :) Andy

Re: followup: stuck lmtpd processes

2003-09-24 Thread John C. Amodeo
Andy, It's happened to me before... Don't think it can't... That's all I'm saying... -John Andrew Morgan wrote: > On Wed, 24 Sep 2003, John C. Amodeo wrote: > > > ...until your system runs out of available open files... > > > > Then the real fun begins... :-) > > > > -John > > [EMAIL PROTECTED]

Re: followup: stuck lmtpd processes

2003-09-24 Thread Andrew Morgan
On Wed, 24 Sep 2003, John Wade wrote: > The patch I wrote still might help you since it would prevent an > individual user's problem from taking down the mail system. The user's > mailbox would remain inaccessible, but the lmtpd processes attempting > delivery would exit with errors and mail d

Re: followup: stuck lmtpd processes

2003-09-24 Thread Patrick Welche
On Wed, Sep 24, 2003 at 02:20:50PM -0300, Henrique de Moraes Holschuh wrote: > On Wed, 24 Sep 2003, Etienne Goyer wrote: > > On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote: > > > However, I have looked into this and to my surprise, Linux is indeed > > > restarting the system calls i

Re: followup: stuck lmtpd processes

2003-09-24 Thread Henrique de Moraes Holschuh
On Wed, 24 Sep 2003, Etienne Goyer wrote: > On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote: > > However, I have looked into this and to my surprise, Linux is indeed > > restarting the system calls instead of returning with EINTR. However, the > > answer here is to set up the alarm(

Re: followup: stuck lmtpd processes

2003-09-24 Thread Etienne Goyer
Thanks. I'll test it by the end of the week, and report. On Wed, Sep 24, 2003 at 01:18:12PM -0400, Rob Siemborski wrote: > On Wed, 24 Sep 2003, Etienne Goyer wrote: > > > > I'll work on fixing fud shortly (it's using signal() and it should be > > > using sigaction()). > > > > The included patch a

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Etienne Goyer wrote: > > Something that works in Linux, sure. Something that works in broken Linux? > > No. Fix the breakage in Linux, instead. That's our strength, and I *will* > > stick to it as a Debian maintainer. > > While I agree with you on a technical level and admi

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Etienne Goyer wrote: > > I'll work on fixing fud shortly (it's using signal() and it should be > > using sigaction()). > > The included patch against 2.1.13 works for me. This sort of thing won't work for file locking. I've just committed a patch to fud that uses sigaction() [

Re: followup: stuck lmtpd processes

2003-09-24 Thread Etienne Goyer
On Wed, Sep 24, 2003 at 12:57:37PM -0300, Henrique de Moraes Holschuh wrote: > I did check ALL the documentation already, and ALL of it says that sigalarm > MUST interrupt the syscall, and that it HAS to return EINTR. So, it is a > bug. So, it needs to be squashed, and people have to either patch

Re: followup: stuck lmtpd processes

2003-09-24 Thread Etienne Goyer
On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote: > However, I have looked into this and to my surprise, Linux is indeed > restarting the system calls instead of returning with EINTR. However, the > answer here is to set up the alarm() handler with sigaction without > setting SA_REST

Re: followup: stuck lmtpd processes

2003-09-24 Thread Henrique de Moraes Holschuh
On Wed, 24 Sep 2003, Rob Siemborski wrote: > However, I have looked into this and to my surprise, Linux is indeed > restarting the system calls instead of returning with EINTR. However, the > answer here is to set up the alarm() handler with sigaction without > setting SA_RESTART, not to jump thro

Re: followup: stuck lmtpd processes

2003-09-24 Thread Henrique de Moraes Holschuh
On Wed, 24 Sep 2003, Etienne Goyer wrote: > On Wed, Sep 24, 2003 at 11:13:06AM -0300, Henrique de Moraes Holschuh wrote: > > It is not a general solution when you hit glibc/kernel bugs, but I can > > certainly live with it IF I manage to track down a version of glibc and > > kernel that won't deadl

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Etienne Goyer wrote: > The obvious solution is to not use alarm() to interrupt a blocking > syscall, but to use non-blocking calls with select() instead. I > am not a very proficient C Unix programmer, so maybe this suggestion > makes no sense. However, in the case of the bug wi

Re: followup: stuck lmtpd processes

2003-09-24 Thread Etienne Goyer
On Wed, Sep 24, 2003 at 11:13:06AM -0300, Henrique de Moraes Holschuh wrote: > It is not a general solution when you hit glibc/kernel bugs, but I can > certainly live with it IF I manage to track down a version of glibc and > kernel that won't deadlock, that we can recommend. Either that, or allow

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Scott Adkins wrote: > Also, when we find the specific imaps process that happens to have the > cyrus.header lock file opened for writing and has it locked, if we kill > it off, we find that the write lock goes to another imaps process or to > one of the LMTP processes and gets

Re: followup: stuck lmtpd processes

2003-09-24 Thread Scott Adkins
this patch successfully on 2.0.16 and 2.1.x, and I know it has resolved our problem. If you can solve the particular bug that causes this, more power to you, if not, my work around resolves a number of possible deadlock issues. Enjoy, John Andrew Morgan wrote: Following up on my previous post about

Re: followup: stuck lmtpd processes

2003-09-24 Thread Henrique de Moraes Holschuh
On Wed, 24 Sep 2003, Rob Siemborski wrote: > On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote: > > Agreed, but if we are going to keep the blocking-on-lock behaviour (and I > > know we are ;-)), we really, really should have a way to timeout and kill > > the process if that lock does not rele

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote: > Agreed, but if we are going to keep the blocking-on-lock behaviour (and I > know we are ;-)), we really, really should have a way to timeout and kill > the process if that lock does not release after a while. > > Resilience IS necessary...

Re: followup: stuck lmtpd processes

2003-09-24 Thread Henrique de Moraes Holschuh
On Wed, 24 Sep 2003, Rob Siemborski wrote: > think about it. The kernel is responsible for waking processes up when > they are blocking on a lock and it becomes available. If this isn't > happening (causing the need to do locks in a nonblocking fashion) then > something is wrong with the *kernel*

Re: followup: stuck lmtpd processes

2003-09-24 Thread Etienne Goyer
causes this, more power to you, > if not, my work around resolves a number of possible deadlock issues. > > Enjoy, > John > > > > Andrew Morgan wrote: > > >Following up on my previous post about stuck lmtpd processes. I found > >this incredi

Re: followup: stuck lmtpd processes

2003-09-24 Thread Rob Siemborski
On Tue, 23 Sep 2003, Andrew Morgan wrote: > I think your patch would fix the problem where a lot of processes are > contending for a lock (by making them retry), but it wouldn't help if a > single process keeps the lock indefinitely. I agree. The whole act of retrying for a lock is pretty sill

Re: followup: stuck lmtpd processes

2003-09-24 Thread John Wade
Oooppss. Sorry, my mailbox went temporarily over quota and the delivery of the original thread was deferred until after I had read and responded to the followup. It looks like the locking mechanism is working correctly here and the bug is really in the network timeout. (or in the implementati

Re: followup: stuck lmtpd processes

2003-09-23 Thread Andrew Morgan
On Tue, 23 Sep 2003, John Wade wrote: > Hi Andrew, > > I was the one who wrote the message you found. I finally came to the > conclusion that the flat file locking mechanism is somewhat broken in > Cyrus, but I was never a good enough C programmer to pin down what was > happening. (The mmap s

Re: followup: stuck lmtpd processes

2003-09-23 Thread John Wade
lmtpd processes. I found this incredibly detailed faq at: http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus0023_33254.html This isn't exactly the same problem, but the steps on that page helped me figure out that they are all stuck trying to get a lock on: /private/

Re: stuck lmtpd processes

2003-09-23 Thread Andrew Morgan
On Tue, 23 Sep 2003, Rob Siemborski wrote: > On Tue, 23 Sep 2003, Andrew Morgan wrote: > > > And that write lock was held by an imaps process. Once I killed the imaps > > process, all the lmtpd's got unstuck. Unfortunately, I realize now that > > it would have been nice to get a backtrace on t

Re: stuck lmtpd processes

2003-09-23 Thread Rob Siemborski
On Tue, 23 Sep 2003, Andrew Morgan wrote: > Hmmm, did you just add comment #1 to it? :) Yeah. It should have been added much earlier. > It is good to know that I can get myself out of this easily enough, but > I'd love to see this fixed in Cyrus v2.1.16 (hint, hint). :) Me too, but, as the c

Re: stuck lmtpd processes

2003-09-23 Thread Andrew Morgan
On Tue, 23 Sep 2003, Rob Siemborski wrote: > On Tue, 23 Sep 2003, Andrew Morgan wrote: > > > I'd prefer not to restart all of cyrus because I have several hundred > > users connected right now, and this is the day that all the students are > > returning to campus. Is there a way I can kill the

Re: stuck lmtpd processes

2003-09-23 Thread Rob Siemborski
On Tue, 23 Sep 2003, Andrew Morgan wrote: > And that write lock was held by an imaps process. Once I killed the imaps > process, all the lmtpd's got unstuck. Unfortunately, I realize now that > it would have been nice to get a backtrace on that imaps process to see > why it hadn't released the l

Re: stuck lmtpd processes

2003-09-23 Thread Rob Siemborski
On Tue, 23 Sep 2003, Andrew Morgan wrote: > I'd prefer not to restart all of cyrus because I have several hundred > users connected right now, and this is the day that all the students are > returning to campus. Is there a way I can kill the original lmtpd process > that got stuck to free things

followup: stuck lmtpd processes

2003-09-23 Thread Andrew Morgan
Following up on my previous post about stuck lmtpd processes. I found this incredibly detailed faq at: http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus0023_33254.html This isn't exactly the same problem, but the steps on that page helped me figure out that

stuck lmtpd processes

2003-09-23 Thread Andrew Morgan
lsof output from one of these stuck lmtpd processes:
COMMAND PID   USER  FD  TYPE DEVICE SIZE NODE NAME
lmtpd   27332 cyrus cwd DIR  8,2    4096 2    /
lmtpd   27332 cyrus rtd DIR  8,2    4096 2    /
lmtpd   27332 cyrus txt REG  8,2    1562302 30