I also reported a problem with lmtpd processes hung waiting for a lock on
the user's quota file.
Two days ago, I had one of these stuck lmtpd processes again. I found the
imapd that was holding the lock (a dial-up user that had disconnected),
and killed the offendin
Sorry, I will have to wait until 2.1.16 to test it. I can't just plug in the
fud.c from CVS and compile it, and I am really too busy these days to
make a full checkout from CVS and test it.
I'll report my experience with 2.1.16, if and when it comes out.
On Wed, Sep 24, 2003 at 01:24:11PM -0400, Etien
There is now a preliminary patch available.
http://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1177
Please test the second patch, and report back using the bug tracking system.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the
On Fri, 26 Sep 2003, Tom wrote:
> On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote:
> > With SYSV you will get the interrupted system call, unless you tell it
> > somehow not to do it (the SA_RESTART stuff). If we are to accommodate the
> > BSDs, we can:
> > 1. Let them have the short end o
On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote:
> On Wed, 24 Sep 2003, Etienne Goyer wrote:
> > On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote:
> > > However, I have looked into this and to my surprise, Linux is indeed
> > > restarting the system calls instead of returning
On Thu, 25 Sep 2003, Etienne Goyer wrote:
> However, the man page is wrong about EINTR at least as far as RedHat 7.x
> is concerned. In a murder environment, when following a referral:
No it isn't wrong. The problem is signals that are configured via
signal() instead of sigaction(). On Linu
On Wed, Sep 24, 2003 at 08:01:11PM +0100, Patrick Welche wrote:
> I don't understand. The only alarm() business I can see in imap/fud.c
> is around recvfrom which at least according to its man page says
>
> [EINTR]The receive was interrupted by delivery of a signal
>
On Wed, 24 Sep 2003, Rob Siemborski wrote:
> On Wed, 24 Sep 2003, Andrew Morgan wrote:
>
> > > /dev/urandom for its entropy source, rather than /dev/random?
> >
> > I've already compiled cyrus-sasl to use /dev/urandom. I'm not sure where
> > else I can change that, assuming this is the problem.
On Wed, 24 Sep 2003, Andrew Morgan wrote:
> So it doesn't have /dev/(u)random open. But it does have a user's message
> open. And the connection is one of our dial-up hosts, so it seems like
> the user's modem connection got abruptly dropped.
[snip]
> It looks like somewhere along the line
On Wed, 24 Sep 2003, Jonathan Marsden wrote:
> On 24 Sep 2003, Andrew Morgan writes:
>
> > I've just run into this problem again. This time I have the gdb
> > backtrace of both the lmtpd process trying to get the lock and the
> > imap process holding the lock. There is nothing new in the lmtpd
On 24 Sep 2003, Andrew Morgan writes:
> I've just run into this problem again. This time I have the gdb
> backtrace of both the lmtpd process trying to get the lock and the
> imap process holding the lock. There is nothing new in the lmtpd
> backtrace. Here is the imapd backtrace:
> 0x402ae3c4
On Wed, 24 Sep 2003, Andrew Morgan wrote:
> > /dev/urandom for its entropy source, rather than /dev/random?
>
> I've already compiled cyrus-sasl to use /dev/urandom. I'm not sure where
> else I can change that, assuming this is the problem.
If the IMAP process is trying to read for periods on th
Rob,
I've just run into this problem again. This time I have the gdb backtrace
of both the lmtpd process trying to get the lock and the imap process
holding the lock. There is nothing new in the lmtpd backtrace. Here is
the imapd backtrace:
0x402ae3c4 in read () from /lib/libc.so.6
(gdb) bt
#
Well, that could definitely be a problem... Next time we see a lock problem
occur, I will look based on the information below to see if it is really a
lock problem on the quota file.
Thanks,
Scott
--On Wednesday, September 24, 2003 12:32 PM -0700 Andrew Morgan
<[EMAIL PROTECTED]> wrote:
On Wed,
On Wed, 24 Sep 2003, Scott Adkins wrote:
> When looking at what file the processes are all waiting to get a lock on,
> it usually turns out to be the cyrus.header file and not the quota file.
> Is this still the same bug described by Rob on bugzilla? Does it have to
> be the quota file?
>
> Als
...until your system runs out of available open files...
Then the real fun begins... :-)
-John
Andrew Morgan wrote:
> On Wed, 24 Sep 2003, John Wade wrote:
>
> > The patch I wrote still might help you since it would prevent an
> > individual user's problem from taking down the mail system. Th
On Wed, 24 Sep 2003, John C. Amodeo wrote:
> ...until your system runs out of available open files...
>
> Then the real fun begins... :-)
>
> -John
[EMAIL PROTECTED] tools]# cat /proc/sys/fs/file-max
209708
I'm in a lot of trouble if I've got 209708 files open. :)
Andy
Andy,
It's happened to me before... Don't think it can't... That's all I'm
saying...
-John
Andrew Morgan wrote:
> On Wed, 24 Sep 2003, John C. Amodeo wrote:
>
> > ...until your system runs out of available open files...
> >
> > Then the real fun begins... :-)
> >
> > -John
>
> [EMAIL PROTECTED]
On Wed, 24 Sep 2003, John Wade wrote:
> The patch I wrote still might help you since it would prevent an
> individual user's problem from taking down the mail system. The user's
> mailbox would remain inaccessible, but the lmtpd processes attempting
> delivery would exit with errors and mail d
On Wed, Sep 24, 2003 at 02:20:50PM -0300, Henrique de Moraes Holschuh wrote:
> On Wed, 24 Sep 2003, Etienne Goyer wrote:
> > On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote:
> > > However, I have looked into this and to my surprise, Linux is indeed
> > > restarting the system calls i
On Wed, 24 Sep 2003, Etienne Goyer wrote:
> On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote:
> > However, I have looked into this and to my surprise, Linux is indeed
> > restarting the system calls instead of returning with EINTR. However, the
> > answer here is to set up the alarm(
Thanks. I'll test it by the end of the week, and report.
On Wed, Sep 24, 2003 at 01:18:12PM -0400, Rob Siemborski wrote:
> On Wed, 24 Sep 2003, Etienne Goyer wrote:
>
> > > I'll work on fixing fud shortly (it's using signal() and it should be
> > > using sigaction()).
> >
> > The included patch a
On Wed, 24 Sep 2003, Etienne Goyer wrote:
> > Something that works in Linux, sure. Something that works in broken Linux?
> > No. Fix the breakage in Linux, instead. That's our strength, and I *will*
> > stick to it as a Debian maintainer.
>
> While I agree with you on a technical level and admi
On Wed, 24 Sep 2003, Etienne Goyer wrote:
> > I'll work on fixing fud shortly (it's using signal() and it should be
> > using sigaction()).
>
> The included patch against 2.1.13 works for me.
This sort of thing won't work for file locking. I've just committed a
patch to fud that uses sigaction() [
On Wed, Sep 24, 2003 at 12:57:37PM -0300, Henrique de Moraes Holschuh wrote:
> I did check ALL the documentation already, and ALL of it says that SIGALRM
> MUST interrupt the syscall, and that it HAS to return EINTR. So, it is a
> bug. So, it needs to be squashed, and people have to either patch
On Wed, Sep 24, 2003 at 11:27:46AM -0400, Rob Siemborski wrote:
> However, I have looked into this and to my surprise, Linux is indeed
> restarting the system calls instead of returning with EINTR. However, the
> answer here is to set up the alarm() handler with sigaction without
> setting SA_REST
On Wed, 24 Sep 2003, Rob Siemborski wrote:
> However, I have looked into this and to my surprise, Linux is indeed
> restarting the system calls instead of returning with EINTR. However, the
> answer here is to set up the alarm() handler with sigaction without
> setting SA_RESTART, not to jump thro
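The approach described in the message above, installing the alarm() handler with sigaction() and leaving SA_RESTART unset, can be sketched as follows. This is a minimal illustration and not the patch committed to fud: the helper name `read_with_alarm` and the empty handler `on_alarm` are hypothetical, and a plain read() stands in for the recvfrom() call that fud makes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked call return. */
static void on_alarm(int signo) { (void)signo; }

/*
 * Hypothetical helper (not from the Cyrus sources): read() with a timeout
 * enforced by alarm(). Returns the byte count, or -1 with errno == EINTR
 * if the alarm fired before any data arrived.
 */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned int secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    assert(secs > 0);               /* alarm(0) would cancel, not arm */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(secs);
    n = read(fd, buf, len);         /* interrupted with EINTR on timeout */
    saved_errno = errno;

    alarm(0);                       /* cancel a still-pending alarm */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
    errno = saved_errno;
    return n;
}
```

A caller would treat a return of -1 with errno set to EINTR as a timeout. Note that a handler installed with signal() may get restart semantics on some systems, which is the behaviour discussed in this thread; setting sa_flags explicitly avoids depending on that default.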
On Wed, 24 Sep 2003, Etienne Goyer wrote:
> On Wed, Sep 24, 2003 at 11:13:06AM -0300, Henrique de Moraes Holschuh wrote:
> > It is not a general solution when you hit glibc/kernel bugs, but I can
> > certainly live with it IF I manage to track down a version of glibc and
> > kernel that won't deadl
On Wed, 24 Sep 2003, Etienne Goyer wrote:
> The obvious solution is to not use alarm() to interrupt a blocking
> syscall, but to use a non-blocking call with select() instead. I
> am not a very proficient C Unix programmer, so maybe this suggestion
> makes no sense. However, in the case of the bug wi
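For comparison, the select()-based alternative suggested above might look roughly like this for a socket receive. It is a sketch under assumptions, not the patch posted to the list: the function name `recv_with_timeout` and the choice of ETIMEDOUT to report a timeout are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: wait up to timeout_secs for the socket to become
 * readable, then receive. Returns the byte count, or -1 with
 * errno == ETIMEDOUT if nothing arrived in time. No signals are involved.
 */
ssize_t recv_with_timeout(int sock, void *buf, size_t len, int timeout_secs)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(sock >= 0 && sock < FD_SETSIZE); /* select() limit */

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;

    ready = select(sock + 1, &readfds, NULL, NULL, &tv);
    if (ready == -1)
        return -1;                  /* select() itself failed */
    if (ready == 0) {
        errno = ETIMEDOUT;          /* timeout expired, nothing to read */
        return -1;
    }
    /* select() reported the socket readable, so recv() should not block. */
    return recv(sock, buf, len, 0);
}
```

The timeout here lives in the select() call itself, so the result does not depend on whether the platform restarts system calls after a signal.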
On Wed, Sep 24, 2003 at 11:13:06AM -0300, Henrique de Moraes Holschuh wrote:
> It is not a general solution when you hit glibc/kernel bugs, but I can
> certainly live with it IF I manage to track down a version of glibc and
> kernel that won't deadlock, that we can recommend. Either that, or allow
On Wed, 24 Sep 2003, Scott Adkins wrote:
> Also, when we find the specific imaps process that happens to have the
> cyrus.header lock file opened for writing and has it locked, if we kill
> it off, we find that the write lock goes to another imaps process or to
> one of the LMTP processes and gets
this patch successfully on 2.0.16 and
2.1.x, and I know it has resolved our problem.
If you can solve the particular bug that causes this, more power to you,
if not, my work around resolves a number of possible deadlock issues.
Enjoy,
John
Andrew Morgan wrote:
Following up on my previous post about
On Wed, 24 Sep 2003, Rob Siemborski wrote:
> On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote:
> > Agreed, but if we are going to keep the blocking-on-lock behaviour (and I
> > know we are ;-)), we really, really should have a way to timeout and kill
> > the process if that lock does not rele
On Wed, 24 Sep 2003, Henrique de Moraes Holschuh wrote:
> Agreed, but if we are going to keep the blocking-on-lock behaviour (and I
> know we are ;-)), we really, really should have a way to timeout and kill
> the process if that lock does not release after a while.
>
> Resilience IS necessary...
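One way to picture the "timeout on a blocking lock" idea raised above is to arm an alarm around a blocking lock request. The sketch below assumes fcntl()-style record locks and a whole-file write lock; the thread does not say which locking call is in use, and `lock_with_timeout` is a hypothetical name, not a Cyrus function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: only needed so the blocked fcntl() returns EINTR. */
static void on_lock_alarm(int signo) { (void)signo; }

/*
 * Hypothetical helper: request a whole-file write lock, giving up after
 * secs seconds. Returns 0 once the lock is held, or -1 with
 * errno == EINTR if the alarm fired while still waiting.
 */
int lock_with_timeout(int fd, unsigned int secs)
{
    struct sigaction sa, old;
    struct flock fl;
    int rc, saved_errno;

    assert(secs > 0);               /* alarm(0) would cancel, not arm */

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;            /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* 0 means "to end of file" */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lock_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART, so F_SETLKW can fail */
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(secs);
    rc = fcntl(fd, F_SETLKW, &fl);  /* blocks until granted or interrupted */
    saved_errno = errno;

    alarm(0);
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return rc == -1 ? -1 : 0;
}
```

What the caller does on timeout (log and exit, defer delivery, retry) is a policy decision that this sketch leaves open.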
On Wed, 24 Sep 2003, Rob Siemborski wrote:
> think about it. The kernel is responsible for waking processes up when
> they are blocking on a lock and it becomes available. If this isn't
> happening (causing the need to do locks in a nonblocking fashion) then
> something is wrong with the *kernel*
causes this, more power to you,
> if not, my work around resolves a number of possible deadlock issues.
>
> Enjoy,
> John
>
>
>
> Andrew Morgan wrote:
>
> >Following up on my previous post about stuck lmtpd processes. I found
> >this incredi
On Tue, 23 Sep 2003, Andrew Morgan wrote:
> I think your patch would fix the problem where a lot of processes are
> contending for a lock (by making them retry), but it wouldn't help if a
> single process keeps the lock indefinitely.
I agree. The whole act of retrying for a lock is pretty sill
Oooppss. Sorry, my mailbox went temporarily over quota and the delivery
of the original thread was deferred until after I had read and responded
to the followup. It looks like the locking mechanism is working
correctly here and the bug is really in the network timeout. (or in the
implementati
On Tue, 23 Sep 2003, John Wade wrote:
> Hi Andrew,
>
> I was the one who wrote the message you found. I finally came to the
> conclusion that the flat file locking mechanism is somewhat broken in
> Cyrus, but I was never a good enough C programmer to pin down what was
> happening. (The mmap s
lmtpd processes. I found
this incredibly detailed faq at:
http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus0023_33254.html
This isn't exactly the same problem, but the steps on that page helped me
figure out that they are all stuck trying to get a lock on:
/private/
On Tue, 23 Sep 2003, Rob Siemborski wrote:
> On Tue, 23 Sep 2003, Andrew Morgan wrote:
>
> > And that write lock was held by an imaps process. Once I killed the imaps
> > process, all the lmtpd's got unstuck. Unfortunately, I realize now that
> > it would have been nice to get a backtrace on t
On Tue, 23 Sep 2003, Andrew Morgan wrote:
> Hmmm, did you just add comment #1 to it? :)
Yeah. It should have been added much earlier.
> It is good to know that I can get myself out of this easily enough, but
> I'd love to see this fixed in Cyrus v2.1.16 (hint, hint). :)
Me too, but, as the c
On Tue, 23 Sep 2003, Rob Siemborski wrote:
> On Tue, 23 Sep 2003, Andrew Morgan wrote:
>
> > I'd prefer not to restart all of cyrus because I have several hundred
> > users connected right now, and this is the day that all the students are
> > returning to campus. Is there a way I can kill the
On Tue, 23 Sep 2003, Andrew Morgan wrote:
> And that write lock was held by an imaps process. Once I killed the imaps
> process, all the lmtpd's got unstuck. Unfortunately, I realize now that
> it would have been nice to get a backtrace on that imaps process to see
> why it hadn't released the l
On Tue, 23 Sep 2003, Andrew Morgan wrote:
> I'd prefer not to restart all of cyrus because I have several hundred
> users connected right now, and this is the day that all the students are
> returning to campus. Is there a way I can kill the original lmtpd process
> that got stuck to free things
Following up on my previous post about stuck lmtpd processes. I found
this incredibly detailed faq at:
http://www.faqchest.com/prgm/cyrus-l/cyrus-01/cyrus-0111/cyrus-011102/cyrus0023_33254.html
This isn't exactly the same problem, but the steps on that page helped me
figure out that
lsof output from one of these stuck lmtpd processes:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
lmtpd 27332 cyrus cwd DIR 8,2 4096 2 /
lmtpd 27332 cyrus rtd DIR 8,2 4096 2 /
lmtpd 27332 cyrus txt REG 8,2 1562302 30