Package: bsd-mailx
Version: 8.1.2-0.20071201cvs-3
You can find this bug report in HTML format at
http://famzah.net/bsd-mailx-waitchild-bug/
There are sporadic false error messages when sending an email message.
Here is an example:
~$ echo This delivery will actually succeed | mail r...@example.com
Can't send mail: sendmail process failed
~$
The email message is actually sent and "sendmail" did not fail.
Here is a snippet of the source code of the affected functions and
files: http://famzah.net/bsd-mailx-waitchild-bug/affected-source.c.html
This bug is caused by the patch in "send.c" for the bug report #145379.
The race condition occurs when events unfold in the following order:
1. The parent fork()s a child process in "send.c", and the child
exec()s "sendmail".
2. The child starts, finishes quickly and exits. The parent has not
called wait_child(pid) in "send.c" yet.
3. The parent immediately gets SIGCHLD because the child has already
exited. The sigchild() handler in "popen.c" reaps the child via waitpid()
and returns immediately because findchild(pid, 1) returned NULL. It
returned NULL because the PID of the child process was never added to
the "child" structure list.
4. The execution of the parent process is resumed in "send.c", and it
now calls wait_child(pid). The function wait_child(pid) returns "-1"
because wait_child(pid) in "popen.c" calls waitpid(pid, ...) again for
the same child PID, which the sigchild() handler already reaped. The
second call to findchild(pid, 1) by wait_child(pid) in "popen.c" returns
NULL too, because as already stated the PID of the child process has not
been added to the "child" structure list. As a result, the false error
message "Can't send mail: sendmail process failed" is given.
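To make steps 1-4 concrete, here is a minimal standalone sketch of the
same sequence. It is not code from bsd-mailx: demo_sigchild() and
demo_late_wait() are hypothetical names, the handler only imitates what
sigchild() in "popen.c" is described as doing above, and the child simply
exits instead of exec()ing "sendmail". The polling loop stands in for the
parent being scheduled late.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler once it has reaped a child. */
static volatile sig_atomic_t reaped = 0;

/* Hypothetical stand-in for sigchild(): reaps every exited child. */
static void demo_sigchild(int signo)
{
    int saved_errno = errno;

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped = 1;
    errno = saved_errno;
}

/*
 * Forks a child that exits at once, lets the handler reap it, and only
 * then waits for it explicitly, like the delayed parent in step 4.
 * Returns the errno left by the late waitpid(), 0 if that call
 * unexpectedly succeeded, or -1 if the demonstration could not be set up.
 */
int demo_late_wait(void)
{
    struct sigaction sa, old_sa;
    struct timespec tick = { 0, 1000000 };  /* 1 ms */
    pid_t pid;
    int status;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_sigchild;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    reaped = 0;
    pid = fork();                   /* step 1 */
    if (pid == 0)
        _exit(0);                   /* step 2: the child finishes quickly */

    if (pid == -1) {
        result = -1;
    } else {
        while (!reaped)             /* step 3: the handler reaps the child */
            nanosleep(&tick, NULL);
        if (waitpid(pid, &status, 0) == -1)
            result = errno;         /* step 4: nothing is left to reap */
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

Called from a process with no other children, demo_late_wait() should
return ECHILD: the explicit waitpid() fails although the child ran and
exited normally, which is the situation that leads to the false error.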
This bug happens only rarely, usually when the system is under load and
the parent process lags slightly behind the child. But it does happen:
each of our 36 servers sends about 15 messages per hour, and across all
of them we get about 10 false error messages per 24 hours on average (a
false error rate of about 0.08%).
To reproduce the problem every time, add a sleep(5) in the parent process
before the call to wait_child(pid) in "send.c". This simulates the task
scheduler delaying the parent process until after the child process has
exited. Note that Linux does not guarantee whether the child or the
parent process runs or finishes first, so the delay that this sleep()
simulates can occur on real systems, as it does on many of ours. Here is
the modified "send.c" which you can use to always reproduce the bug:
http://famzah.net/bsd-mailx-waitchild-bug/send-reproduce-bug.c.html
I developed a small patch to fix the problem:
http://famzah.net/bsd-mailx-waitchild-bug/waitpid-sigchld.patch.html
A version suitable for downloading:
http://famzah.net/bsd-mailx-waitchild-bug/waitpid-sigchld.patch
The patch does no error handling for the sig*() functions, which matches
how the authors of bsd-mailx already use them.
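For readers who cannot open the links, here is a minimal standalone
sketch of one common way to close this kind of race: block SIGCHLD before
fork() and unblock it only after the explicit waitpid(). This is an
illustration under my own assumptions and not a copy of the patch above;
demo_reaper() and demo_guarded_wait() are hypothetical names, and the
child simply exits instead of exec()ing "sendmail".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for sigchild(): reaps any child that has exited. */
static void demo_reaper(int signo)
{
    int saved_errno = errno;

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

/*
 * Same scenario as the bug, but SIGCHLD stays blocked from before fork()
 * until after the explicit waitpid(), so the handler cannot reap the
 * child first. Returns the child's exit status, or -1 on failure.
 */
int demo_guarded_wait(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    struct timespec delay = { 0, 100000000 };   /* 100 ms */
    pid_t pid, rc = -1;
    int status = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_reaper;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Error handling for the sig*() calls is omitted, as in bsd-mailx. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    pid = fork();
    if (pid == 0) {
        /* The child restores the inherited mask before doing its work. */
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        _exit(0);
    }

    if (pid > 0) {
        nanosleep(&delay, NULL);        /* simulate a late parent */
        rc = waitpid(pid, &status, 0);  /* reaps the child itself */
    }

    /* The pending SIGCHLD is delivered here; nothing is left to reap. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);

    if (pid <= 0 || rc != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With SIGCHLD blocked, demo_guarded_wait() should return 0 even though the
parent is deliberately late, because the handler only runs after the
explicit waitpid() has already collected the exit status.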
This affects all current versions of "bsd-mailx" in Debian >=5.0 and the
old "mailx" in Debian 4.0. The problem was first encountered on Debian
4.0 with "mailx" version "8.1.2-0.20050715cvs-1" and was later confirmed
and debugged on Ubuntu 9.04 (running Debian 5.0) with "bsd-mailx"
version "8.1.2-0.20081101cvs". Ubuntu inherits these packages from Debian.