This one time, at band camp, Simon Josefsson said:
> Simon Josefsson <[EMAIL PROTECTED]> writes:
> 
> > Thanks for quick response!  I've changed the numbers to 30/20, I'll
> > see if it helps.
> 
> I'm not sure it helped.  Right now, I'm getting plenty of
> 
> Feb 28 21:00:39 yxa sm-mta[8011]: j1SJucaK008011: Milter (clamav): timeout 
> before data read
> Feb 28 21:00:39 yxa sm-mta[8011]: j1SJucaK008011: Milter (clamav): to error 
> state
> ...
> Feb 28 21:00:41 yxa sm-mta[8014]: j1SJuevC008014: Milter (clamav): timeout 
> before data read
> Feb 28 21:00:41 yxa sm-mta[8014]: j1SJuevC008014: Milter (clamav): to error 
> state
> ...
> Feb 28 21:01:14 yxa sm-mta[8024]: j1SJvDhN008024: Milter (clamav): to error 
> state
> Feb 28 21:01:14 yxa sm-mta[8024]: j1SJvDhN008024: <[EMAIL PROTECTED]>... No 
> such user
> 
> but the number of clamav processes running is fairly small:
> 
> yxa-iv:~# ps auxww|grep clam|wc -l
> 38
> yxa-iv:~#
> 
> Isn't clamd supposed to be doing anything?  I straced both processes:
> 
> yxa-iv:~# ps auxww|grep clamd
> clamav   26398  0.0  1.2 16180 12984 ?       Ss   Feb26   0:04 /usr/sbin/clamd
> clamav   26418  0.0  1.2 16180 12984 ?       S    Feb26   0:00 /usr/sbin/clamd
> root      9030  0.0  0.0  1536  472 pts/1    R+   21:03   0:00 grep clamd
> yxa-iv:~#
> 
> But I only get this (from the second process):
> 
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> poll([{fd=6, events=POLLIN}], 1, 2000)  = 0
> getppid()                               = 26398
> 
> The first process is waiting indefinitely:
> 
> yxa-iv:~# strace -p 26398
> Process 26398 attached - interrupt to quit
> accept(0,

This is the parent, waiting for a new connection - it's in an accept
loop, waiting for an event.  Nothing is talking to it, so it just sits
there.

> I did a clamav-daemon restart, and got a new clamd:
> 
> yxa-iv:~# ps auxww|grep clamd
> clamav    9101  0.0  0.6 10024 7200 ?        Ss   21:04   0:00 /usr/sbin/clamd
> root      9115  0.0  0.0  1536  472 pts/1    R+   21:04   0:00 grep clamd
> yxa-iv:~#
> 
> But it doesn't seem to do anything either:
> 
> yxa-iv:~# strace -p 9101
> Process 9101 attached - interrupt to quit
> accept(0,

Right.  As above - there likely was no problem with clamd, but with
milter -> clamd communication.  Unfortunately, we only have sendmail
error messages, so it makes it difficult to be sure where the problem
was.  If you like, put LogSyslog in clamd.conf - that will let the
milter log there as well.

> And I still get the errors:
> 
> Feb 28 21:05:17 yxa sm-mta[8164]: j1SK1HJn008164: Milter (clamav): timeout 
> before data read
> Feb 28 21:05:17 yxa sm-mta[8164]: j1SK1HJn008164: Milter (clamav): to error 
> state
> 
> I've waited for more than a minute, with several similar errors as
> well, but clamd doesn't seem to start doing anything.
> 
> Restarting clamav-milter seem to have fixed it, though, and provoked a
> clamd strace output of:
> 
> gettimeofday({1109621294, 438811}, NULL) = 0
> rt_sigprocmask(SIG_SETMASK, NULL, ~[HUP INT KILL SEGV USR2 PIPE TERM STOP], 
> 8) = 0
> write(7, " \277&@\0\0\0\0\370\216W\10P\320\4\10\260\216W\10\374\242"..., 148) 
> = 148
> rt_sigprocmask(SIG_SETMASK, NULL, ~[HUP INT KILL SEGV USR2 PIPE TERM STOP], 
> 8) = 0
> rt_sigsuspend(~[HUP INT KILL SEGV USR2 PIPE TERM STOP RTMIN] <unfinished ...>
> --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
> <... rt_sigsuspend resumed> )           = -1 EINTR (Interrupted system call)
> sigreturn()                             = ? (mask now ~[HUP INT KILL SEGV 
> USR2 PIPE TERM STOP])
> time([1109621294])                      = 1109621294
> accept(0,
> 
> And then two new clamd were running:
> 
> yxa-iv:~# ps auxww|grep clam
> clamav    4655  0.0  0.1  4340 1216 ?        Ss   Feb24   0:00 
> /usr/bin/freshclam -d --quiet -p /var/run/clamav/freshclam.pid
> clamav    9101  0.0  0.7 10024 7332 ?        Ss   21:04   0:00 /usr/sbin/clamd
> clamav    9240  0.0  0.7 10024 7332 ?        S    21:07   0:00 /usr/sbin/clamd
> clamav    9242  0.0  0.1  8452 1392 ?        Ss   21:07   0:00 
> /usr/sbin/clamav-milter --max-children=2 -ol --external --quiet -Q [EMAIL 
> PROTECTED] --pidfile /var/run/clamav/clamav-milter.pid 
> local:/var/run/clamav/clamav-milter.ctl
> clamav    9243  0.0  0.1  8452 1392 ?        S    21:07   0:00 
> /usr/sbin/clamav-milter --max-children=2 -ol --external --quiet -Q [EMAIL 
> PROTECTED] --pidfile /var/run/clamav/clamav-milter.pid 
> local:/var/run/clamav/clamav-milter.ctl
> clamav    9244  0.0  0.1  8452 1392 ?        S    21:07   0:00 
> /usr/sbin/clamav-milter --max-children=2 -ol --external --quiet -Q [EMAIL 
> PROTECTED] --pidfile /var/run/clamav/clamav-milter.pid 
> local:/var/run/clamav/clamav-milter.ctl
> clamav    9271  0.0  0.1  8452 1392 ?        S    21:07   0:00 
> /usr/sbin/clamav-milter --max-children=2 -ol --external --quiet -Q [EMAIL 
> PROTECTED] --pidfile /var/run/clamav/clamav-milter.pid 
> local:/var/run/clamav/clamav-milter.ctl
> root      9311  0.0  0.0  1536  460 pts/1    R+   21:08   0:00 grep clam
> yxa-iv:~#
> 
> Each new clamd appear to be doing one virus scan, and then exit.
> 
> The first clamd process 9101 seem to trigger the strace above for each
> incoming mail, forking of the child clamd's.  The 9240 clamd seem to
> be monitoring whether 9101 is still alive.
> 
> Any ideas?

It looks the milter got stupid, and couldn't figure out how to talk to
clamd.  Once you restarted it, it works normally again (and clamd is
busily creating new threads to handle the requests, as it should).

If it continues to act up on you, put LogVerbose and LogClean in
clamd.conf - that will let you see the milter error messages, and also
clamd will log connections, so you can see where it's failing.  If you
do this, beware that the milter log code is horrendous, and you will
want to log to syslog (with LogSyslog, as above) - letting it log
directly to the clamav.log will produce wierd results and possibly lost
data.

This looks like it is not a soft limit problem, but a wierd state for
the milter.  It would have been helpful if you had strace'd the milter
as well as clamd, but if it happened once, it may happen again, so you
may yet get the chance :(

Take care,
-- 
 -----------------------------------------------------------------
|   ,''`.                                            Stephen Gran |
|  : :' :                                        [EMAIL PROTECTED] |
|  `. `'                        Debian user, admin, and developer |
|    `-                                     http://www.debian.org |
 -----------------------------------------------------------------

Attachment: pgpJMxxWkvv8w.pgp
Description: PGP signature

Reply via email to