Package: apache2-mpm-worker
Version: 2.0.55-3
We're running Apache 2.0.55 (distro testing package ver 2.0.55-3) on
Linux 2.6.14.3. We use worker MPM, mod_actions, mod_rewrite, mod_layout
and fastcgi running php. Box serves about 40-50 requests per second in
durning day. Once in a while one of apache2 processes rapidly starts to
consume lot of memory, freezing the system and crashing it, without any
chance to stop it or trace it. We wasn't able to find out what is
causing this, error.log doesn't say anything useful and because of
sudden apache2 process behaviour tracing is also very hard.
Below is attached 'top' screen right before the actual crash, as well as
some syslog output before another server reboot.
----------
top - 21:15:21 up 1:29, 2 users, load average: 17.18, 9.77, 6.55
Tasks: 116 total,2 running, 114 sleeping,0 stopped,0 zombie
Cpu(s): 1.3% us, 5.3% sy, 0.0% ni, 0.0% id, 90.7% wa, 1.3% hi,
1.3% si
Mem:905776k total,898128k used,7648k free,404k buffers
Swap:0k total,0k used,0k free,18236k cached
PID USERPR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND
130 root150000 S 3.0 0.00:00.13 pdflush
6485 www-data 210 881m 600m 1100 S 1.3 67.90:04.83 apache2
9959 www-data 160 284m 8404 1124 S 1.0 0.90:00.57 apache2
1483 root170 2620 544 412 D 0.3 0.10:01.05 syslogd
4789 www-data 250 293m 16m 1136 S 0.3 1.90:06.94 apache2
7270 root160 2196 1100 820 R 0.3 0.10:01.30 top
9790 www-data 170 17500 5552 1272 D 0.3 0.60:00.43 php
10015 www-data 160 284m 8448 1096 S 0.3 0.90:00.59 apache2
----------
Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 1080
(apache2).
Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 25021
(apache2
).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25084
(apache2).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25116
(apache2).
Dec 28 03:14:28 werewolf kernel: Out of Memory: Killed process 25087
(apache2).
Dec 28 03:14:29 werewolf kernel: Out of Memory: Killed process 25164
(apache2).
----------
apache2.conf (parts):
ServerLimit 12
<IfModule worker.c>
StartServers 3
MaxClients 384
MinSpareThreads 20
MaxSpareThreads 50
ThreadsPerChild 32
MaxRequestsPerChild 10000
</IfModule>
AddOutputFilterByType LAYOUT text/html application/x-httpd-php
LayoutEndTag </BODY>
LayoutFooter /some_document.html
FastCgiServer /usr/lib/cgi-bin/php4
FastCgiConfig -idle-timeout 15
AddHandler php-fastcgi .php .php3 .php4
<Location /cgi-bin/php>
SetHandler fastcgi-script
</Location>
Action php-fastcgi /cgi-bin/php
AddType application/x-httpd-php .php .php3 .php4
I've added this two lines:
ulimit -S -m 25600
ulimit -H -m 25600
on top of apache2ctl script (which actually only calls
"/usr/sbin/apache2 -k start"), without any result. Top still shows
apache2 processes with resident size >= 30Mb. It already crashed our
server two times, here's top output in last
4 seconds:
top - 22:03:16 up 1 day, 1:33, 2 users, load average: 8.26, 6.90, 4.71
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23439 www-data 19 0 372m 94m 1820 S 28.3 10.7 0:08.90 apache2
top - 22:03:18 up 1 day, 1:33, 2 users, load average: 9.76, 7.23, 4.83
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23439 www-data 19 0 551m 273m 1820 S 53.1 30.9 0:09.97 apache2
132 root 15 0 0 0 0 D 9.9 0.0 0:27.16 kswapd0
This apache process was spawned after 22:03:14, then it consumed 273Mb
of memory in about 4 seconds, and finally frozen our box (probably
taking rest of memory).
In this situation you understand it's hard to deliver any debug
information. I'm suprised that ulimit doesn't work with apache, maybe we
use it the wrong way?
We have some info that may be useful though. From time to time appear
apache processes consuming a lot of memory. They grow in time, mostly
exiting normally, so I'm not sure they're related to this bug. But I
guess that's better than nothing. Below is a trace of one of this kind
of processes.
# ls /proc/9200/task
9200 9211 9214 9215 9224 9233
# strace -p 9200
Process 9200 attached - interrupt to quit
read(4, 0xbfc73f57, 1) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn() = ? (mask now HUP INT QUIT
USR1 ALRM
STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR?)
futex(0xb7024bf8, FUTEX_WAIT, 9202, NULL) = 0
futex(0xb6824bf8, FUTEX_WAIT, 9203, NULL) = 0
futex(0xb5824bf8, FUTEX_WAIT, 9205, NULL) = 0
munmap(0xb7025000, 8388608) = 0
futex(0xb5024bf8, FUTEX_WAIT, 9206, NULL) = 0
munmap(0xa6825000, 8388608) = 0
munmap(0xb6825000, 8388608) = 0
munmap(0xb6025000, 8388608) = 0
munmap(0xb5825000, 8388608) = 0
munmap(0xb5025000, 8388608) = 0
futex(0xb2824bf8, FUTEX_WAIT, 9211, NULL) = 0
munmap(0xb4825000, 8388608) = 0
munmap(0xb4025000, 8388608) = 0
munmap(0xb3825000, 8388608) = 0
munmap(0xb3025000, 8388608) = 0
== more than 100 entries ==
munmap(0xa562d000, 430080) = 0
munmap(0xa5a07000, 360448) = 0
munmap(0xa5d5e000, 315392) = 0
munmap(0xa604d000, 274432) = 0
munmap(0xa63dc000, 221184) = 0
munmap(0xa6547000, 180224) = 0
munmap(0xa6724000, 139264) = 0
brk(0x894d000) = 0x894d000
brk(0x889e000) = 0x889e000
brk(0x87ef000) = 0x87ef000
brk(0x87e5000) = 0x87e5000
brk(0x84c2000) = 0x84c2000
brk(0x84c0000) = 0x84c0000
brk(0x828a000) = 0x828a000
brk(0x8286000) = 0x8286000
brk(0x81a3000) = 0x81a3000
brk(0x819b000) = 0x819b000
write(7, "www.example.com 84."..., 1018) = 1018
write(8, "111.111.111.111 - - [28/Dec/2005:"..., 3930) = 3930
exit_group(0) = ?
Process 9200 detached
# strace -p 9211
Process 9211 attached - interrupt to quit
poll({fd=24, events=POLLOUT, revents=POLLOUT}?, 1, 60000) = 1
sendfile(24, 25, [1982387], 1012962) = 13140
poll({fd=24, events=POLLOUT, revents=POLLOUT}?, 1, 60000) = 1
sendfile(24, 25, [1995527], 999822) = 13140
== about 70 entries more ==
poll({fd=24, events=POLLOUT, revents=POLLOUT}?, 1, 60000) = 1
sendfile(24, 25, [2967887], 27462) = 13140
poll({fd=24, events=POLLOUT, revents=POLLOUT}?, 1, 60000) = 1
sendfile(24, 25, [2981027], 14322) = 13140
poll({fd=24, events=POLLOUT, revents=POLLOUT}?, 1, 60000) = 1
sendfile(24, 25, [2994167], 1182) = 1182
gettimeofday({1135804066, 798228}, NULL) = 0
gettimeofday({1135804066, 798456}, NULL) = 0
times({tms_utime=82, tms_stime=79, tms_cutime=0, tms_cstime=0}) =
1718650698
gettimeofday({1135804066, 798886}, NULL) = 0
shutdown(24, 1 /* send */) = 0
poll({fd=24, events=POLLIN, revents=POLLIN?, 1, 2000) = 1
read(24, "", 512) = 0
close(24) = 0
close(25) = 0
gettimeofday({1135804068, 189043}, NULL) = 0
gettimeofday({1135804068, 189074}, NULL) = 0
_exit(0) = ?
Process 9211 detached
This poll/sendfile pattern is similar to all of them.
Please help us. :) We'll be happy with any sane workaround.
Debian testing is up-to-date with every package updated from distro testing.
--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]