Mailman 2.1.6 Python 2.4.1 27 Lists Average 8000 people per list Average message size 100k Largest list gets around 200 messages per day Total load average 2million messages per day give or take. We have been using mailman successfully for about 4 years. Basic Problem: Since about 12/2004 The bounce processor consumes most of the memory in the system and sometimes all of the CPU. What we have tried:
* Change REGISTER_BOUNCES_EVERY = minutes(1) which change us from complete crashing to just consuming all memory and cpu. * Temporarily shut off bounce processing on all lists, no effect. * Temporarily reroute bounce messages to a mailbox for human processing, We received about 5,000 per hour most of which were temporary (mailbox full, mail server not answering, etc) * When it gets really bad Kill and restart mailman every 2 minutes. Mostly works but kills all request in progress when the kill comes through, however those request we already dead because of the locks on the lists * Contacting the bug forum on Source forge and being told that bounce processing is a problem that they hope to have time to fix in 2.1.6 (see 1077587) What top often looks like: 09:39:33 up 36 days, 15:58, 9 users, load average: 5.23, 5.75, 4.98 497 processes: 485 sleeping, 8 running, 4 zombie, 0 stopped CPU states: 67.8% user 16.6% system 0.0% nice 0.0% iowait 15.4% idle Mem: 1029884k av, 973668k used, 56216k free, 0k shrd, 149788k buff 739156k actv, 58428k in_d, 17080k in_c Swap: 2040244k av, 188092k used, 1852152k free 184036k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 14978 mailman 16 0 324M 318M 2328 R 6.5 31.6 5:46 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 14977 mailman 15 0 28916 27M 2420 S 0.0 2.7 0:31 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s 14983 mailman 15 0 27608 25M 2320 S 0.2 2.5 0:23 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 14982 mailman 15 0 23832 22M 2460 S 0.7 2.2 0:58 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0: 14980 mailman 15 0 22904 21M 2364 S 0.0 2.1 0:27 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0: 14981 mailman 15 0 5508 2568 2180 S 0.0 0.2 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s 14984 mailman 15 0 5460 2548 2180 S 0.0 0.2 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 - 14979 mailman 15 0 5476 2540 2180 S 0.0 0.2 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1 14976 mailman 25 0 5524 2220 2216 S 0.0 0.2 0:00 0 /usr/bin/python /var/mailman/bin/mailmanctl -s -q start And immediately after a shutdown restart cycle: 09:42:12 up 36 days, 16:01, 9 users, load average: 4.21, 5.23, 4.91 532 processes: 521 sleeping, 7 running, 4 zombie, 0 stopped CPU states: 82.3% user 6.3% system 0.0% nice 0.0% iowait 11.3% idle Mem: 1029884k av, 643420k used, 386464k free, 0k shrd, 149556k buff 435668k actv, 33664k in_d, 16752k in_c Swap: 2040244k av, 111832k used, 1928412k free 161048k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 32382 mailman 16 0 24312 23M 2660 R 27.2 2.3 0:03 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s 32385 mailman 15 0 22036 21M 2616 R 23.0 2.1 0:02 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0: 32388 mailman 15 0 21816 21M 2584 S 14.1 2.1 0:01 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 32383 mailman 16 0 20604 20M 2568 R 6.1 2.0 0:03 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 32387 mailman 15 0 15960 15M 2712 S 0.5 1.5 0:02 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0: 32379 mailman 25 0 5900 5900 2596 S 0.0 0.5 0:00 0 /usr/bin/python /var/mailman/bin/mailmanctl -s -q start 32386 mailman 15 0 5840 5840 2512 S 0.0 0.5 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s 32384 mailman 15 0 5804 5804 2512 S 0.0 0.5 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1 32389 mailman 25 0 5792 5792 2512 S 0.0 0.5 0:00 0 /usr/local/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 - Notice how the other tasks suddenly get busy.... Log files were completely flushed at 4PM yesterday. [EMAIL PROTECTED] ~]$ ls -al logs total 16144 drwxrwsr-x 2 mailman mailman 4096 Aug 31 19:19 . drwxrwsr-x 34 mailman mailman 4096 Sep 1 08:40 .. -rw-rw-r-- 1 mailman mailman 15793138 Sep 1 09:02 bounce -rw-rw-r-- 1 smmsp mailman 335919 Sep 1 09:01 error -rw-rw-r-- 1 mailman mailman 5181 Sep 1 06:25 locks -rw-rw-r-- 1 mailman mailman 28700 Sep 1 09:01 post -rw-rw-r-- 1 mailman mailman 35858 Sep 1 08:32 qrunner -rw-rw-r-- 1 mailman mailman 263590 Sep 1 09:02 smtp -rw-rw-r-- 1 mailman mailman 1519 Sep 1 06:31 smtp-failure -rw-rw-r-- 1 mailman mailman 6833 Sep 1 08:59 subscribe -rw-rw-r-- 1 mailman mailman 1069 Sep 1 08:32 vette Sample from bounce log: Sep 01 08:58:30 2005 (14978) bounce message w/no discernable addresses: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:30 2005 (14978) discarding unrecognized, message-id: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:32 2005 (14978) bounce message w/no discernable addresses: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:32 2005 (14978) discarding unrecognized, message-id: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:34 2005 (14978) bounce message w/no discernable addresses: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:34 2005 (14978) discarding unrecognized, message-id: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:35 2005 (14978) bounce message w/no discernable addresses: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:35 2005 (14978) discarding unrecognized, message-id: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:39 2005 (14978) bounce message w/no discernable addresses: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sep 01 08:58:39 2005 (14978) discarding unrecognized, message-id: < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> Sample from error log: Sep 01 08:57:32 2005 (14978) Ignoring unparseable message: 1125579452.21312+848e219740e9c93ca6f761384c120d97593ee364 Sep 01 08:57:58 2005 (14978) Uncaught runner exception: No terminating boundary and no trailing empty line Sep 01 08:57:58 2005 (14978) Traceback (most recent call last): File "/var/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop msg, msgdata = self._switchboard.dequeue(filebase) File "/var/mailman/Mailman/Queue/Switchboard.py", line 151, in dequeue msg = email.message_from_string(msg, Message.Message) File "/var/mailman/pythonlib/email/__init__.py", line 52, in message_from_string return Parser(_class, strict=strict).parsestr(s) File "/var/mailman/pythonlib/email/Parser.py", line 75, in parsestr return self.parse(StringIO(text), headersonly=headersonly) File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse self._parsebody(root, fp, firstbodyline) File "/var/mailman/pythonlib/email/Parser.py", line 240, in _parsebody msgobj = self.parsestr(part) File "/var/mailman/pythonlib/email/Parser.py", line 75, in parsestr return self.parse(StringIO(text), headersonly=headersonly) File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse self._parsebody(root, fp, firstbodyline) File "/var/mailman/pythonlib/email/Parser.py", line 265, in _parsebody msg = self.parse(fp) File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse self._parsebody(root, fp, firstbodyline) File "/var/mailman/pythonlib/email/Parser.py", line 206, in _parsebody raise Errors.BoundaryError( BoundaryError: No terminating boundary and no trailing empty line Sep 01 08:57:58 2005 (14978) Ignoring unparseable message: 1125579478.065345+8ca74677621e4e9262be5503f2e081184abc853e Sample from smtp log: Sep 01 09:02:32 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 1 recips, completed in 0.253 seconds Sep 01 09:02:32 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 1 recips, completed in 0.432 seconds Sep 01 09:02:32 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 1 recips, completed in 0.140 seconds Sep 01 09:02:33 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 1 recips, completed in 0.315 seconds Sep 01 09:02:42 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-printer for 2 recips, completed in 4.212 seconds Sep 01 09:02:44 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-dell for 2 recips, completed in 1.629 seconds Sep 01 09:02:44 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 2 recips, completed in 0.842 seconds Sep 01 09:02:45 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-printer for 2 recips, completed in 0.427 seconds Sep 01 09:02:45 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 2 recips, completed in 0.391 seconds Sep 01 09:02:46 2005 (14982) < <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]> smtp to brdcst-compaq for 2 recips, completed in 0.374 seconds ------------------------------------------------------ Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp