Hi, Mailman seems to randomly stop sending messages.
I'm running:

  debian woody
  postfix 2.1.3-1
  mailman 2.1.5
  python 2.3.4-1

My only real suspicion is that either mailmanctl isn't being started, or that bad lock files are building up in the lock directory.

Restarting mailman with "/etc/init.d/mailman restart" or "/var/lib/mailman/bin/mailmanctl restart" seems to get things working again, but I'm kind of bothered by this, since I have no idea why it stopped, and I tend not to notice it's stopped until a day or two goes by (most of my lists have intermittent traffic, 30-40 users).

Should I put in a cron job to restart mailman every hour or so? What else can I do?

Background:

This problem started sometime in the past year. Before that, for a couple of years, mailman worked reliably and we were quite happy with it. It's entirely possible that the problem started after an upgrade. It happens intermittently, so it's hard to tell.

This last time around, I spent several hours digging through list archives and reading FAQs, trying to figure out what's going on. I followed FAQ 3.14, Troubleshooting: no mail going out to list members:

http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq03.014.htp

There are various things that are different about a debian apt mailman installation:

- the "mailman" account is named "lists"
- there is no /home/lists or /home/mailman
- mailman's files are located in:
    /var/lib/mailman
    /usr/lib/mailman
    /usr/share/mailman

My first thought was to check the log files, which on debian are in /var/log/mailman (I think this is soft-linked from /var/lib/mailman). However, the log files showed nothing. The last successful message was on Nov 11; the error log file's last line was from Nov 1.

I have plenty of disk space: checked with "df -h", got 1.5 GB available.
I have plenty of inodes: checked with "df -i", got 1.5M IFree.

0) check_perms

check_perms showed all sorts of weird issues, but that may be related to the general weirdness of the debian apt installation of mailman.
[EMAIL PROTECTED]:/etc$ /var/lib/mailman/bin/check_perms
/var/lib/mailman/mail bad group (has: root, expected list)
/var/lib/mailman/Mailman bad group (has: root, expected list)
/var/lib/mailman/cron bad group (has: root, expected list)
/var/lib/mailman/bin bad group (has: root, expected list)
/var/lib/mailman/scripts bad group (has: root, expected list)
/var/lib/mailman/logs bad group (has: root, expected list)
/var/lib/mailman/templates bad group (has: root, expected list)
/var/lib/mailman/cgi-bin bad group (has: root, expected list)
Problems found: 8
Re-run as list (or root) with -f flag to fix

However, su-ing to root and re-running check_perms with -f did not fix the problems (though it reported that it had). So I skipped this step and checked the others.

1) Cron

darksleep:/var/lib/mailman# ps -aux |grep cron |grep -v grep
root 8550 0.0 0.0 1756 500 ? S Sep20 0:04 /usr/sbin/cron

Looks like cron's running.

2) Aliases

The aliases are all there in /etc/aliases. Somebody on #postfix told me I should also run:

  postalias /etc/aliases

Which I did, but still no change.

3) Smrsh

Skipped this step, since I'm not using redhat or sendmail.

4) Interface

Again, I'm not running sendmail, and in any event I'm pretty sure that the MTA's okay, since it gets used a fair bit every day and has shown no sign of problems.

5) qrunner

su-ing to "lists" and running "crontab -l" showed no jobs at all, but I found the mailman cron entries in:

  /etc/cron.d/mailman

In any event, both /var/lib/mailman/bin/version and "dpkg -l mailman" say I'm running 2.1.5 (dpkg says 2.1.5.3), and this section of the FAQ says:

  If you are running Mailman 2.1.x then the qrunners are daemons that
  are started by $prefix/bin/mailmanctl, which itself may be being run
  via a 'mailman' startup script. This is described in the INSTALL
  document for the version of MM you are running.
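
Since the FAQ says the qrunners are daemons started by mailmanctl, one way to tell whether that master process is still alive (rather than eyeballing ps output) is to read its pid file and probe the PID. The following is only a rough sketch: the pid file location is an assumption based on the usual Mailman 2.1 default (data/master-qrunner.pid under the install prefix) and may differ in the Debian package, the function names are made up for illustration, and the code is written for a current Python 3 rather than the 2.3.4 listed above.

```python
import errno
import os

# Assumption: stock Mailman 2.1 keeps the master PID in data/master-qrunner.pid
# under the install prefix; the Debian package may place it elsewhere.
PIDFILE = "/var/lib/mailman/data/master-qrunner.pid"


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a process
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except OSError as err:
        # EPERM means the process exists but belongs to another user
        return err.errno == errno.EPERM
    return True


def master_is_running(pidfile=PIDFILE):
    """Return True if the pid file names a live process, else False."""
    try:
        with open(pidfile) as fp:
            pid = int(fp.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbled pid file
    return pid_is_alive(pid)


if __name__ == "__main__":
    print("mailmanctl master running: %s" % master_is_running())
```

A check like this could be run from cron and send a warning when it reports False, which would at least shorten the day or two it currently takes to notice that delivery has stopped. It does not explain why the master process went away.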
I can't find any INSTALL document with:

  dpkg -L mailman | fgrep INSTALL

(Warning, don't do "dpkg -L mailman<enter>", there are over 3000 files in the mailman package :-).

I can't see a mailmanctl daemon with:

  ps -aux| grep mailmanctl |grep -v grep

I'm not sure what's going on. At first I was excited, because I figured the absence of a mailmanctl process meant that was the problem. When I did "mailmanctl start", messages waiting in the queue were delivered, and they appear to still be getting through now, a couple of hours later.

However, on a closer re-reading of the above paragraph, it doesn't really say that there's supposed to be a mailmanctl process running. It doesn't say much of anything, really.

There's nothing about mailmanctl in /etc/cron.d/mailman.
There's nothing about mailmanctl in /var/log/mailman/*.

6) Locks

There are definitely lock files in /var/lib/mailman/locks, and they definitely have process IDs that don't show up in "ps -aux". But I'm not sure that's the _problem_, since things start and messages go through, even with the lock files there. Nevertheless, I removed the old lock files, since they all date from May, September, etc.

7) Logs

The only thing I can find in the logs that looks suspicious is:

qrunner:
----------------------------------------------------------------------
Nov 11 16:19:27 2004 (1060) OutgoingRunner qrunner started.
Nov 11 16:19:27 2004 (1061) IncomingRunner qrunner started.
Nov 11 18:30:26 2004 (1060) OutgoingRunner qrunner caught SIGTERM. Stopping.
Nov 11 18:30:26 2004 (1060) OutgoingRunner qrunner exiting.
Nov 11 18:30:26 2004 (1061) IncomingRunner qrunner caught SIGTERM. Stopping.
Nov 11 18:30:33 2004 (1061) IncomingRunner qrunner exiting.
----------------------------------------------------------------------

locks:
----------------------------------------------------------------------
Nov 10 16:37:14 2004 (1606) 2004-November-thread.lock lifetime has expired, breaking
Nov 10 16:37:14 2004 (1606)   File "/var/lib/mailman/bin/qrunner", line 270, in ?
Nov 10 16:37:14 2004 (1606)     main()
Nov 10 16:37:14 2004 (1606)   File "/var/lib/mailman/bin/qrunner", line 230, in main
Nov 10 16:37:14 2004 (1606)     qrunner.run()
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 70, in run
Nov 10 16:37:14 2004 (1606)     filecnt = self._oneloop()
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 111, in _oneloop
Nov 10 16:37:14 2004 (1606)     self._onefile(msg, msgdata)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 167, in _onefile
Nov 10 16:37:14 2004 (1606)     keepqueued = self._dispose(mlist, msg, msgdata)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Queue/ArchRunner.py", line 73, in _dispose
Nov 10 16:37:14 2004 (1606)     mlist.ArchiveMail(msg)
Nov 10 16:37:14 2004 (1606)   File "/var/lib/mailman/Mailman/Archiver/Archiver.py", line 215, in ArchiveMail
Nov 10 16:37:14 2004 (1606)     h.processUnixMailbox(f)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 569, in processUnixMailbox
Nov 10 16:37:14 2004 (1606)     self.add_article(a)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 615, in add_article
Nov 10 16:37:14 2004 (1606)     article.parentID = parentID = self.get_parent_info(arch, article)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 649, in get_parent_info
Nov 10 16:37:14 2004 (1606)     if parentID and not self.database.hasArticle(archive, parentID):
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/HyperDatabase.py", line 273, in hasArticle
Nov 10 16:37:14 2004 (1606)     self.__openIndices(archive)
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/HyperDatabase.py", line 251, in __openIndices
Nov 10 16:37:14 2004 (1606)     t = DumbBTree(os.path.join(arcdir, archive + '-' + i))
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/HyperDatabase.py", line 61, in __init__
Nov 10 16:37:14 2004 (1606)     self.lock()
Nov 10 16:37:14 2004 (1606)   File "/usr/lib/mailman/Mailman/Archiver/HyperDatabase.py", line 77, in lock
Nov 10 16:37:14 2004 (1606)     self.lockfile.lock()
Nov 10 16:37:14 2004 (1606)   File "/var/lib/mailman/Mailman/LockFile.py", line 306, in lock
Nov 10 16:37:14 2004 (1606)     important=True)
Nov 10 16:37:14 2004 (1606)   File "/var/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Nov 10 16:37:14 2004 (1606)     traceback.print_stack(file=logf)
Nov 12 08:00:02 2004 (20710) beehiverefugees.lock lifetime has expired, breaking
Nov 12 08:00:02 2004 (20710)   File "/usr/lib/mailman/cron/checkdbs", line 178, in ?
Nov 12 08:00:02 2004 (20710)     main()
Nov 12 08:00:02 2004 (20710)   File "/usr/lib/mailman/cron/checkdbs", line 84, in main
Nov 12 08:00:02 2004 (20710)     mlist = MailList.MailList(name)
Nov 12 08:00:02 2004 (20710)   File "/var/lib/mailman/Mailman/MailList.py", line 126, in __init__
Nov 12 08:00:02 2004 (20710)     self.Lock()
Nov 12 08:00:02 2004 (20710)   File "/var/lib/mailman/Mailman/MailList.py", line 159, in Lock
Nov 12 08:00:02 2004 (20710)     self.__lock.lock(timeout)
Nov 12 08:00:02 2004 (20710)   File "/var/lib/mailman/Mailman/LockFile.py", line 306, in lock
Nov 12 08:00:02 2004 (20710)     important=True)
Nov 12 08:00:02 2004 (20710)   File "/var/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Nov 12 08:00:02 2004 (20710)     traceback.print_stack(file=logf)
----------------------------------------------------------------------

--
Steven J.
Owens [EMAIL PROTECTED] "I'm going to make broad, sweeping generalizations and strong, declarative statements, because otherwise I'll be here all night and this document will be four times longer and much less fun to read. Take it all with a grain of salt." - http://darksleep.com/notablog
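
The stale-lock check from step 6 above (lock files whose process IDs no longer show up in ps) could be automated roughly as follows. This is a sketch under stated assumptions, not a description of how Mailman itself manages locks: it assumes the lock file names end in ".<pid>.<counter>", which should be verified against the actual contents of /var/lib/mailman/locks, and it can only judge locks taken by processes on the local host. The function names are hypothetical, and the code targets a current Python 3.

```python
import errno
import os
import re

LOCK_DIR = "/var/lib/mailman/locks"  # path taken from the post above

# Assumption: lock file names end in ".<pid>.<counter>"; adjust if yours differ.
PID_SUFFIX = re.compile(r"\.(\d+)\.\d+$")


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a process
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except OSError as err:
        # EPERM means the process exists but belongs to another user
        return err.errno == errno.EPERM
    return True


def stale_locks(lock_dir=LOCK_DIR):
    """Return sorted names of lock files whose embedded PID is not running."""
    stale = []
    for name in os.listdir(lock_dir):
        match = PID_SUFFIX.search(name)
        # Names without a recognizable PID suffix are skipped, not flagged
        if match and not pid_is_alive(int(match.group(1))):
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    if os.path.isdir(LOCK_DIR):
        for name in stale_locks():
            print("stale lock: %s" % name)
```

The sketch only lists candidates and deliberately deletes nothing, since removing a lock that is merely held by a slow process could do more harm than leaving a stale one in place.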