Mark Sapiro wrote:
> On 8/6/25 18:17, Mark Sapiro wrote:
> > On 8/6/25 09:03, Jérôme Charaoui wrote:
> > > I'm running a (rather large) mailman3 instance on Debian 13 (trixie)
> > > using the distribution packages.
> > >
> > > To manage the mailman3 daemon, the package installs this systemd
> > > service unit:
> > >
> > > https://sources.debian.org/src/mailman3/3.3.10-2/debian/mailman3.service/
> > >
> > > It has been working fine until recently: after a reboot, we noticed
> > > the mailman3 unit consistently failing to start due to a timeout error.
> > >
> > > I figured out that the timeout was related to the service unit type
> > > (Type=forking) and the fact that the main/parent process was forking
> > > subprocesses (and running normally), writing a PID file, but it wasn't
> > > exiting, so systemd identifies this as an error and aborts the service.
> >
> > The master is not supposed to exit. It continues to run and monitor the
> > child runner processes and will under some circumstances restart a
> > runner that dies.
>
> Further, our recommended systemd configuration is at
> https://docs.mailman3.org/en/latest/install/virtualenv.html#starting-mailman...
> and includes Type=forking and this is the first I've heard of an issue
> with that.

I ended up figuring out the problem by myself: I needed to create a list, and via the web interface the request would return a 502 error after a small delay. On the command line, the "create" operation hung forever and never completed.

I used strace to see what the process was doing and found it looping, repeatedly trying to get a lock on "/var/lib/mailman3/locks/mta". A number of other files in that locks directory also looked stale, so I stopped the mailman3 daemons, removed all the lock files manually, and started the daemons again. That fixed both problems: creating a new list works again, and the startup process now exits as systemd expects. [0]

It seems to me that all of this might be caused by some code path that retries forever to get a lock, without ever timing out or logging an error.

Furthermore, I suspect the stale lock files themselves could have been a side effect of the extremely frequent OOMs we were suffering from before disabling HYPERKITTY_MBOX_EXPORT, due to scrapers continuously hitting /export/ endpoints. [1] [2]

[0] https://gitlab.torproject.org/tpo/tpa/team/-/issues/42255
[1] https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957
[2] https://gitlab.com/mailman/hyperkitty/-/issues/385
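
To illustrate what I mean by a lock attempt that times out and logs instead of retrying forever, here is a minimal sketch in Python. It is not Mailman's locking code and I have not checked how Mailman takes the mta lock internally; the names acquire_with_timeout, release and LockTimeout are hypothetical, and the sketch only assumes a lock that is represented by the exclusive creation of a file.

```python
import logging
import os
import time

log = logging.getLogger("lock-sketch")  # hypothetical logger name


class LockTimeout(Exception):
    """Raised when the lock could not be acquired before the deadline."""


def acquire_with_timeout(path, timeout=10.0, interval=0.1):
    """Create `path` exclusively, retrying until `timeout` seconds elapse.

    Hypothetical helper: a generic lock-file pattern, not Mailman's API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes the creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                # Log and give up instead of looping forever.
                log.error("gave up waiting for lock %s after %.1fs",
                          path, timeout)
                raise LockTimeout(path)
            time.sleep(interval)
        else:
            # Record the owner's PID so a stale lock can be investigated.
            with os.fdopen(fd, "w") as f:
                f.write(str(os.getpid()))
            return path


def release(path):
    """Remove the lock file created by acquire_with_timeout()."""
    os.unlink(path)
```

With something along these lines, a leftover lock file would have produced an error in the logs after the timeout rather than a silent hang, which would have made the stale lock much easier to spot.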
