Mark Sapiro wrote:
> On 8/6/25 18:17, Mark Sapiro wrote:
> > On 8/6/25 09:03, Jérôme Charaoui wrote:
> > > I'm running a (rather large) mailman3 instance on Debian 13 (trixie)
> > > using the distribution packages.
> > > To manage the mailman3 daemon, the package installs this systemd
> > > service unit:
> > > https://sources.debian.org/src/mailman3/3.3.10-2/debian/mailman3.service/
> > > It has been working fine until recently: after a reboot, we noticed
> > > the mailman3 unit consistently failing to start due to a timeout error.
> > > I figured out that the timeout was related to the service unit type
> > > (Type=forking) and the fact that the main/parent process was forking
> > > subprocesses (and running normally), writing a PID file, but it wasn't
> > > exiting, so systemd identifies this as an error and aborts the service.
> > The master is not supposed to exit. It continues to run and monitor the
> > child runner processes and will under some circumstances restart a
> > runner that dies.
> Further, our recommended systemd configuration is at
> https://docs.mailman3.org/en/latest/install/virtualenv.html#starting-mailman...
> and includes Type=forking and this is the first I've heard of an issue
> with that.

I ended up figuring out the problem by myself:

I needed to create a list, and via the web interface it would return a 502 
error after a small delay. On the command-line, the "create" operation was 
hanging forever, never completing.

I used strace to check out what the process was doing and saw it was in a loop 
attempting to get a lock on "/var/lib/mailman3/locks/mta". There were a number 
of files in that locks directory that also seemed stale, so I stopped the 
mailman3 daemons, removed all the lock files manually, and started them again. 
That fixed both creating a new list and the fact that the startup process was 
not exiting as systemd was expecting. [0]
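
For anyone who wants to look before deleting, here is a rough sketch of the kind of check I mean: it lists the files in a lock directory that have not been modified for a while. The function name and the age threshold are my own invention, and modification time is only a heuristic: I have not checked how Mailman's locking library uses timestamps on its lock files, so treat the output as a hint, not as proof that a lock is stale.

```python
import time
from pathlib import Path


def find_stale_locks(lock_dir, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for files older than max_age_seconds.

    "Age" here is simply the time since the file's last modification.
    This is an assumption for illustration, not Mailman's own notion
    of lock expiry.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(Path(lock_dir).iterdir()):
        if not entry.is_file():
            continue  # ignore subdirectories and other non-files
        age = now - entry.stat().st_mtime
        if age > max_age_seconds:
            stale.append((entry, age))
    return stale
```

Calling find_stale_locks("/var/lib/mailman3/locks", 3600) with the daemons stopped would list candidates older than an hour. The sketch only reports them and leaves any removal as a manual step.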

It seems to me that all of this may be caused by a code path that retries 
forever to acquire a lock, without ever timing out or logging an error.
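
To illustrate the behaviour I would have expected, here is a minimal, generic sketch of file-lock acquisition that gives up after a deadline and logs the failure. This is not Mailman's locking code: acquire_lock, release_lock, LockTimeout and the default timings are hypothetical names and values I made up for the example.

```python
import logging
import os
import time

log = logging.getLogger("lock-sketch")


class LockTimeout(Exception):
    """Raised when the lock could not be acquired before the deadline."""


def acquire_lock(path, timeout=30.0, poll_interval=0.5):
    """Create `path` exclusively, retrying until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process can create the lock file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                # Log and raise instead of spinning forever.
                log.error("gave up on lock %s after %.1f s", path, timeout)
                raise LockTimeout(path)
            time.sleep(poll_interval)
        else:
            # Record the owner's PID to help with later debugging.
            with os.fdopen(fd, "w") as f:
                f.write(str(os.getpid()))
            return


def release_lock(path):
    """Remove the lock file created by acquire_lock."""
    os.remove(path)
```

With something like this, a stale lock file would have produced an error in the logs after the timeout, instead of a silent hang.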

Furthermore, I suspect the stale lock files themselves could have been a 
side effect of the extremely frequent OOMs we were suffering from before 
disabling HYPERKITTY_MBOX_EXPORT, due to scrapers continuously hitting /export/ 
endpoints. [1] [2]

[0] https://gitlab.torproject.org/tpo/tpa/team/-/issues/42255
[1] https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957
[2] https://gitlab.com/mailman/hyperkitty/-/issues/385
_______________________________________________
Mailman-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
Archived at: 
https://lists.mailman3.org/archives/list/[email protected]/message/EZBQKQOKUPEOC6LLMXADB2PMOZIPCV42/
