[Mailman-Users] OutgoingRunner processes hanging

2019-11-14 Thread Kevin Bowen
Hello,
Occasionally my mailman instance (2.1.9) gets into a weird state where one
or more of its OutgoingRunner processes appears to hang (usually on a large
email with a large number of recipients), causing a backlog of all other
mail on that process's "shard" (or whatever the terminology is for how
mailman divides up mail between runners based on hash). When it gets into
this state, doing a mailman restart doesn't manage to successfully kill the
"hung" process - it stays around after the restart (along with the
mailmanctl instance that started it). Doing a tcpdump on the process
usually shows that it's still sending data, but at a trickle (or sometimes
not). Any ideas what could cause this, or how to resolve it?

Kevin Bowen
kevin.t.bo...@gmail.com 
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] OutgoingRunner processes hanging

2019-11-14 Thread Mark Sapiro
On 11/14/19 4:05 PM, Kevin Bowen wrote:
> Hello,
> Occasionally my mailman instance (2.1.9) gets into a weird state where one
> or more of its OutgoingRunner processes appears to hang (usually on a large
> email with a large number of recipients), causing a backlog of all other
> mail on that process's "shard" (or whatever the terminology is for how
> mailman divides up mail between runners based on hash).


FYI, "slice" is the term we use.


> When it gets into
> this state, doing a mailman restart doesn't manage to successfully kill the
> "hung" process - it stays around after the restart (along with the
> mailmanctl instance that started it). Doing a tcpdump on the process
> usually shows that it's still sending data, but at a trickle (or sometimes
> not). Any ideas what could cause this, or how to resolve it?


OutgoingRunner is delivering the message it's working on to the
recipient list. If the process is still actually delivering to the
outgoing MTA, but slowly, this is an issue between Mailman and the MTA.

One thing you can do is set up a separate port in the MTA for delivery
only from Mailman and do little or no checking on that port. For example
with Postfix, this is what we have in master.cf on mail.python.org:


# This is where mailman is injecting to (no filtering!)
127.0.0.1:8027 inet  n  -  -  -  -  smtpd
    -o smtpd_authorized_xforward_hosts=127.0.0.0/8
    -o mynetworks=127.0.0.0/8
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o smtpd_client_restrictions=
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_data_restrictions=
#   -o smtpd_milters=inet:127.0.0.1:11332
    -o smtpd_milters=inet:127.0.0.1:8891
# inet:127.0.0.1:8891  == opendkim
# inet:127.0.0.1:11332 == rspamd
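
On the Mailman side, pointing delivery at a dedicated port like this would presumably look something like the following in mm_cfg.py. This is only a sketch: it assumes the default SMTPDirect delivery module and reuses the address and port from the master.cf entry above; adjust both to match your own MTA.

```python
# mm_cfg.py fragment (sketch; values are assumptions taken from the
# master.cf example, not a recommendation for every site).

# Host where the outgoing MTA listens for Mailman's deliveries.
SMTPHOST = '127.0.0.1'

# Dedicated injection port with little or no filtering.
SMTPPORT = 8027
```

Mailman needs a restart after mm_cfg.py changes for the runners to pick up the new values.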


Some other hints can be found by searching the Mailman FAQ for
'performance'.

-- 
Mark Sapiro                            The highway is for gamblers,
San Francisco Bay Area, California     better use your sense - B. Dylan


Re: [Mailman-Users] OutgoingRunner processes hanging

2019-11-14 Thread Kevin Bowen
>One thing you can do is set up a separate port in the MTA for delivery

Unfortunately we nowadays use a hosted MTA solution, so I'm not in control
of it.

>If the process is still actually delivering to the outgoing MTA, but
>slowly, this is an issue between Mailman and the MTA.

Sometimes the process appears to still be delivering, but VERY slowly;
other times it still has an open TCP connection but with no data appearing
to be sent over it; other times it seems the connection has actually died
(but the process still lives). I don't doubt that the MTA is to blame
somehow, but I'm not sure how to go about recovering from it. When it gets
into this state, often the only way I'm able to get mail flowing again is to
shut down mailman, remove the .bak file from the out spool, and restart
mailman, but this means I'm losing mail, correct?

Kevin Bowen
kevin.t.bo...@gmail.com 




Re: [Mailman-Users] OutgoingRunner processes hanging

2019-11-14 Thread Mark Sapiro
On 11/14/19 5:51 PM, Kevin Bowen wrote:
> 
>> If the process is still actually delivering to the outgoing MTA, but
>> slowly, this is an issue between Mailman and the MTA.
> Sometimes the process appears to still be delivering, but VERY slowly,
> other times it still has an open TCP connection but with no data appearing
> to be sent over it, other times it seems the connection has actually died
> (but the process still lives). I don't doubt that the MTA is to blame
> somehow, but I'm not sure how to go about recovering from it.


Almost always, these delays are due to lack of response from the MTA.
I.e., OutgoingRunner is waiting for a reply which has not been sent or
has somehow been lost. If the connection to the MTA is actually dropped,
OutgoingRunner *should* catch this.
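
To see whether the MTA is answering at all, one option is a small probe run from the Mailman host, outside of Mailman. The following is only a sketch: the function name is made up for illustration, it relies on the timeout argument of smtplib.SMTP (Python >= 2.6), and it checks only the initial SMTP greeting, not a full delivery.

```python
import smtplib
import socket


def mta_greets_within(host, port, timeout):
    """Return True if the MTA sends its SMTP greeting within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Mailman.
    A refused connection is not caught here and propagates to the caller.
    """
    try:
        # Without a timeout this call would block for as long as the
        # server stays silent.
        conn = smtplib.SMTP(host, port, timeout=timeout)
    except (socket.timeout, smtplib.SMTPServerDisconnected):
        # The TCP connection may be open, but no greeting arrived in time.
        return False
    conn.quit()
    return True
```

Calling it with the host and port of your outgoing MTA and a timeout of, say, 30 seconds while a runner is stuck would show whether a fresh connection gets any response.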


> When it gets
> into this state often the only way I'm able to get mail flowing again is to
> shut down mailman, remove the .bak file from the out spool, and restart
> mailman, but this means I'm losing mail, correct?


Yes. You have two choices. Removing the .bak file means any recipients
not already delivered to the MTA will be lost. If you don't remove the
.bak file, it will be recovered and reprocessed when the runner is
restarted. In this case, any recipients that were delivered previously
will get duplicates. Also, if the issue is somehow due to the message,
it will probably recur upon reprocessing.
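
The trade-off can be spelled out as simple set arithmetic. This is just an illustration with made-up addresses and a hypothetical function name; in practice you would have to work out from the MTA's logs which recipients were already accepted.

```python
def bak_file_tradeoff(all_recipients, already_delivered):
    """Show who is affected by each choice for a stuck .bak file.

    Illustrative only: both arguments are plain iterables of addresses.
    """
    everyone = set(all_recipients)
    delivered = set(already_delivered) & everyone
    return {
        # Removing the .bak file: undelivered recipients never get the mail.
        "lost_if_bak_removed": everyone - delivered,
        # Keeping the .bak file: the whole message is reprocessed, so
        # recipients who already got it receive a second copy.
        "duplicated_if_bak_kept": delivered,
    }
```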

One thing you might want to try is setting

SMTPLIB_DEBUG_LEVEL = 1

in mm_cfg.py. This requires Python >= 2.4 (I hope by now everyone is
using 2.7) and will produce copious logging of all outgoing SMTP
transactions in Mailman's error log. This may help to understand the
underlying issue.

-- 
Mark Sapiro                            The highway is for gamblers,
San Francisco Bay Area, California     better use your sense - B. Dylan