Control: tags -1 +confirmed

I can confirm both bugs here. During the last upgrade, needrestart
noticed fail2ban needed a restart, so it did. Here's what systemd sees
right now:

root@marcos:/home/anarcat# systemctl status fail2ban
● fail2ban.service - Fail2Ban Service
   Loaded: loaded (/lib/systemd/system/fail2ban.service; enabled; vendor 
preset: enabled)
   Active: active (running) since Sun 2019-02-17 15:32:58 EST; 9min ago
     Docs: man:fail2ban(1)
  Process: 5829 ExecStop=/usr/bin/fail2ban-client stop (code=exited, status=255)
  Process: 12738 ExecStart=/usr/bin/fail2ban-client -x start (code=exited, 
status=0/SUCCESS)
 Main PID: 12745 (fail2ban-server)
    Tasks: 14 (limit: 4915)
   Memory: 33.3M
      CPU: 3min 14.842s
   CGroup: /system.slice/fail2ban.service
           └─12745 /usr/bin/python3 /usr/bin/fail2ban-server -s 
/var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ba

fév 17 15:42:06 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.80
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.82
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.84
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.85
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.86
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.87
fév 17 15:42:07 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.88
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.90
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.91
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.93
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.94
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.95
fév 17 15:42:08 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.178.97

... and those lines are still being added there:

fév 17 15:43:28 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] Ban 
60.168.182.250

etc...

In other words, fail2ban has been loading a list of IP addresses in the
firewall for the last 9 minutes. Before that, it was *removing* IP
addresses from the firewall for another three minutes (before systemd
gave up, see #...):

root@marcos:/home/anarcat# journalctl -u fail2ban.service | grep systemd
fév 17 15:29:57 marcos systemd[1]: Stopping Fail2Ban Service...
fév 17 15:31:27 marcos systemd[1]: fail2ban.service: Stopping timed out. 
Terminating.
fév 17 15:31:27 marcos systemd[1]: fail2ban.service: Control process exited, 
code=exited status=255
fév 17 15:32:57 marcos systemd[1]: fail2ban.service: State 'stop-sigterm' timed 
out. Killing.
fév 17 15:32:57 marcos systemd[1]: fail2ban.service: Killing process 2176 
(fail2ban-server) with signal SIGKILL.
fév 17 15:32:57 marcos systemd[1]: fail2ban.service: Main process exited, 
code=killed, status=9/KILL
fév 17 15:32:57 marcos systemd[1]: Stopped Fail2Ban Service.
fév 17 15:32:57 marcos systemd[1]: fail2ban.service: Unit entered failed state.
fév 17 15:32:57 marcos systemd[1]: fail2ban.service: Failed with result 
'timeout'.
fév 17 15:32:57 marcos systemd[1]: Starting Fail2Ban Service...
fév 17 15:32:58 marcos systemd[1]: Started Fail2Ban Service.

Now it systemd considers the thing "started" even though it's still
loading the IP list. Because systemd timed out, we also see this in the
logs:

fév 17 15:45:39 marcos fail2ban.actions[12745]: NOTICE [postfix-auth] 
183.160.227.26 already banned 

... obviously, fail2ban didn't have time to remove all addresses from
the firewall.

There are many things wrong here:

 1. "service fail2ban restart" should try to remove all IP addresses
    from the firewall if it's going to re-add them all a second later

 2. even if it *does* decide to do that, it shouldn't fail halfway
    through.

 3. if "stop" waits for all IPs to be cleared out, "start" should as
    well

 4. loading IP addresses shouldn't be that slow

I don't have that many addresses loaded in that jail:

Status for the jail: postfix-auth
|- Filter
|  |- Currently failed: 4
|  |- Total failed:     456
|  `- File list:        /var/log/mail.log
`- Actions
   |- Currently banned: 4499
   |- Total banned:     4499 
   `- Banned IP list:   [...]

A 5000 IP block list is really not that much. Note that I'm using the
`ipset` functionality in my jails to improve on that process (it was
even slower before):

# head -5 /etc/fail2ban/jail.local 
[DEFAULT]
findtime = 60
ignoreip = 192.168.0.7
banaction = iptables-ipset-proto6
banaction_allports = iptables-ipset-proto6-allports

-- 
On ne résout pas un problème avec les modes de pensée qui l'ont
engendré.
                        - Albert Einstein

Reply via email to