Hi! Sorry nobody got back to you earlier on this. In an example of synchronicity, I recently wrote something touching the same area:

https://github.com/cyrusimap/cyrus-imapd/pull/5036

I solved this a different way: if the "babysit" option is set (it's automatic for DAEMON blocks, and can be added to each SERVICE), then we only wait one MAX_READY_FAIL_INTERVAL before we'll start spawning processes again. However... your fix also seems sound for the non-babysit case.
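To make the two behaviours concrete, here is a minimal self-contained sketch of that idea. It is not the code from PR 5036: the toy_service struct, the babysit flag name and the constant values are illustrative stand-ins for the real definitions in master/master.c, and the real change re-enables spawning from master's own scheduling rather than on the next child death as this toy does.

/*
 * Minimal self-contained sketch of the idea above -- NOT the code from
 * PR 5036.  The struct, the "babysit" flag and the constants are
 * illustrative stand-ins for the real definitions in master/master.c.
 */
#include <stdio.h>
#include <time.h>

#define MAX_READY_FAILS         5    /* illustrative value */
#define MAX_READY_FAIL_INTERVAL 10   /* seconds, illustrative value */

struct toy_service {
    int    nreadyfails;   /* READY-state deaths in the current interval */
    time_t lastreadyfail; /* time of the most recent such death         */
    int    babysit;       /* stand-in for the babysit option            */
    int    disabled;      /* "disabling until next SIGHUP"              */
};

/* Record one abnormal exit of a READY child; mirrors the *shape* of the
 * accounting in reap_child(), not its exact code. */
static void ready_child_died(struct toy_service *s, time_t now)
{
    if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
        s->nreadyfails = 0;
        /* the babysit idea: after one quiet interval the service gets
         * another chance instead of staying off until a SIGHUP */
        if (s->babysit)
            s->disabled = 0;
    }
    s->lastreadyfail = now;
    if (++s->nreadyfails >= MAX_READY_FAILS)
        s->disabled = 1;  /* real master logs "disabling until next SIGHUP" */
}

int main(void)
{
    struct toy_service svc[2] = {
        { 0, 0, 1, 0 },    /* babysat service */
        { 0, 0, 0, 0 },    /* plain service   */
    };
    time_t t = time(NULL);

    for (int i = 0; i < 2; i++) {
        /* kill "too many" READY children in a burst: both get disabled */
        for (int k = 0; k < MAX_READY_FAILS; k++)
            ready_child_died(&svc[i], t);
        /* one quiet interval later another READY child dies */
        ready_child_died(&svc[i], t + MAX_READY_FAIL_INTERVAL + 1);
        printf("babysit=%d -> disabled=%d\n", svc[i].babysit, svc[i].disabled);
    }
    return 0;
}

Compiled with a plain cc, it prints disabled=0 for the babysat service and disabled=1 for the plain one after the same burst of failures.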
Do you want to create your fix as a Pull Request against https://github.com/cyrusimap/cyrus-imapd master? That's the easiest for us to work with, and ensures you get proper attribution!

Cheers,

Bron.

On Tue, Sep 10, 2024, at 12:55, ego...@sarenet.es wrote:
> Hi!,
>
> Some time ago I wrote that when we send TERM to imapd processes (I assume it would happen with any other service too: pop, sieve...) that we want to exit, because a user asked for his/her sessions to be disconnected, it sometimes happened that after that disconnection (TERM to the imapd processes) not enough new processes were spawned. It only behaved correctly when very few processes needed to be killed.
>
> I have now been able to reproduce it. If a user has connected (you can tell because proctitle() has put the user in the process name) and "leaves" very shortly afterwards (logs out, for instance), the process moves back to the READY state. If you then kill with TERM more than MAX_READY_FAILS instances of that process in less than MAX_READY_FAIL_INTERVAL, master won't spawn new processes, as implemented in master.c near line 1100, in the reap_child() function.
>
> The log suggests sending a SIGHUP to master to re-enable the service, but it can't be enabled again because the service seems to have been removed from the s data structure without being stopped. Because the old process is not stopped, when a new imapd attempts to start (in service_create()) it can't be created: the socket is still in use.
>
> So, to confirm this is what happens, I have written the following patch for master.c, which I tested on 3.0.15:
>
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 # diff -u work/cyrus-imapd-3.0.15/master/master.c /master.c-definitivo
> --- work/cyrus-imapd-3.0.15/master/master.c    2021-03-09 04:27:45.000000000 +0100
> +++ /master.c-definitivo    2024-09-10 18:36:49.797581000 +0200
> @@ -129,6 +129,11 @@
>  };
>
>  static int verbose = 0;
> +
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static int gotsigurg = 0;
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static int listen_queue_backlog = 32;
>  static int pidfd = -1;
>
> @@ -1047,6 +1052,22 @@
>      }
>  }
>
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static void sigurg_handler(int sig __attribute__((unused)))
> +{
> +    syslog(LOG_DEBUG, "URG caught!!!!!!");
> +
> +    if (gotsigurg)
> +    {
> +        gotsigurg = 0;
> +    }
> +    else
> +    {
> +        gotsigurg = 1;
> +    }
> +}
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static void reap_child(void)
>  {
>      int status;
> @@ -1094,10 +1115,24 @@
>                         "terminated abnormally",
>                         SERVICEPARAM(s->name),
>                         SERVICEPARAM(s->familyname), pid);
> -                if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
> +
> +                syslog(LOG_DEBUG, "SIGURG flag is.... --%d--", gotsigurg);
> +
> +                if ((now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) || (gotsigurg))
> +                {
>                      s->nreadyfails = 0;
> +
> +                    if (gotsigurg)
> +                    {
> +                        syslog(LOG_DEBUG, "RESETTING....");
> +                        syslog(LOG_DEBUG, "RESET DONE....");
> +                    }
>                  }
> +
> +                syslog(LOG_DEBUG, "too many failures for service %s/%s, resetting counters due to SIGURG received in Cyrus master. Flag is --%d--", SERVICEPARAM(s->name), SERVICEPARAM(s->familyname), gotsigurg);
> +
>                  s->lastreadyfail = now;
> +
>                  if (++s->nreadyfails >= MAX_READY_FAILS && s->exec) {
>                      syslog(LOG_ERR, "too many failures for "
>                             "service %s/%s, disabling until next SIGHUP",
> @@ -1305,11 +1340,18 @@
>      sigemptyset(&action.sa_mask);
>
>      action.sa_handler = sighup_handler;
> +
>  #ifdef SA_RESTART
>      action.sa_flags |= SA_RESTART;
>  #endif
>      if (sigaction(SIGHUP, &action, NULL) < 0)
>          fatalf(1, "unable to install signal handler for SIGHUP: %m");
> +
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +    action.sa_handler = sigurg_handler;
> +    if (sigaction(SIGURG, &action, NULL) < 0)
> +        fatalf(1, "unable to install signal handler for SIGURG: %m");
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
>
>      action.sa_handler = sigalrm_handler;
>      if (sigaction(SIGALRM, &action, NULL) < 0)
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 #
>
> Given how the Cyrus master wrapper is written (and the way it handles services), I think the possible solutions could be:
>
> • Send a kill(pid, SIGTERM) to make sure the process dies before it is forgotten from the s structure.
> • Do something similar to what I have done, which opens a time window of a few seconds in which more failures than expected are tolerated, and which could later be undone by sending the same signal again.
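To illustrate the first option in that list, here is a self-contained toy, not Cyrus code: a child that inherited the listening socket keeps the port busy, so re-binding fails with EADDRINUSE until the child is sent SIGTERM, which is why killing the children before forgetting the service would let service_create() bind the socket again. The port handling and all names below are illustrative.

/*
 * Self-contained toy (not Cyrus code): shows that a listening port stays
 * busy while a forked child still holds the inherited socket, and is
 * freed once that child is SIGTERMed -- the effect behind option 1 above.
 */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to bind a fresh TCP socket to 127.0.0.1:port; report the result. */
static void try_bind(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);

    if (fd < 0 || bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        printf("bind to port %d failed: %s\n", port, strerror(errno));
    else
        printf("bind to port %d succeeded\n", port);
    if (fd >= 0) close(fd);
}

int main(void)
{
    /* "master" creates the service socket; the kernel picks a free port */
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;

    if (lfd < 0 || bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 32) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) < 0) {
        perror("setup");
        return 1;
    }
    int port = ntohs(sin.sin_port);

    /* a "service child" inherits the listening fd and lingers */
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0) {
        sleep(60);          /* keeps the inherited fd open until SIGTERM */
        _exit(0);
    }

    /* master "forgets" the service and closes its own copy ... */
    close(lfd);

    /* ... but the port cannot be re-bound: the child still holds it */
    try_bind(port);         /* expected: EADDRINUSE */

    /* the proposed fix: make sure the child is dead before forgetting */
    kill(pid, SIGTERM);
    waitpid(pid, NULL, 0);

    try_bind(port);         /* expected: success */
    return 0;
}

Run on its own it prints one failed bind (address already in use) followed by a successful one after the SIGTERM.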
> So: reproduced, and at least something proposed.
>
> What do you think about it? :)
>
> Cheers!
>
> Sarenet
> Egoitz Aurrekoetxea
> Systems Department
> 94 - 420 94 70
> ego...@sarenet.es
> www.sarenet.es
> Parque Tecnológico. Edificio 103
> 48170 Zamudio (Bizkaia)

--
Bron Gondwana
br...@fastmail.fm