Hi! Sorry nobody got back to you earlier on this. In an example of synchronicity, I recently wrote something touching the same area:

https://github.com/cyrusimap/cyrus-imapd/pull/5036

I solved this a different way: if the "babysit" option is set (it's automatic for DAEMON blocks, and can be added to each SERVICE), then we only wait one MAX_READY_FAIL_INTERVAL before we'll start spawning processes again. However... your fix also seems sound for the non-babysit case.
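To make the two behaviours concrete, here is a minimal self-contained sketch of that idea. It is not the code from PR 5036: the toy_service struct, the babysit flag name and the constant values are illustrative stand-ins for the real definitions in master/master.c, and the real change re-enables spawning from master's own scheduling rather than on the next child death as this toy does.

/*
 * Minimal self-contained sketch of the idea above -- NOT the code from
 * PR 5036.  The struct, the "babysit" flag and the constants are
 * illustrative stand-ins for the real definitions in master/master.c.
 */
#include <stdio.h>
#include <time.h>

#define MAX_READY_FAILS         5    /* illustrative value */
#define MAX_READY_FAIL_INTERVAL 10   /* seconds, illustrative value */

struct toy_service {
    int    nreadyfails;   /* READY-state deaths in the current interval */
    time_t lastreadyfail; /* time of the most recent such death         */
    int    babysit;       /* stand-in for the babysit option            */
    int    disabled;      /* "disabling until next SIGHUP"              */
};

/* Record one abnormal exit of a READY child; mirrors the *shape* of the
 * accounting in reap_child(), not its exact code. */
static void ready_child_died(struct toy_service *s, time_t now)
{
    if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
        s->nreadyfails = 0;
        /* the babysit idea: after one quiet interval the service gets
         * another chance instead of staying off until a SIGHUP */
        if (s->babysit)
            s->disabled = 0;
    }
    s->lastreadyfail = now;
    if (++s->nreadyfails >= MAX_READY_FAILS)
        s->disabled = 1;  /* real master logs "disabling until next SIGHUP" */
}

int main(void)
{
    struct toy_service svc[2] = {
        { 0, 0, 1, 0 },    /* babysat service */
        { 0, 0, 0, 0 },    /* plain service   */
    };
    time_t t = time(NULL);

    for (int i = 0; i < 2; i++) {
        /* kill "too many" READY children in a burst: both get disabled */
        for (int k = 0; k < MAX_READY_FAILS; k++)
            ready_child_died(&svc[i], t);
        /* one quiet interval later another READY child dies */
        ready_child_died(&svc[i], t + MAX_READY_FAIL_INTERVAL + 1);
        printf("babysit=%d -> disabled=%d\n", svc[i].babysit, svc[i].disabled);
    }
    return 0;
}

Compiled with a plain cc, it prints disabled=0 for the babysat service and disabled=1 for the plain one after the same burst of failures.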
Do you want to create your fix as a Pull Request against https://github.com/cyrusimap/cyrus-imapd master? That's the easiest for us to work with, and ensures you get proper attribution!

Cheers,

Bron.

On Tue, Sep 10, 2024, at 12:55, ego...@sarenet.es wrote:
> Hi!,
>
> Some time ago I wrote that when we send TERM to imapd processes (I assume it would happen with any other service too: pop, sieve...) that we want to exit, because a user asked for his/her sessions to be disconnected, it sometimes happened that after that disconnection (TERM to the imapd processes) not enough new processes were spawned. It only behaved correctly when very few processes needed to be killed.
>
> I have now been able to reproduce it. If a user has connected (you can tell because proctitle() has put the user in the process name) and "leaves" very shortly afterwards (logs out, for instance), the process moves back to the READY state. If you then kill with TERM more than MAX_READY_FAILS instances of that process in less than MAX_READY_FAIL_INTERVAL, master won't spawn new processes, as implemented in master.c near line 1100, in the reap_child() function.
>
> The log suggests sending a SIGHUP to master to re-enable the service, but it can't be enabled again because the service seems to have been removed from the s data structure without being stopped. Because the old process is not stopped, when a new imapd attempts to start (in service_create()) it can't be created: the socket is still in use.
>
> So, to confirm this is what happens, I have written the following patch for master.c, which I tested on 3.0.15:
>
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 # diff -u work/cyrus-imapd-3.0.15/master/master.c /master.c-definitivo
> --- work/cyrus-imapd-3.0.15/master/master.c    2021-03-09 04:27:45.000000000 +0100
> +++ /master.c-definitivo    2024-09-10 18:36:49.797581000 +0200
> @@ -129,6 +129,11 @@
>  };
>
>  static int verbose = 0;
> +
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static int gotsigurg = 0;
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static int listen_queue_backlog = 32;
>  static int pidfd = -1;
>
> @@ -1047,6 +1052,22 @@
>      }
>  }
>
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static void sigurg_handler(int sig __attribute__((unused)))
> +{
> +    syslog(LOG_DEBUG, "URG caught!!!!!!");
> +
> +    if (gotsigurg)
> +    {
> +        gotsigurg = 0;
> +    }
> +    else
> +    {
> +        gotsigurg = 1;
> +    }
> +}
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static void reap_child(void)
>  {
>      int status;
> @@ -1094,10 +1115,24 @@
>                         "terminated abnormally",
>                         SERVICEPARAM(s->name),
>                         SERVICEPARAM(s->familyname), pid);
> -                if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
> +
> +                syslog(LOG_DEBUG, "SIGURG flag is.... --%d--", gotsigurg);
> +
> +                if ((now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) || (gotsigurg))
> +                {
>                      s->nreadyfails = 0;
> +
> +                    if (gotsigurg)
> +                    {
> +                        syslog(LOG_DEBUG, "RESETTING....");
> +                        syslog(LOG_DEBUG, "RESET DONE....");
> +                    }
>                  }
> +
> +                syslog(LOG_DEBUG, "too many failures for service %s/%s, resetting counters due to SIGURG received in Cyrus master. Flag is --%d--", SERVICEPARAM(s->name), SERVICEPARAM(s->familyname), gotsigurg);
> +
>                  s->lastreadyfail = now;
> +
>                  if (++s->nreadyfails >= MAX_READY_FAILS && s->exec) {
>                      syslog(LOG_ERR, "too many failures for "
>                             "service %s/%s, disabling until next SIGHUP",
> @@ -1305,11 +1340,18 @@
>      sigemptyset(&action.sa_mask);
>
>      action.sa_handler = sighup_handler;
> +
>  #ifdef SA_RESTART
>      action.sa_flags |= SA_RESTART;
>  #endif
>      if (sigaction(SIGHUP, &action, NULL) < 0)
>          fatalf(1, "unable to install signal handler for SIGHUP: %m");
> +
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +    action.sa_handler = sigurg_handler;
> +    if (sigaction(SIGURG, &action, NULL) < 0)
> +        fatalf(1, "unable to install signal handler for SIGURG: %m");
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
>
>      action.sa_handler = sigalrm_handler;
>      if (sigaction(SIGALRM, &action, NULL) < 0)
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 #
>
> Given how the Cyrus master wrapper is written (and the way it handles services), I think the possible solutions could be:
>
> • Send a kill(pid, SIGTERM) to make sure the process dies before it is forgotten from the s structure.
> • Do something similar to what I have done, which opens a time window of a few seconds in which more failures than expected are tolerated, and which could later be undone by sending the same signal again.
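To illustrate the first option in that list, here is a self-contained toy, not Cyrus code: a child that inherited the listening socket keeps the port busy, so re-binding fails with EADDRINUSE until the child is sent SIGTERM, which is why killing the children before forgetting the service would let service_create() bind the socket again. The port handling and all names below are illustrative.

/*
 * Self-contained toy (not Cyrus code): shows that a listening port stays
 * busy while a forked child still holds the inherited socket, and is
 * freed once that child is SIGTERMed -- the effect behind option 1 above.
 */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to bind a fresh TCP socket to 127.0.0.1:port; report the result. */
static void try_bind(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);

    if (fd < 0 || bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        printf("bind to port %d failed: %s\n", port, strerror(errno));
    else
        printf("bind to port %d succeeded\n", port);
    if (fd >= 0) close(fd);
}

int main(void)
{
    /* "master" creates the service socket; the kernel picks a free port */
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;

    if (lfd < 0 || bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(lfd, 32) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) < 0) {
        perror("setup");
        return 1;
    }
    int port = ntohs(sin.sin_port);

    /* a "service child" inherits the listening fd and lingers */
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }
    if (pid == 0) {
        sleep(60);          /* keeps the inherited fd open until SIGTERM */
        _exit(0);
    }

    /* master "forgets" the service and closes its own copy ... */
    close(lfd);

    /* ... but the port cannot be re-bound: the child still holds it */
    try_bind(port);         /* expected: EADDRINUSE */

    /* the proposed fix: make sure the child is dead before forgetting */
    kill(pid, SIGTERM);
    waitpid(pid, NULL, 0);

    try_bind(port);         /* expected: success */
    return 0;
}

Run on its own it prints one failed bind (address already in use) followed by a successful one after the SIGTERM.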
> So: reproduced, and at least something proposed.
>
> What do you think about it? :)
>
> Cheers!
>
> Sarenet
> Egoitz Aurrekoetxea
> Systems Department
> 94 - 420 94 70
> ego...@sarenet.es
> www.sarenet.es
> Parque Tecnológico. Edificio 103
> 48170 Zamudio (Bizkaia)

--
Bron Gondwana
br...@fastmail.fm