We work around anything like this in Fastmail production code with an obnoxious external wrapper that does a `ps` listing. Every Cyrus instance has its own imapd.conf file, and that filename is visible on the command line of every process, so we just kill everything that matches the expression. It's not great, but it works.
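As a rough illustration of that kind of wrapper (not the actual Fastmail script, which isn't shown here), a minimal Python sketch might look like the following. The function names `matching_pids` and `kill_instance`, the choice of SIGTERM, and the exact-argument match are all assumptions made for the example; it also assumes the config path contains no spaces.

```python
import os
import signal
import subprocess


def matching_pids(ps_output: str, conf_path: str) -> list[int]:
    """Return PIDs whose command line has conf_path as one of its arguments.

    ps_output is expected to hold one process per line: "<pid> <args...>".
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        # Assumption: match the config path as a whole argument, so that
        # imapd-a.conf does not also match imapd-a.conf.bak.
        if conf_path in fields[1:]:
            pids.append(int(fields[0]))
    return pids


def kill_instance(conf_path: str, sig: int = signal.SIGTERM) -> list[int]:
    """Signal every process of the instance identified by conf_path."""
    # Separate -o options with empty headers give "pid args" with no header row.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = [p for p in matching_pids(out, conf_path) if p != os.getpid()]
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # process exited between the listing and the kill
    return pids
```

Calling `kill_instance("/etc/cyrus/imapd-slot1.conf")` (a hypothetical path) would signal the master and every service process started with that config file. Keeping the matching in a separate pure function makes it easy to check against captured `ps` output before pointing it at a live system.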
My future plan is to move each instance of Cyrus into its own cgroup, give the cgroup master a SIGQUIT and then wait 15 seconds and terminate the rest!

Bron.

On Wed, Oct 2, 2024, at 05:34, Дилян Палаузов wrote:
> Hello,
>
> > Some time ago, I wrote that when we send TERM to imapd (although it would
> > happen with any other I assume, pop, sieve...) processes we wanted to exit,
>
> When and where have you written this?
>
> I observe that sometimes stopping master, by sending it SIGTERM, triggers an
> endless loop there waiting for some child to terminate:
> https://cyrus.topicbox.com/groups/devel/Tea1dbf8f58b01177-Mec77eb0e4f3ec3641e7e0191/the-master-janitor-goes-crazy-re-debugging-deadlocks
> This leads to 100% CPU utilization because of that loop. It happens
> repeatedly, but I do not know how to reproduce this.
>
> I thought https://github.com/cyrusimap/cyrus-imapd/pull/3449 is a fix, but it
> is not.
>
> Does your proposal approach the problem that on SIGTERM the master process
> enters an endless loop?
>
> Greetings
> Дилян
>
> -----Original Message-----
> From: egoitz via Devel <devel@cyrus.topicbox.com>
> Reply-To: Devel <devel@cyrus.topicbox.com>
> To: devel@cyrus.topicbox.com
> Subject: Services in READY state and MAX_READY_FAILS in less than
> MAX_READY_FAIL_INTERVAL
> Date: 10/09/24 19:55:04
>
> Hi!,
>
> Some time ago, I wrote that when we send TERM to imapd processes we want to
> exit (although I assume it would happen with any other: pop, sieve...), due
> to a user request to disconnect his/her sessions, it sometimes happened
> that, after that session disconnection (TERM to imapd processes), not
> enough new processes were spawned. Only sometimes, when very few processes
> needed to be killed.
>
> I have been able to reproduce it.
> If a user has connected (because proctitle() has set it in the name) and
> a very short time later "leaves" (logs out, for instance), the process
> moves to READY state. If you then kill with TERM more than MAX_READY_FAILS
> of those processes in less than MAX_READY_FAIL_INTERVAL, master won't spawn
> new processes, as written in master.c near line 1100 in the reap_child()
> function.
>
> It's suggested to send a SIGHUP to master to activate the service again,
> but it can't be enabled again because the service seems to have been
> removed from the s data structure but not stopped. Because the process was
> not stopped, when a new imapd attempts to load (in service_create()) it
> can't be created because the socket is still in use.
>
> So, to confirm this is correct, I have written the following patch for
> master.c, which I tested on 3.0.15:
>
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 # diff -u work/cyrus-imapd-3.0.15/master/master.c /master.c-definitivo
> --- work/cyrus-imapd-3.0.15/master/master.c 2021-03-09 04:27:45.000000000 +0100
> +++ /master.c-definitivo 2024-09-10 18:36:49.797581000 +0200
> @@ -129,6 +129,11 @@
>  };
>
>  static int verbose = 0;
> +
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static int gotsigurg = 0;
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static int listen_queue_backlog = 32;
>  static int pidfd = -1;
>
> @@ -1047,6 +1052,22 @@
>      }
>  }
>
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +static void sigurg_handler(int sig __attribute__((unused)))
> +{
> +    syslog(LOG_DEBUG, "URG CAPTURADO!!!!!!");
> +
> +    if (gotsigurg)
> +    {
> +        gotsigurg = 0;
> +    }
> +    else
> +    {
> +        gotsigurg = 1;
> +    }
> +}
> +/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +
>  static void reap_child(void)
>  {
>      int status;
> @@ -1094,10 +1115,24 @@
>                             "terminated abnormally",
>                             SERVICEPARAM(s->name),
>                             SERVICEPARAM(s->familyname), pid);
> -                    if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
> +
> +                    syslog(LOG_DEBUG, "Senal URG vale.... --%d--",gotsigurg);
> +
> +                    if ((now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) || (gotsigurg))
> +                    {
>                          s->nreadyfails = 0;
> +
> +                        if (gotsigurg)
> +                        {
> +                            syslog(LOG_DEBUG, "RESETEANDO....");
> +                            syslog(LOG_DEBUG, "RESETEADO....");
> +                        }
>                      }
> +
> +                    syslog(LOG_DEBUG, "too many failures for service %s/%s, resetting counters due to SIGURG received in Cyrus master. El got vale --%d--",SERVICEPARAM(s->name),SERVICEPARAM(s->familyname),gotsigurg);
> +
>                      s->lastreadyfail = now;
> +
>                      if (++s->nreadyfails >= MAX_READY_FAILS && s->exec) {
>                          syslog(LOG_ERR, "too many failures for "
>                                 "service %s/%s, disabling until next SIGHUP",
> @@ -1305,11 +1340,18 @@
>      sigemptyset(&action.sa_mask);
>
>      action.sa_handler = sighup_handler;
> +
>  #ifdef SA_RESTART
>      action.sa_flags |= SA_RESTART;
>  #endif
>      if (sigaction(SIGHUP, &action, NULL) < 0)
>          fatalf(1, "unable to install signal handler for SIGHUP: %m");
> +
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
> +    action.sa_handler = sigurg_handler;
> +    if (sigaction(SIGURG, &action, NULL) < 0)
> +        fatalf(1, "unable to install signal handler for SIGURG: %m");
> +    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
>
>      action.sa_handler = sigalrm_handler;
>      if (sigaction(SIGALRM, &action, NULL) < 0)
> root@debugcyrus:/usr/ports/mail/cyrus-imapd30 #
>
> So, from what I have seen of how the Cyrus wrapper is written (and the way
> it handles services), I think the possible solutions could be:
> * Send a kill(pid, SIGTERM) to ensure the process dies before it is
>   forgotten from the s structure.
> * Do something similar to what I have done, which gives you a time window
>   of some seconds in which more failures than expected are tolerated, and
>   which can later be undone, for instance by sending the same signal again.
>
> Reproduced and something proposed at least.
>
> What do you think about it? :)
>
> Cheers!
>
> sarenet
> Egoitz Aurrekoetxea
> Systems Department
> 94 - 420 94 70
> ego...@sarenet.es
> www.sarenet.es
> Parque Tecnológico. Edificio 103
> 48170 Zamudio (Bizkaia)

--
Bron Gondwana, CEO, Fastmail Pty Ltd
br...@fastmailteam.com

------------------------------------------
Cyrus: Devel
Permalink: https://cyrus.topicbox.com/groups/devel/Tf9f7cf579fff1397-M31b581987b66c79a20403104
Delivery options: https://cyrus.topicbox.com/groups/devel/subscription