Hi Bron!
First of all, don't worry about the delay! I absolutely understand you
are extremely busy, and I'm really thankful for the great job Fastmail
does with Cyrus. By the way, I apologize too for my own delay in
answering this email.
From what I have seen, this is implemented only in master, not in other
tags or branches.
I honestly think that if a process receives a TERM signal that did not
come from master (which, as I saw, can be checked), it should perhaps
get special handling: it is not a failure, it is a deliberately
requested process stop, and it should be completed in an orderly way,
as is currently being done.
Imagine that one morning (due to a malware outbreak) you have to
disconnect 4 or 5 users, with several processes being spawned
constantly. There is a high probability of more than MAX_READY_FAILS
processes failing in less than MAX_READY_FAIL_INTERVAL. With the idea
you described, mainly on busy machines, you would respond very slowly
to several requests during that MAX_READY_FAIL_INTERVAL (10 seconds),
which could end up causing unnecessary load on the machine.
Don't you think another way of handling it could be to reset the
nreadyfails counter in the data structure when:
* As now: (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL), or
* Additionally, when (now - last_received_sigurg_timestamp <
MAX_KEEP_ZERO_NREADYFAILS_INTERVAL), where
MAX_KEEP_ZERO_NREADYFAILS_INTERVAL could be defined as 60 seconds, for
instance? And, as now, only reset the counter when needed in that
service?
I say this because, if you don't do something along those lines, how
are processes supposed to be shut down in an orderly way for users who
must be disconnected urgently because unauthorized access with their
username has been detected?
Cheers Bron!
On 2024-09-26 15:02, Bron Gondwana wrote:
Hi! Sorry nobody got back to you earlier on this.
In an example of synchronicity, I recently wrote something touching the
same area:
https://github.com/cyrusimap/cyrus-imapd/pull/5036
I solved this a different way - if the "babysit" option is set (it's
automatic for DAEMON block and can be added to each SERVICE) then we
only wait one set of MAX_READY_FAIL_INTERVAL before we'll start
spawning processes again.
However... your fix also seems sound for the non-babysit case. Do you
want to create it as a Pull Request against
https://github.com/cyrusimap/cyrus-imapd [2] master? That's the
easiest for us to work with, and ensures you get proper attribution!
Cheers,
Bron.
On Tue, Sep 10, 2024, at 12:55, ego...@sarenet.es wrote:
Hi!
Some time ago, I wrote that when we send TERM to imapd processes
(although I assume it would happen with any other: pop, sieve...) that
we want to exit, due to a user request to disconnect his/her sessions,
it sometimes happened that after that session disconnection (TERM to
the imapd processes) not enough new processes were spawned. It only
worked sometimes, when very few processes needed to be killed.
I have been able to reproduce it. Suppose a user has connected (visible
because proctitle() has set the username in the process name) and very
shortly afterwards "leaves" (logs out, for instance), so the process
moves to READY state. If you then kill with TERM more than
MAX_READY_FAILS instances of that process in less than
MAX_READY_FAIL_INTERVAL, master won't spawn new processes, as written
in master.c around line 1100 in the reap_child() function.
The suggestion is then to send a SIGHUP to master to re-enable the
service, but it can't be enabled again, because the service seems to
have been removed from the s data structure without being stopped.
Since the process was never stopped, when a new imapd attempts to load
(in service_create()) it can't be created, because the socket is still
in use.
So, to confirm this is correct, I have written the following patch
against master.c, which I tested on 3.0.15:
root@debugcyrus:/usr/ports/mail/cyrus-imapd30 # diff -u work/cyrus-imapd-3.0.15/master/master.c /master.c-definitivo
--- work/cyrus-imapd-3.0.15/master/master.c     2021-03-09 04:27:45.000000000 +0100
+++ /master.c-definitivo        2024-09-10 18:36:49.797581000 +0200
@@ -129,6 +129,11 @@
 };

 static int verbose = 0;
+
+/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+static int gotsigurg = 0;
+/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+
 static int listen_queue_backlog = 32;

 static int pidfd = -1;
@@ -1047,6 +1052,22 @@
     }
 }

+/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+static void sigurg_handler(int sig __attribute__((unused)))
+{
+    syslog(LOG_DEBUG, "SIGURG caught!");
+
+    if (gotsigurg)
+    {
+        gotsigurg = 0;
+    }
+    else
+    {
+        gotsigurg = 1;
+    }
+}
+/* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+
 static void reap_child(void)
 {
     int status;
@@ -1094,10 +1115,24 @@
                            "terminated abnormally",
                            SERVICEPARAM(s->name),
                            SERVICEPARAM(s->familyname), pid);
-                    if (now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) {
+
+                    syslog(LOG_DEBUG, "SIGURG flag is --%d--", gotsigurg);
+
+                    if ((now - s->lastreadyfail > MAX_READY_FAIL_INTERVAL) || (gotsigurg))
+                    {
                         s->nreadyfails = 0;
+
+                        if (gotsigurg)
+                        {
+                            syslog(LOG_DEBUG, "resetting nreadyfails...");
+                            syslog(LOG_DEBUG, "nreadyfails reset");
+                        }
                     }
+
+                    syslog(LOG_DEBUG, "too many failures for service %s/%s, resetting counters due to SIGURG received in Cyrus master; gotsigurg is --%d--", SERVICEPARAM(s->name), SERVICEPARAM(s->familyname), gotsigurg);
+
                     s->lastreadyfail = now;
+
                     if (++s->nreadyfails >= MAX_READY_FAILS && s->exec) {
                         syslog(LOG_ERR, "too many failures for "
                                "service %s/%s, disabling until next SIGHUP",
@@ -1305,11 +1340,18 @@
     sigemptyset(&action.sa_mask);

     action.sa_handler = sighup_handler;
+
 #ifdef SA_RESTART
     action.sa_flags |= SA_RESTART;
 #endif
     if (sigaction(SIGHUP, &action, NULL) < 0)
         fatalf(1, "unable to install signal handler for SIGHUP: %m");

+    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+    action.sa_handler = sigurg_handler;
+    if (sigaction(SIGURG, &action, NULL) < 0)
+        fatalf(1, "unable to install signal handler for SIGURG: %m");
+    /* RESET MAX_READY_FAILS OF SERVICE IN MASTER WRAPPER */
+
     action.sa_handler = sigalrm_handler;
     if (sigaction(SIGALRM, &action, NULL) < 0)
root@debugcyrus:/usr/ports/mail/cyrus-imapd30 #
So, given how the Cyrus master wrapper is written (and the way it
handles services), I think the possible solutions could be:
* Send kill(pid, SIGTERM) to ensure the process dies before it is
forgotten from the s structure.
* Do something similar to what I have done, which gives you a time
window allowing more failures than expected for some seconds, and which
could later be undone, for instance by sending the same signal again.
At least it is reproduced and something is proposed.
What do you think about it? :)
Cheers!
Egoitz Aurrekoetxea
Departamento de sistemas
94 - 420 94 70
ego...@sarenet.es
www.sarenet.es [1]
Parque Tecnológico. Edificio 103
48170 Zamudio (Bizkaia)
Before printing this email, think about whether it is really necessary.
--
Bron Gondwana
br...@fastmail.fm
Cyrus [3] / Devel / see discussions [4] + participants [5] + delivery
options [6] Permalink [7]
Links:
------
[1] http://www.sarenet.es
[2] https://github.com/cyrusimap/cyrus-imapd/pull/5036
[3] https://cyrus.topicbox.com/latest
[4] https://cyrus.topicbox.com/groups/devel
[5] https://cyrus.topicbox.com/groups/devel/members
[6] https://cyrus.topicbox.com/groups/devel/subscription
[7] https://cyrus.topicbox.com/groups/devel/Tf9f7cf579fff1397-M0cfe9d73756070115586cdc5