Package: keepalived
Version: 1:2.2.7-1

I'm upgrading our servers from Bullseye to Bookworm. Some of them act as load
balancers using keepalived.
Right now I have one Bullseye host and one Bookworm host with the same
configuration, checking the same services.
Several of our services run over HTTPS, so I'm using SSL_CHECK.
I can see that the Bookworm host occasionally fails SSL_CHECK for several
seconds on one service, while the Bullseye host does not report any problem
at all.
It's quite rare: less than once per hour with a 2 s loop delay.

I was looking for a possible reason and found:
https://github.com/openssl/openssl/issues/20365
https://github.com/pjsip/pjproject/issues/3632
https://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

They all basically say that stale SSL errors can be left in the error queue
and that you are supposed to clear it with ERR_clear_error() (or drain it
with ERR_get_error()) before calling SSL_* functions.

I patched the keepalived sources (see attachment) and the problem seems
to have disappeared.

I have no idea why the Bullseye package is unaffected. It might be related
to the different OpenSSL version.

What do you think about this?

--
Pavel Matěja
--- keepalived-2.2.7.orig/keepalived/check/check_ssl.c
+++ keepalived-2.2.7/keepalived/check/check_ssl.c
@@ -257,6 +257,7 @@ ssl_connect(thread_ref_t thread, int new
 #endif
 	}
 
+	ERR_clear_error();
 	ret = SSL_connect(req->ssl);
 
 	return ret;
@@ -269,6 +270,7 @@ ssl_send_request(SSL * ssl, const char *
 
 	while (true) {
 		err = 1;
+		ERR_clear_error();
 		r = SSL_write(ssl, str_request, request_len);
 		if (SSL_ERROR_NONE != SSL_get_error(ssl, r))
 			break;
@@ -306,6 +308,7 @@ ssl_read_thread(thread_ref_t thread)
 	}
 
 	/* read the SSL stream - allow for terminating the data with '\0 */
+	ERR_clear_error();
 	r = SSL_read(req->ssl, req->buffer + req->len, (int)(MAX_BUFFER_LENGTH - 1 - req->len));
 
 	req->error = SSL_get_error(req->ssl, r);
