Package: keepalived
Version: 1:2.2.7-1

I'm upgrading our servers from Bullseye to Bookworm. Some of them act as load balancers using keepalived. Right now I have one Bullseye and one Bookworm machine with the same configuration, checking the same services. Several of our services run over HTTPS, so I'm using SSL_CHECK.

I can see that the Bookworm machine occasionally fails SSL_CHECK for several seconds on one service, while the Bullseye machine does not report any problem at all. It's quite rare: less than once per hour with a 2s loop delay.

I was looking for a possible reason and found:

https://github.com/openssl/openssl/issues/20365
https://github.com/pjsip/pjproject/issues/3632
https://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

They all basically say that multiple SSL errors can be left in the error queue, and that you are supposed to run ERR_get_error() before calling SSL_* functions.

I've tried patching the keepalived sources to call ERR_clear_error() before SSL_connect(), SSL_write() and SSL_read() (see attachment), and the problem seems to disappear. I have no idea why the Bullseye package is unaffected. It might be related to the different OpenSSL version.

What do you think about this?

-- 
Pavel Matěja

--- keepalived-2.2.7.orig/keepalived/check/check_ssl.c
+++ keepalived-2.2.7/keepalived/check/check_ssl.c
@@ -257,6 +257,7 @@ ssl_connect(thread_ref_t thread, int new
 #endif
 	}
+	ERR_clear_error();
 	ret = SSL_connect(req->ssl);
 	return ret;
@@ -269,6 +270,7 @@ ssl_send_request(SSL * ssl, const char *
 	while (true) {
 		err = 1;
+		ERR_clear_error();
 		r = SSL_write(ssl, str_request, request_len);
 		if (SSL_ERROR_NONE != SSL_get_error(ssl, r))
 			break;
@@ -306,6 +308,7 @@ ssl_read_thread(thread_ref_t thread)
 	}
 	/* read the SSL stream - allow for terminating the data with '\0 */
+	ERR_clear_error();
 	r = SSL_read(req->ssl, req->buffer + req->len, (int)(MAX_BUFFER_LENGTH - 1 - req->len));
 	req->error = SSL_get_error(req->ssl, r);