You have been subscribed to a public bug:

Environment:
- OpenStack 2024.02, deployed via Kolla-Ansible
- nova-compute communicating with RabbitMQ
- [oslo_messaging_rabbit]
  heartbeat_in_pthread = false
  ssl = true
  ssl_ca_file = /etc/ssl/certs/ca-certificates.crt
  rabbit_quorum_queue = true
- There are 3 compute nodes: the node showing the error hosts about 53 instances, and each of the remaining nodes hosts about 32-35 instances.
- Each compute node is using less than 30% of its resources.

Observed:
- Unexpectedly frequent reconnects and recoverable channel errors on nova-compute.
- The compute node is occasionally marked as down or is delayed in reporting its state, causing scheduling delays.
- No kernel/syslog errors during the time window.

Log errors at the time of the latest lost connection:

- RabbitMQ
2025-07-21 03:05:27.312 <0.127012395.1> missed heartbeats from client, timeout: 60s
2025-07-21 03:05:27.312 <0.127012395.1> closing AMQP connection <0.127012395.1> (compute-node:45166 -> controller-node:5671 - nova-compute:...)
...
2025-07-21 03:05:40.605 <0.153316717.1> missed heartbeats from client, timeout: 60s
2025-07-21 03:05:40.605 <0.153316717.1> closing AMQP connection <0.153316717.1> (compute-node:34520 -> controller-node1:5671 - nova-compute:...)

- Compute
2025-07-21 03:05:44.397 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] AMQP server on controller-node:5671 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2025-07-21 03:05:44.398 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed
2025-07-21 03:05:44.398 A recoverable connection/channel error occurred, trying to reconnect: EOF occurred in violation of protocol (_ssl.c:2406)
2025-07-21 03:05:45.457 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] Reconnected to AMQP server on controller-node:5671 via [amqp] client with port 43046.
2025-07-21 03:05:45.459 [9a2a48b0-4c3b-471e-8980-20eab5e55e0b] Reconnected to AMQP server on controller-node1:5671 via [amqp] client with port 41370.
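
To put a number on "frequent", the RabbitMQ log could be scanned for the missed-heartbeat lines quoted above. The following is only a rough sketch, assuming the log lines start with a timestamp in the format shown in this report; the function names (missed_heartbeats, per_minute) and the sample list are hypothetical and not part of any OpenStack or RabbitMQ tooling. Real log files may carry extra fields (for example a log level), which the pattern tries to tolerate.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the excerpts in this report:
# 2025-07-21 03:05:27.312 <0.127012395.1> missed heartbeats from client, timeout: 60s
# ".*?" allows for extra fields between the timestamp and the connection pid.
MISSED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"<(?P<pid>[\d.]+)> missed heartbeats from client, timeout: (?P<timeout>\d+)s"
)


def missed_heartbeats(lines):
    """Return (timestamp, pid, timeout_seconds) for each missed-heartbeat line."""
    events = []
    for line in lines:
        m = MISSED.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("pid"), int(m.group("timeout"))))
    return events


def per_minute(events):
    """Count missed-heartbeat events per minute bucket."""
    return Counter(ts.strftime("%Y-%m-%d %H:%M") for ts, _, _ in events)


# Hypothetical sample built from the log lines quoted in this report.
sample = [
    "2025-07-21 03:05:27.312 <0.127012395.1> missed heartbeats from client, timeout: 60s",
    "2025-07-21 03:05:27.312 <0.127012395.1> closing AMQP connection <0.127012395.1> "
    "(compute-node:45166 -> controller-node:5671 - nova-compute:...)",
    "2025-07-21 03:05:40.605 <0.153316717.1> missed heartbeats from client, timeout: 60s",
]

events = missed_heartbeats(sample)
counts = per_minute(events)
for minute, n in sorted(counts.items()):
    print(minute, n)
```

Feeding the full RabbitMQ log through the same functions would show whether the timeouts cluster around particular times or are spread evenly, which may help correlate them with load on the compute node.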

A similar phenomenon is described in another report:
https://bugs.launchpad.net/kolla-ansible/+bug/2091975

** Affects: ubuntu
     Importance: Undecided
         Status: Invalid

** Tags: nova-compute

--
Frequent RabbitMQ heartbeat timeouts cause intermittent nova-compute reconnect loops in OpenStack 2024.02
https://bugs.launchpad.net/bugs/2117454
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs