You have been subscribed to a public bug:

Environment:
- OpenStack 2024.02, deployed via Kolla-Ansible
- nova-compute communicating with RabbitMQ
- [oslo_messaging_rabbit]
heartbeat_in_pthread = false
ssl = true
ssl_ca_file = /etc/ssl/certs/ca-certificates.crt
rabbit_quorum_queue = true

- Have 3 computes, the compute with the error is holding about 53 instances, 
each remaining compute has about 32-35 instances.
- Each compute is using less than 30% of its resources.

Observed:
- Unexpectedly frequent reconnects/recoverable channel errors on nova-compute.
- Compute node occasionally marked as down or delayed in reporting state, 
causing scheduling delays.
- No kernel/syslog error during the time window.

Log error at current lost connection:

- Rabbitmq
2025-07-21 03:05:27.312 <0.127012395.1> missed heartbeats from client, timeout: 
60s
2025-07-21 03:05:27.312 <0.127012395.1> closing AMQP connection <0.127012395.1> 
(compute-node:45166 -> controller-node:5671 - nova-compute:...)
...
2025-07-21 03:05:40.605 <0.153316717.1> missed heartbeats from client, timeout: 
60s
2025-07-21 03:05:40.605 <0.153316717.1> closing AMQP connection <0.153316717.1> 
(compute-node:34520 -> controller-node1:5671 - nova-compute:...)

- Compute

2025-07-21 03:05:44.397 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] AMQP server on 
controller-node:5671 is unreachable: Server unexpectedly closed connection. 
Trying again in 1 seconds.: OSError: Server unexpectedly closed connection
2025-07-21 03:05:44.398 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] A recoverable 
connection/channel error occurred, trying to reconnect: Too many heartbeats 
missed
2025-07-21 03:05:44.398 A recoverable connection/channel error occurred, trying 
to reconnect: EOF occurred in violation of protocol (_ssl.c:2406)
2025-07-21 03:05:45.457 [43b1a1ae-54c0-4a27-994c-dc0a885e0897] Reconnected to 
AMQP server on controller-node:5671 via [amqp] client with port 43046.
2025-07-21 03:05:45.459 [9a2a48b0-4c3b-471e-8980-20eab5e55e0b] Reconnected to 
AMQP server on controller-node1:5671 via [amqp] client with port 41370.

similar phenomenon in another article https://bugs.launchpad.net/kolla-
ansible/+bug/2091975

** Affects: ubuntu
     Importance: Undecided
         Status: Invalid


** Tags: nova-compute
-- 
Frequent RabbitMQ heartbeat timeouts cause intermittent nova-compute reconnect 
loops in OpenStack 2024.02
https://bugs.launchpad.net/bugs/2117454
You received this bug notification because you are a member of Ubuntu Bugs, 
which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to