** Summary changed:

- ovsdbapp can time out on raft leadership change
+ [SRU] ovsdbapp can time out on raft leadership change
** Description changed:

  When raft leadership changes, any leader-only connections will be
  disconnected and will need to reconnect to the new leader. When this
  happens, the IDL will return a txn status of TRY_AGAIN.

  The current code tries to do an exponential backoff with sleep() to
  work around an issue where TRY_AGAIN can be returned thousands of
  times a second. This sleep also prevents reconnecting quickly, because
  idl.run() is not called often enough, and this can lead to timeouts.
+ 
+ --------------------------------------------------------------------------------
+ SRU TEMPLATE:
+ 
+ [Impact]
+ 
+ Please see the original bug description. What I can add is that what
+ we saw in production as a consequence of this was that ovsdbapp
+ transactions would fail after a timeout, and ovsdbapp would then end
+ up in a retry sequence in which the transactions were never actually
+ retried, so VM tap devices were not deleted from OVS when a VM was
+ deleted. The result was a build-up of "stale" tap devices on br-int
+ (visible as "No such device" entries in ovs-vsctl show).
+ 
+ [Test Plan]
+ 
+ * Deploy OpenStack Jammy (Yoga) with ml2-ovn
+ * Spawn several VMs
+ * Trigger many ovn-central db leadership switches by restarting ovn-central units in rotation, leaving enough time between each restart for a new leader to be elected.
+ * Delete the VMs and create many more while leaders are being re-elected.
+ * First check that /var/log/nova/nova-compute.log does not contain the "OVSDB transaction returned TRY_AGAIN" message over and over, then also check that ovs-vsctl show does not contain any "stale" ports with messages like the following:
+ 
+     Port tapa5d45fc6-02
+         Interface tapa5d45fc6-02
+             error: "could not open network device tapa5d45fc6-02 (No such device)"
+ 
+ 
+ [Regression Potential]
+ This patch is not expected to introduce any regressions.
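The two checks in the last test plan step could be scripted roughly as follows. This is only a sketch and not part of the patch: the function names `count_try_again` and `find_stale_ports` are hypothetical, and the parsing assumes the `ovs-vsctl show` layout quoted in the test plan (a `Port` line followed by indented `Interface` and `error:` lines).

```python
import re

# Message quoted in the test plan for /var/log/nova/nova-compute.log.
TRY_AGAIN_MSG = "OVSDB transaction returned TRY_AGAIN"

# Assumed layout of `ovs-vsctl show`: port names may or may not be quoted.
PORT_RE = re.compile(r'^\s*Port\s+"?([^"\s]+)"?\s*$')
ERROR_RE = re.compile(r'^\s*error:\s*"(.*)"\s*$')


def count_try_again(log_text):
    """Count log lines that contain the TRY_AGAIN message."""
    return sum(1 for line in log_text.splitlines() if TRY_AGAIN_MSG in line)


def find_stale_ports(show_text):
    """Return names of ports whose interface reports 'No such device'."""
    stale = []
    current_port = None
    for line in show_text.splitlines():
        port = PORT_RE.match(line)
        if port:
            # Remember which port the following Interface/error lines belong to.
            current_port = port.group(1)
            continue
        error = ERROR_RE.match(line)
        if error and current_port and "No such device" in error.group(1):
            if current_port not in stale:
                stale.append(current_port)
    return stale


# Sample input built from the error shown in the test plan; the br-int
# port is an illustrative healthy entry.
SAMPLE_SHOW = '''\
    Bridge br-int
        Port tapa5d45fc6-02
            Interface tapa5d45fc6-02
                error: "could not open network device tapa5d45fc6-02 (No such device)"
        Port br-int
            Interface br-int
                type: internal
'''

if __name__ == "__main__":
    print(find_stale_ports(SAMPLE_SHOW))
```

To use it on a compute node, pass the captured output of `ovs-vsctl show` to `find_stale_ports()` and the contents of /var/log/nova/nova-compute.log to `count_try_again()`. An empty list and a count that does not keep growing would match the expected result of the test plan.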
** Patch added: "lp1988457-jammy.debdiff"
   https://bugs.launchpad.net/ovsdbapp/+bug/1988457/+attachment/5801132/+files/lp1988457-jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1988457

Title:
  [SRU] ovsdbapp can time out on raft leadership change