** Summary changed:

- ovsdbapp can time out on raft leadership change
+ [SRU] ovsdbapp can time out on raft leadership change

** Description changed:

  When raft leadership changes, any leader-only connections will be
  disconnected and will need to reconnect to the new leader. When this
  happens, the IDL will return a txn status of TRY_AGAIN. The current code
  tries to do an exponential backoff with sleep() due to an issue where
  those can be spammed 1000s of times a second. This sleep also prevents
  reconnecting quickly because idl.run() is not called rapidly and can
  lead to timeouts.
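
To illustrate the effect described above, here is a minimal simulation, not the ovsdbapp code. `FakeIdl`, `commit_with_backoff`, and `commit_with_polling` are hypothetical names, and the clock is simulated rather than real. It assumes a reconnect needs a fixed number of run() calls and compares how long each retry strategy takes to get there.

```python
class FakeIdl:
    """Hypothetical stand-in for an OVSDB IDL; not the real ovs.db.idl.Idl."""

    def __init__(self, runs_needed):
        self.runs_needed = runs_needed  # run() calls needed to reconnect
        self.connected = False

    def run(self):
        # Each call makes a little progress towards reconnecting.
        self.runs_needed -= 1
        if self.runs_needed <= 0:
            self.connected = True


def commit_with_backoff(idl, timeout, base_delay=0.1):
    """Retry with exponential backoff between run() calls."""
    clock, delay = 0.0, base_delay
    while clock < timeout:
        idl.run()
        if idl.connected:
            return clock   # simulated seconds until success
        clock += delay     # stands in for time.sleep(delay)
        delay *= 2
    return None            # transaction timed out


def commit_with_polling(idl, timeout, poll_interval=0.01):
    """Retry with a short, constant wait so run() is called often."""
    clock = 0.0
    while clock < timeout:
        idl.run()
        if idl.connected:
            return clock
        clock += poll_interval
    return None


if __name__ == "__main__":
    print("backoff:", commit_with_backoff(FakeIdl(8), timeout=10))
    print("polling:", commit_with_polling(FakeIdl(8), timeout=10))
```

With these assumed numbers the backoff variant runs out of time before the eighth run() call, while the polling variant reconnects well within the timeout.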
+ 
+ --------------------------------------------------------------------------------
+ SRU TEMPLATE:
+ 
+ [Impact]
+ 
+ Please see the original bug description. What I can add is that, as a
+ consequence of this, we saw in production that ovsdbapp transactions
+ would fail after a timeout and ovsdbapp would then end up in a retry
+ sequence such that the transactions would not get retried and vm tap
+ devices would not get deleted from ovs when a vm was deleted. The result
+ was a build-up of "stale" tap devices on br-int (visible as "No such
+ device" entries in ovs-vsctl show).
+ 
+ [Test Plan]
+ 
+ * Deploy OpenStack Jammy (Yoga) with ml2-ovn
+ * Spawn several vms
+ * Trigger many ovn-central db leadership switches by restarting
+   ovn-central units in rotation, leaving enough time between each for a
+   new leader to be elected.
+ * Delete the vms and create a load more while leaders are being re-elected.
+ * First check that /var/log/nova/nova-compute.log does not repeatedly
+   contain the "OVSDB transaction returned TRY_AGAIN" message, then check
+   that ovs-vsctl show does not contain any "stale" ports with messages
+   like the following:
+ 
+     Port tapa5d45fc6-02
+         Interface tapa5d45fc6-02
+             error: "could not open network device tapa5d45fc6-02 (No such device)"
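
The two checks in the last test step could be scripted roughly as follows. This is a sketch only: `find_stale_ports` and `count_try_again` are hypothetical helper names, and the parsing assumes the `ovs-vsctl show` layout from the sample above, with port names optionally quoted.

```python
import re


def find_stale_ports(show_output):
    """Return names of ports whose interface reports 'No such device'."""
    stale = []
    current_port = None
    for line in show_output.splitlines():
        stripped = line.strip()
        # Port names may or may not be quoted depending on the OVS version.
        match = re.match(r'Port "?([^"\s]+)"?$', stripped)
        if match:
            current_port = match.group(1)
        elif (stripped.startswith("error:")
              and "No such device" in stripped
              and current_port is not None
              and current_port not in stale):
            stale.append(current_port)
    return stale


def count_try_again(log_text):
    """Count occurrences of the TRY_AGAIN message in a log's text."""
    return log_text.count("OVSDB transaction returned TRY_AGAIN")


if __name__ == "__main__":
    sample = (
        "    Port tapa5d45fc6-02\n"
        "        Interface tapa5d45fc6-02\n"
        '            error: "could not open network device '
        'tapa5d45fc6-02 (No such device)"\n'
        "    Port br-int\n"
        "        Interface br-int\n"
        "            type: internal\n"
    )
    print(find_stale_ports(sample))
```

On a compute node, the input for `find_stale_ports` could come from `subprocess.run(["ovs-vsctl", "show"], capture_output=True, text=True).stdout`, and the input for `count_try_again` from the contents of /var/log/nova/nova-compute.log.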
+ 
+ 
+ [Regression Potential]
+ This patch is not expected to introduce any regressions.

** Patch added: "lp1988457-jammy.debdiff"
   
https://bugs.launchpad.net/ovsdbapp/+bug/1988457/+attachment/5801132/+files/lp1988457-jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1988457

Title:
  [SRU] ovsdbapp can time out on raft leadership change

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1988457/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
