Verified cosmic-proposed using the test from [Test Case].

** Description changed:
[Impact]
Need to get this added to the Ubuntu packages in order to safeguard against missed VRRP transitions caused by ip -o monitor not running at the time the transition occurs. We have seen many cases in the field where neutron routers end up active on multiple l3 agents (as reported via the neutron API), which leads to a number of problems.

[Test Case]
* deploy OpenStack (any version that supports l3ha)
* create HA router with max-l3-agents=2
* check neutron l3-agent-list-hosting-router for master location
* on both hosts that are running the l3-agent do

    pid=`pgrep -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID"`
    ps -f --ppid $pid
+   pkill -f "/var/lib/neutron/ha_confs/$ROUTER_UUID/keepalived.conf"
    pkill -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID"
-   ps -f --ppid $pid <<<<<<<<<<< this should return nothing now
-   pkill -f "/var/lib/neutron/ha_confs/$ROUTER_UUID/keepalived.conf"
+   ps -f --ppid $pid # <<<<<<<<<<< this should return nothing now

* without this patch you should now see both agents reporting the router as "active"
- * with the patch this should not happen (once neutron-keepalived-state-change has been restarted)
+ * with the patch this should not happen (once neutron-keepalived-state-change has been restarted by neutron-l3-agent)

[Regression Potential]
These patches have already landed in the corresponding upstream branches and have therefore undergone review plus unit and functional testing upstream, so the regression potential is expected to be low.

====================================================================

Recently many L3 HA related functional tests have been failing. The common thing in all those errors is the fact that they fail while waiting for the L3 HA router to become master.
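The failing tests block in a polling helper (wait_until_true in the stack trace below) until router.ha_state == 'master', and fail with WaitTimeout when that never happens. The following is a minimal sketch of such a polling helper, written with only the Python standard library (time.monotonic() and time.sleep()). It is a simplified illustration, not neutron's actual wait_until_true, and FakeRouter is a hypothetical stand-in for a router whose transition to master is never reported, so its ha_state stays at "backup".

```python
import time


class WaitTimeout(Exception):
    """Raised when a polled condition does not become true in time."""


def wait_until_true(predicate, timeout=60, sleep=1):
    """Call predicate() every `sleep` seconds until it returns a truthy value.

    Simplified sketch for illustration only; not neutron's implementation.
    Raises WaitTimeout if the deadline passes before the predicate is true.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


class FakeRouter:
    """Hypothetical stand-in: ha_state only changes if a transition is reported."""

    def __init__(self, ha_state="backup"):
        self.ha_state = ha_state


if __name__ == "__main__":
    router = FakeRouter()  # no transition to master is ever reported
    try:
        wait_until_true(lambda: router.ha_state == "master", timeout=2, sleep=0.5)
    except WaitTimeout as exc:
        print(exc)  # prints: Timed out after 2 seconds
```

If a VRRP transition is missed and ha_state is never updated, the predicate can never become true, so the wait can only end in WaitTimeout. That is consistent with the "Timed out after 60 seconds" failure shown in the trace.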
Example stack trace:

ft2.12: neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ha_router_lifecycle_StringException: Traceback (most recent call last):
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/functional/agent/l3/test_ha_router.py", line 81, in test_ha_router_lifecycle
    self._router_lifecycle(enable_ha=True, router_info=router_info)
  File "neutron/tests/functional/agent/l3/framework.py", line 274, in _router_lifecycle
    common_utils.wait_until_true(lambda: router.ha_state == 'master')
  File "neutron/common/utils.py", line 690, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 60 seconds

Example failure: http://logs.openstack.org/79/633979/21/check/neutron-functional-python27/ce7ef07/logs/testr_results.html.gz

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ha_state%20%3D%3D%20'master')%5C%22

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1818614

Title:
  [SRU] Various L3HA functional tests fails often

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1818614/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs