Public bug reported:

For a few months now, we have been using OVS 2.9 (or newer) on Ubuntu Xenial in 
OPNFV, both with and without DPDK.
A while ago, we observed a couple of rare race conditions when multiple Linux 
interfaces/bridges are mixed with OVS ports/bridges. We also observed races 
between DPDK binding and openvswitch-switch (actually openvswitch-switch-dpdk 
configured using alternatives).
We worked around those issues with a solution derived from the official OVS 
Debian readme, which recommends against using `auto` for OVS bridges. Instead, 
we kept `auto` for the OVS bridges, but omitted it for the OVS ports in them. 
That worked almost perfectly for a while.

However, we recently bumped a few unrelated software components (since
we migrated from Queens to Rocky in OPNFV) and we started experiencing
race conditions again.

So I dug a bit and found a couple of things:

1. Broken dependency between ovsdb-server/ovs-vswitchd systemd services and 
networking.service
This is probably a copy-paste error from [1]: `Before=network.service` should 
probably be `Before=networking.service` on Debian systems.
The consequence is quite serious - on Debian systems, the OVS services start 
*after* networking.service.
Changing this leads to a service order change, which turns out to be quite the 
rabbit hole ...

2. Outdated ifupdown scripts
For example /etc/network/if-pre-up.d/openvswitch still references the old 
`openvswitch-nonetwork.service`.
Luckily, this is not critical, as the fallback uses `service openvswitch-switch 
[...]`, so I'm not sure this should be changed, but I thought it's worth 
mentioning.

3. Debian OVS does *not* handle OVS bridges without `auto`
Upstream OVS readme recommends omitting `auto` for OVS bridges, as mentioned 
earlier, to avoid exactly the race conditions we saw.
Following the recommendation in the upstream readme does lead to a working 
system: `networking.service` no longer fails to start due to missing OVS 
bridges and, vice versa, the OVS services no longer complain about Linux 
interfaces being down when trying to add them to OVS bridges. However, the OVS 
bridges end up in DOWN state, since nothing runs ifup on them.
Imo, networking.service (or some *other* mechanism) should call `/sbin/ifup 
--allow=ovs -a --read-environment` *after* the initial `/sbin/ifup -a 
--read-environment` (provided ordering issue #1 is fixed so that OVS starts 
first, of course).

4. ovsdb-server should never start before DPDK service if DPDK is installed
This should actually be easy to fix and I have to admit I haven't run into it 
lately, although I remember it being an issue a while ago.
Anyway, a simple `After=dpdk.service` wouldn't hurt.

5. If OVS starts before networking.service, cloud-init causes cyclic 
dependencies
If we configure OVS services to start first, systemd might decide to randomly 
remove some units to break the following circular dependency:
  ovs-vswitchd --> ovsdb-server -(default dep)-> sysinit.target -->
  cloud-init.service --> networking.service --> ovs-vswitchd
In my tests, I just set `DefaultDependencies=no` for the OVS services. This 
might require explicitly adding back some of the indirect dependencies of 
`sysinit.target`, so it is a sensitive change.
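
A quick way to see whether systemd actually broke such a cycle on a given boot
is to look for its ordering-cycle messages in the journal. The sketch below is
a minimal filter, assuming the message wording is `Breaking ordering cycle by
deleting job <unit>/<type>` (this may differ between systemd versions);
`dropped_units` is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# dropped_units: read log text on stdin and print, one per line, the units
# whose jobs systemd deleted in order to break an ordering cycle.
# Assumption: the log line contains
#   "Breaking ordering cycle by deleting job <unit>/<type>"
dropped_units() {
    sed -n 's|.*Breaking ordering cycle by deleting job \([^/ ]*\)/.*|\1|p' \
        | sort -u
}
```

For the current boot this could be used as `journalctl -b | dropped_units`. An
empty result suggests that no job was deleted because of a cycle.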

On my test systems, I didn't bother handling #2; for the others, I
have some systemd drop-ins (see below), which so far seem to produce
reproducible working environments.

# cat /etc/systemd/system/ovsdb-server.service.d/override.conf
[Unit]
After=dpdk.service
Before=networking.service
DefaultDependencies=no

# cat /etc/systemd/system/networking.service.d/ovs_workaround.conf
[Service]
ExecStart=/sbin/ifup --allow=ovs -a --read-environment

# cat /etc/systemd/system/ovs-vswitchd.service.d/override.conf
[Unit]
Before=networking.service
DefaultDependencies=no

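Since issue #1 comes down to a single wrong unit name, it may be worth checking
that a unit file or drop-in really names `networking.service` and not
`network.service` before rebooting. The sketch below only parses the text of
one file; `unit_orders_before` is a hypothetical helper name, and it does not
replace asking systemd for the merged result.

```shell
#!/bin/sh
# unit_orders_before FILE UNIT
# Succeeds if FILE contains a Before= directive that lists UNIT exactly.
# Before= may appear several times and may hold several space-separated
# names, so all values are split into one name per line before matching.
unit_orders_before() {
    grep -E '^[[:space:]]*Before[[:space:]]*=' "$1" \
        | sed -e 's/^[^=]*=//' \
        | tr -s '[:space:]' '\n' \
        | grep -Fxq -- "$2"
}
```

For example, `unit_orders_before
/etc/systemd/system/ovsdb-server.service.d/override.conf networking.service`
should succeed with the drop-in above. After `systemctl daemon-reload`,
`systemctl show -p Before ovsdb-server.service` reports the ordering systemd
actually uses.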
# lsb_release -rd
Description:    Ubuntu 16.04.5 LTS
Release:        16.04

# apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.9.0-0ubuntu1~cloud0
  Candidate: 2.9.0-0ubuntu1~cloud0
  Version table:
 *** 2.9.0-0ubuntu1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/queens/main amd64 Packages
        100 /var/lib/dpkg/status

[1] https://github.com/openvswitch/ovs/blob/master/rhel/usr_lib_systemd_system_ovsdb-server.service#L4

** Affects: openvswitch (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1813371

Title:
  OVS 2.9+ systemd integration issues
