On Tue, Mar 19, 2019 at 02:38:06PM +0200, Liran Alon wrote:
> Hi Michael,
>
> Great blog-post which summarises everything very well!
>
> Some comments I have:
Thanks! I'll try to update everything in the post when I'm not so jet-lagged.

> 1) I think that when we use the term "1-netdev model" in community
> discussions, we tend to refer to what you have defined in the blog-post as
> the "3-device model with hidden slaves".
> Therefore, I would suggest just removing the "1-netdev model" section and
> renaming the "3-device model with hidden slaves" section to "1-netdev model".
>
> 2) The userspace issues result both from using the "2-netdev model" and the
> "3-netdev model". However, they are described in the blog-post as if they
> only exist in the "3-netdev model".
> The reason these issues are not seen in the Azure environment is that they
> were partially handled by Microsoft for their specific 2-netdev model, which
> leads me to the next comment.
>
> 3) I suggest that the blog-post also elaborate on what exactly the userspace
> issues are that arise in models other than the "1-netdev model".
> The issues that I'm aware of are (please tell me if you are aware of others!):
> (a) udev rename race condition: When the net-failover device is opened, it
> also opens its slaves. However, the order of events to udev on KOBJ_ADD is
> first for the net-failover netdev and only then for the virtio-net netdev.
> This means that if userspace responds to the first event by opening the
> net-failover netdev, then any attempt by userspace to rename the virtio-net
> netdev in response to the second event will fail because the virtio-net
> netdev is already open.
> Also note that this udev rename rule is useful because we would like to add
> rules that rename the virtio-net netdev to clearly signal that it is used as
> the standby interface of another net-failover netdev.
> The way this problem was worked around by Microsoft in NetVSC is to delay
> the open of the slave VF relative to the open of the NetVSC netdev. However,
> this is still a race and thus a hacky solution. It was accepted by the
> community only because it is internal to the NetVSC driver. However, a
> similar solution was rejected by the community for the net-failover driver.
> The solution we currently propose to address this (patch by Si-Wei) is to
> change the kernel rename handling to allow a net-failover slave to be
> renamed even if it is already open. The patch is still not accepted.
> (b) Issues caused by various userspace components attempting DHCP on the
> net-failover slaves: DHCP of course should only be done on the net-failover
> netdev. Attempting DHCP on the net-failover slaves as well will cause
> networking issues. Therefore, userspace components should be taught to avoid
> doing DHCP on the net-failover slaves. The various userspace components
> include:
> b.1) dhclient: If run without parameters, by default it just enumerates all
> netdevs and attempts to DHCP them all.
> (I don't think Microsoft has handled this.)
> b.2) initramfs / dracut: In order to mount the root file-system from iSCSI,
> these components need networking and therefore DHCP on all netdevs.
> (Microsoft hasn't handled (b.2) because they don't have images which perform
> iSCSI boot in their Azure setup. Still an open issue.)
> b.3) cloud-init: If configured to perform network configuration, it attempts
> to configure all available netdevs. It should, however, avoid doing so on
> net-failover slaves.
> (Microsoft has handled this by adding a mechanism in cloud-init to blacklist
> a netdev from being configured in case it is owned by a specific PCI driver.
> Specifically, they blacklist the Mellanox VF driver. However, this technique
> doesn't work for the net-failover mechanism because both the net-failover
> netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)
> b.4) Various distros' network managers need to be updated to avoid DHCP on
> net-failover slaves? (Not sure. Asking...)
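Note that most of the components in (b) end up needing the same check: skip
anything that is enslaved to a failover master and only configure the master
itself. Below is a rough Python sketch of that heuristic, just to make the
idea concrete. It assumes the enslavement is visible as the usual
/sys/class/net/<ifname>/master symlink (as it is for bonding); whether that
is actually the right way for userspace to identify net-failover slaves is
part of the open question here, so treat it as an illustration rather than a
recommendation.

import os

SYSFS_NET = "/sys/class/net"

def dhcp_candidates():
    """Yield interface names that look safe to run DHCP on."""
    for ifname in sorted(os.listdir(SYSFS_NET)):
        if ifname == "lo":
            continue
        # Enslaved devices (bond/team/net-failover slaves) are assumed to
        # expose a "master" symlink; skip them and let DHCP run only on
        # the master netdev.
        if os.path.islink(os.path.join(SYSFS_NET, ifname, "master")):
            continue
        yield ifname

if __name__ == "__main__":
    print(" ".join(dhcp_candidates()))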
> 4) Another interesting use-case where the net-failover mechanism is useful
> is handling NIC firmware failures or NIC firmware Live-Upgrade.
> In both cases, there is a need to perform a full PCIe reset of the NIC,
> which loses all of the NIC's eSwitch configuration for the various VFs.

In this setup, how does the VF keep going? If it doesn't keep going, why is
it helpful?

> To handle these cases gracefully, one could just hot-unplug all VFs from
> the guests running on the host (which will make all guests use the
> virtio-net netdev, which is backed by a netdev that is eventually on top of
> the PF). Therefore, networking will be restored to the guests once the PCIe
> reset is completed and the PF is functional again. To re-accelerate the
> guests' network, the hypervisor can just hot-plug new VFs into the guests.
>
> P.S.:
> I would very much appreciate this forum's help in closing on the pending
> items written in (3), which currently prevent using this net-failover
> mechanism in real production use-cases.
>
> Regards,
> -Liran
>
> > On 17 Mar 2019, at 15:55, Michael S. Tsirkin <m...@redhat.com> wrote:
> >
> > Hi all,
> > I've put up a blog post with a summary of where network device failover
> > stands and some open issues.
> > Not sure where best to host it, I just put it up on blogspot:
> > https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
> >
> > Comments, corrections are welcome!
> >
> > --
> > MST
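Coming back to (4): assuming the orchestration is done directly over QMP
(libvirt would normally drive this for you), the unplug/replug around a PF
reset could look roughly like the sketch below. The socket path, device id
and host BDF are made-up placeholders, and error/event handling is omitted.

import json
import socket

def qmp(sock_path, command, arguments=None):
    """Send a single QMP command and return the first reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw")
        f.readline()                                   # QMP greeting
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        f.readline()                                   # capabilities ack
        msg = {"execute": command}
        if arguments is not None:
            msg["arguments"] = arguments
        f.write(json.dumps(msg) + "\n")
        f.flush()
        return json.loads(f.readline())

SOCK = "/var/run/qmp-guest1.sock"   # placeholder: one guest's QMP socket

# Before the PF reset / firmware upgrade: detach the VF; the guest's
# traffic falls back to the virtio-net standby datapath.
qmp(SOCK, "device_del", {"id": "vf0"})

# After the PF is functional again and VFs are re-created: hot-plug a VF
# back into the guest to re-accelerate its datapath.
qmp(SOCK, "device_add",
    {"driver": "vfio-pci", "host": "0000:3b:10.0", "id": "vf0"})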