On Wed, Jun 06, 2018 at 02:16:20PM -0700, Stephen Hemminger wrote:
> On Tue, 5 Jun 2018 23:11:37 -0700
> "Samudrala, Sridhar" <sridhar.samudr...@intel.com> wrote:
> 
> > On 6/5/2018 11:00 PM, Stephen Hemminger wrote:
> > > On Tue, 5 Jun 2018 22:39:12 -0700
> > > "Samudrala, Sridhar" <sridhar.samudr...@intel.com> wrote:
> > >  
> > >> On 6/5/2018 8:51 PM, Stephen Hemminger wrote:  
> > >>> On Tue, 5 Jun 2018 16:52:22 -0700
> > >>> "Samudrala, Sridhar" <sridhar.samudr...@intel.com> wrote:
> > >>>     
> > >>>> On 6/5/2018 2:52 PM, Stephen Hemminger wrote:  
> > >>>>> On Tue, 5 Jun 2018 22:38:43 +0300
> > >>>>> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > >>>>>        
> > >>>>>>> See:
> > >>>>>>>       https://patchwork.ozlabs.org/patch/851711/  
> > >>>>>> Let me try to summarize that:
> > >>>>>>
> > >>>>>>      You wanted to speed up the delayed link up.  You had an idea to
> > >>>>>>      additionally take link up when userspace renames the interface 
> > >>>>>> (standby
> > >>>>>>      one which is also the failover for netvsc).
> > >>>>>>
> > >>>>>>      But userspace might not do any renames, in which case there will
> > >>>>>>      still be the delay, and so this never got applied.
> > >>>>>>
> > >>>>>>      Is this a good summary?
> > >>>>>>
> > >>>>>> Davem said delay should go away completely as it's not robust, and I
> > >>>>>> think I agree.  So I don't think we should make all failover users 
> > >>>>>> use
> > >>>>>> delay. IIUC failover kept a delay option especially for netvsc to
> > >>>>>> minimize the surprise factor. Hopefully we can come up with
> > >>>>>> something more robust and drop that option completely.  
> > >>>>> The timeout was the original solution to how to complete setup after
> > >>>>> userspace has had a chance to rename the device. Unfortunately, the 
> > >>>>> whole network
> > >>>>> device initialization (cooperation with udev and userspace) is a a 
> > >>>>> mess because
> > >>>>> there is no well defined specification, and there are multiple ways 
> > >>>>> userspace
> > >>>>> does this in old and new distributions.  The timeout has its own 
> > >>>>> issues
> > >>>>> (how long, handling errors during that window, what if userspace 
> > >>>>> modifies other
> > >>>>> device state); and open to finding a better solution.
> > >>>>>
> > >>>>> My point was that if name change can not be relied on (or used) by 
> > >>>>> netvsc,
> > >>>>> then we can't allow it for net_failover either.  
> > >>>> I think the push back was with the usage of the delay, not bringing up 
> > >>>> the primary/standby
> > >>>> device in the name change event handler.
> > >>>> Can't netvsc use this mechanism instead of depending on the delay?
> > >>>>
> > >>>>     
> > >>> The patch that was rejected for netvsc was about using name change.
> > >>> Also, you can't depend on name change; you still need a timer. Not all 
> > >>> distributions
> > >>> change name of devices. Or user has blocked that by udev rules.  
> > >> In the net_failover_slave_register() we do a dev_open() and ignore any 
> > >> failure due to
> > >> EBUSY and do another dev_open() in the name change event handler.
> > >> If the name is not expected to change, i would think the dev_open() at 
> > >> the time of
> > >> register will succeed.  
> > > The problem is your first dev_open will bring device up and lockout
> > > udev from changing the network device name.  
> > 
> > I have tried with/without udev and didn't see any issue with the naming of 
> > the primary/standby
> > devices in my testing.
> > 
> > With the 3-netdev failover model, we are only interested in setting the 
> > right name for the failover
> > netdev and that is the reason we do SET_NETDEV_DEV on that netdev. Does it 
> > really matter if udev fails
> > to rename the lower primary/standby netdevs, i don't think it will matter? 
> > The user is not expected
> > to touch the lower netdevs.
> 
> Renaming matters to some users and the udev maintainers. They want the VF to 
> be named enpXXX
> The primary in virtio case needs to be ens3 with some cloud platforms.

Confused. VF can't be a standby, of it's used in a failover it's a
primary, you can't call it both enpXXX amd ens3. Could you describe the
use case in a bit more detail?

> 
> I think you need to get rid of triggering off of the name change.

Worth considering down the road since it's a bit of a hack but it's one
we won't have trouble supporting, unlike the delayed uplink.

> Long term, let's open the discussion about allowing network devices to change 
> name when up?
> Maybe with NETIF_LIVENAME_CHANGE flag?

That's probably the cleanest approach assuming it can be made to work
without races. I suspect we can just live with what we have until then.


-- 
MST

Reply via email to