On Tue, Jul 01, 2025 at 03:48:41PM +0800, Zigit Zo wrote:
> On 6/30/25 10:54 PM, Michael S. Tsirkin wrote:
> > On Mon, Jun 30, 2025 at 10:50:55AM -0400, Michael S. Tsirkin wrote:
> >> On Mon, Jun 30, 2025 at 05:51:09PM +0800, Zigit Zo wrote:
> >>> This bug happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while
> >>> the virtio-net driver is still probing with rtnl_lock() hold, this will
> >>> cause a recursive mutex in netdev_notify_peers().
> >>>
> >>> Fix it by skip acking the annouce in virtnet_config_changed_work() when
> >>> probing. The annouce will still get done when ndo_open() enables the
> >>> virtio_config_driver_enable().
> >>
> >> I am not so sure it will be - while driver is not loaded, device does
> >> not have to send interrupts, and there's no rule I'm aware of that says
> >> we'll get one after DRIVER_OK.
> 
> Yep, at first we're thinking that when the VIRTIO_NET_S_ANNOUNCE flag set,
> we can always assure an interrupt has fired by VMM, to notify the driver
> to do the announcement.
> 
> But later we realized that the S_ANNOUNCE flag can be sent before the
> driver's probing, and for QEMU seems to set the status flag regardless of
> whether driver is ready, so the problem you're talking still may happens.
> >> How about, we instead just schedule the work to do it later?I'm not sure 
> >> if scheduling the work later will break df28de7b0050, the work
> was being scheduled before that commit, and we have no much idea of why that
> commit removes the schedule_work, we just keep it for safe...

Well managing async things is always tricky. Direct call is safer.
If you reintroduce it, you need to audit all call paths for safely.


> Then, for plan A, we change the check_announce to schedule_announce, and if
> that's true, we do another schedule_work to call virtnet_config_changed_work
> again to finish the announcement, like
> 
>       if (v & VIRTIO_NET_S_ANNOUNCE) {
>               if (unlikely(schedule_announce))
>                       schedule_work(&vi->config_work);
>               else {
>                       netdev_notify_peers(vi->dev);
>                       virtnet_ack_link_announce(vi);
>               }
>       }
> 
> >>
> >> Also, there is another bug here.
> >> If ndo_open did not run, we actually should not send any announcements.
> >>
> >> Do we care if carrier on is set on probe or on open?
> >> If not, let's just defer this to ndo_open?
> > 
> > Hmm yes I think we do, device is visible to userspace is it not?
> > 
> > Hmm.  We can keep the announce bit set in vi->status and on open, check
> > it and then schedule a work to do the announcement.
> 
> Okay, so there's a plan B, we save the bit and re-check it in ndo_open, like
> 
>       /* __virtnet_config_changed_work() */
>       if (v & VIRTIO_NET_S_ANNOUNCE) {
>               vi->status |= VIRTIO_NET_S_ANNOUNCE;
>               if (unlikely(!check_announce))
>                       goto check_link;
> 
>               netdev_notify_peers(vi->dev);
>               virtnet_ack_link_announce(vi);
>               vi->status &= ~VIRTIO_NET_S_ANNOUNCE;
>       }
> 
>       /* virtnet_open() */
>       if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
>               if (vi->status & VIRTIO_NET_S_LINK_UP)
>                       netif_carrier_on(vi->dev);
>               if
>               if (vi->status & VIRTIO_NET_S_ANNOUNCE)
>                       schedule_work(&vi->config_work);
>               virtio_config_driver_enable(vi->vdev);
>       }
> 
> This is a dirty demo, any ideas are welcomed :)
> 
> (I think in virtnet_open() we can make the S_LINK_UP being scheduled as well?)


Reply via email to