On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote:
> Hi,
> 
> I have been trying to debug a problem where a vif with the backend in a 
> driver domain is added to dom0.  Intermittently the hotplug script is 
> not invoked by libxl (running as xl devd) in the driver domain.  By 
> enabling some debug for the driver domain kernel and libxl I have these 
> messages:
> 
> driver domain kernel (Ubuntu 5.4.0-92-generic):
> 
> [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully 
> created xenvif
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Initialising
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> InitWait
> [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: 
> /local/domain/0/device/vif/0 -> Connected
> [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
> [Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: 
> backend/vif/0/0 -> Connected
> 
> xl devd (Xen 4.14.3):
> 
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event 
> epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: 
> libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, 
> parent 0x5633ac567f90
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: 
> watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: 
> event epath=/local/domain/2/backend/vif/0/0/state
> 2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting 
> state 4
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_aoutils.c:88:xswait_timeout_callback: backend 
> /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait 
> timeout (path=/local/domain/2/backend/vif/0/0/state)
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 
> wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: 
> backend /local/domain/2/backend/vif/0/0/state wanted state 2  timed out
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: error: 
> libxl_device.c:1105:device_backend_callback: unable to add device with path 
> /local/domain/2/backend/vif/0/0
> 2022-01-13 01:39:51 UTC libxl: debug: 
> libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: 
> deregister unregistered
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: 
> device /local/domain/2/backend/vif/0/0 add failed
> 2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: 
> ao 0x5633ac568f30: destroy
> 
> the xenstore content for the backend:
> 
> # xenstore-ls /local/domain/2/backend/vif/0
> 0 = ""
>  frontend = "/local/domain/0/device/vif/0"
>  frontend-id = "0"
>  online = "1"
>  state = "4"
>  script = "/etc/xen/scripts/vif-zynstra"
>  vifname = "dom0.0"
>  mac = "00:16:3e:6c:de:82"
>  bridge = "cluster"
>  handle = "0"
>  type = "vif"
>  feature-sg = "1"
>  feature-gso-tcpv4 = "1"
>  feature-gso-tcpv6 = "1"
>  feature-ipv6-csum-offload = "1"
>  feature-rx-copy = "1"
>  feature-rx-flip = "0"
>  feature-multicast-control = "1"
>  feature-dynamic-multicast-control = "1"
>  feature-split-event-channels = "1"
>  multi-queue-max-queues = "2"
>  feature-ctrl-ring = "1"
>  hotplug-status = "connected"
> 
> My guess is that the libxl callback is started waiting for the backend 
> state key to be set to XenbusStateInitWait (2) but the frontend in dom0 
> has already triggered the backend to transition to XenbusStateConnected 
> (4) and therefore it does not successfully complete.

I think your analysis is correct. The question is who sets
hotplug-status to "connected" in xenstore if the hotplug script in the
driver domain hasn't been launched by `xl devd`.

It would seem to me some other entity is launching the hotplug
scripts, do you have some udev rules in the driver domain that do that
and race with `xl devd` maybe?

Regards, Roger.

Reply via email to