** Changed in: linux-azure (Ubuntu Focal) Status: In Progress => Fix Committed
** Changed in: linux-azure-5.4 (Ubuntu Bionic) Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1952621

Title:
  Bionic/linux-azure: Call trace on Ubuntu 18.04 VM with Standard NV24

Status in linux-azure package in Ubuntu:
  Invalid
Status in linux-azure-5.4 package in Ubuntu:
  New
Status in linux-azure source package in Bionic:
  Invalid
Status in linux-azure-5.4 source package in Bionic:
  Fix Committed
Status in linux-azure source package in Focal:
  Fix Committed
Status in linux-azure-5.4 source package in Focal:
  Invalid

Bug description:
  SRU Justification

  [Impact]

  During large-scale deployment testing, we found the call trace below
  when provisioning an Ubuntu 18.04 VM of size Standard_NV24. An engineer
  deployed the instance 10 times and hit the trace once. It looks like a
  race condition during device probing, but in the end all devices are
  probed successfully.

  [ 4.938162] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0003-0000-3130-444531334632/pci0003:00/0003:00:00.0/config'
  [ 4.944816] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
  [ 4.951818] CPU: 0 PID: 135 Comm: kworker/0:2 Not tainted 5.4.0-1061-azure #64~18.04.1-Ubuntu
  [ 4.951820] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
  [ 4.958943] cdrom: Uniform CD-ROM driver Revision: 3.20
  [ 4.955812] Workqueue: hv_pri_chan vmbus_add_channel_work
  [ 4.955812] Call Trace:
  [ 4.955812] dump_stack+0x57/0x6d
  [ 4.955812] sysfs_warn_dup+0x5b/0x70
  [ 4.955812] sysfs_add_file_mode_ns+0x158/0x180
  [ 4.955812] sysfs_create_bin_file+0x64/0x90
  [ 4.955812] pci_create_sysfs_dev_files+0x72/0x270
  [ 4.955812] pci_bus_add_device+0x30/0x80
  [ 4.955812] pci_bus_add_devices+0x31/0x70
  [ 4.955812] hv_pci_probe+0x48c/0x650
  [ 4.955812] vmbus_probe+0x3e/0x90
  [ 4.955812] really_probe+0xf5/0x440
  [ 4.955812] driver_probe_device+0x11b/0x130
  [ 4.955812] __device_attach_driver+0x7b/0xe0
  [ 4.955812] ? driver_allows_async_probing+0x60/0x60
  [ 4.955812] bus_for_each_drv+0x6e/0xb0
  [ 4.955812] __device_attach+0xe4/0x160
  [ 4.955812] device_initial_probe+0x13/0x20
  [ 4.955812] bus_probe_device+0x92/0xa0
  [ 4.955812] device_add+0x402/0x690
  [ 4.955812] device_register+0x1a/0x20
  [ 4.955812] vmbus_device_register+0x5e/0xf0
  [ 4.955812] vmbus_add_channel_work+0x2c4/0x640
  [ 4.955812] process_one_work+0x209/0x400
  [ 4.955812] worker_thread+0x34/0x400
  [ 4.955812] kthread+0x121/0x140
  [ 4.955812] ? process_one_work+0x400/0x400
  [ 4.955812] ? kthread_park+0x90/0x90
  [ 4.955812] ret_from_fork+0x35/0x40
  [ 5.043612] hv_pci 47505500-0004-0001-3130-444531334632: PCI VMBus probing: Using version 0x10002
  [ 5.260563] hv_pci 47505500-0004-0001-3130-444531334632: PCI host bridge to bus 0004:00

  Dexuan did some research, and this looks like a longstanding race
  condition in the generic PCI subsystem: depending on timing, more than
  one place in the PCI code can try to create the same ‘config’ sysfs
  file:

  https://patchwork.kernel.org/project/linux-pci/patch/20200716110423.xtfyb3n6tn5ixedh@pali/#23669641

  The bug was reported on 7/16/2020, and the last reply was on 6/25/2021.
  It looks like this has not been fixed after 1+ year…

  Business Impact

  [Test Case]

  Repeated deployment on a Standard_NV24 instance. MS reported a
  reproduction rate of 3/551 before the patch and 0/838 with the patch.

  [Where things could go wrong]

  Deployments could fail for other reasons.

  [Other info]

  SF: #00321027

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1952621/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp