On Tue, 31 Mar 2026 10:22:14 +0200 "Danilo Krummrich" <[email protected]> wrote:
> On Tue Mar 31, 2026 at 10:06 AM CEST, Danilo Krummrich wrote: > > On Mon Mar 30, 2026 at 10:10 PM CEST, Alex Williamson wrote: > >> On Mon, 30 Mar 2026 19:38:41 +0200 > >> "Danilo Krummrich" <[email protected]> wrote: > >> > >>> (Cc: Jason) > >>> > >>> On Tue Mar 24, 2026 at 1:59 AM CET, Danilo Krummrich wrote: > >>> > diff --git a/drivers/vfio/pci/vfio_pci_core.c > >>> > b/drivers/vfio/pci/vfio_pci_core.c > >>> > index d43745fe4c84..460852f79f29 100644 > >>> > --- a/drivers/vfio/pci/vfio_pci_core.c > >>> > +++ b/drivers/vfio/pci/vfio_pci_core.c > >>> > @@ -1987,9 +1987,8 @@ static int vfio_pci_bus_notifier(struct > >>> > notifier_block *nb, > >>> > pdev->is_virtfn && physfn == vdev->pdev) { > >>> > pci_info(vdev->pdev, "Captured SR-IOV VF %s > >>> > driver_override\n", > >>> > pci_name(pdev)); > >>> > - pdev->driver_override = kasprintf(GFP_KERNEL, "%s", > >>> > - vdev->vdev.ops->name); > >>> > - WARN_ON(!pdev->driver_override); > >>> > + WARN_ON(device_set_driver_override(&pdev->dev, > >>> > + > >>> > vdev->vdev.ops->name)); > >>> > >>> Technically, this is a change in behavior. If vdev->vdev.ops->name is > >>> NULL, it > >>> will trigger the WARN_ON(), whereas before it would have just written > >>> "(null)" > >>> into driver_override. > >> > >> It's worse than that. Looking at the implementation in [1], we have: > >> > >> +static inline int device_set_driver_override(struct device *dev, const > >> char *s) > >> +{ > >> + return __device_set_driver_override(dev, s, strlen(s)); > >> +} > >> > >> So if name is NULL, we oops in strlen() before we even hit the -EINVAL > >> and WARN_ON(). > > > > This was changed in v2 [2] and the actual code in-tree is > > > > static inline int device_set_driver_override(struct device *dev, const > > char *s) > > { > > return __device_set_driver_override(dev, s, s ? strlen(s) : 0); > > } > > > > so it does indeed return -EINVAL for a NULL pointer. Ok, good. > >> I don't believe we have any vfio-pci variant drivers where the name is > >> NULL, but kasprintf() handling NULL as "(null)" was a consideration in > >> this design, that even if there is no name the device is sequestered > >> with a driver_override that won't match an actual driver. > >> > >>> I assume that vfio_pci_core drivers are expected to set the name in struct > >>> vfio_device_ops in the first place and this code (silently) relies on this > >>> invariant? > >> > >> We do expect that, but it was previously safe either way to make sure > >> VFs are only bound to the same ops driver or barring that, at least > >> don't perform a standard driver match. The last thing we want to > >> happen automatically is for a user owned PF to create SR-IOV VFs that > >> automatically bind to native kernel drivers. > >> > >>> Alex, Jason: Should we keep this hunk above as is and check for a proper > >>> name in > >>> struct vfio_device_ops in vfio_pci_core_register_device() with a > >>> subsequent > >>> patch? > >> > >> Given the oops, my preference would be to roll it in here. This change > >> is what makes it a requirement that name cannot be NULL, where this was > >> safely handled with kasprintf(). > > > > Again, no oops here. :) > > > > I still think it makes more sense to fail early in > > vfio_pci_core_register_device(), rather than silently accept "(null)" in > > driver_override. It also doesn't seem unreasonable with only the WARN_ON(), > > but > > I can also just add vdev->vdev.ops->name ?: "(null)". > > (Or just skip the call if !vdev->vdev.ops->name, as a user will read "(null)" > from sysfs either way.) Hmm, I suppose they would, but there's a fundamental difference between driver_override being set to "(null)" vs being NULL and only interpreted as "(null)" via show. The former would prevent ID table matching, the latter would not. The former is what was intended here, without realizing both look the same through sysfs and present a difficult debugging challenge. Given no oops for strlen() now and no known in-kernel drivers with a NULL name, I can post a patch that rejects a NULL name in the ops structure in vfio_pci_core_register_device(), avoiding the entire situation. For this, Acked-by: Alex Williamson <[email protected]> Thanks, Alex

