Hi Jan,

Jan Beulich <[email protected]> writes:

> On 21.04.2023 16:13, Volodymyr Babchuk wrote:
>> 
>> Hi Roger,
>> 
>> Roger Pau Monné <[email protected]> writes:
>> 
>>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote:
>>>>
>>>> Hello Roger,
>>>>
>>>> Roger Pau Monné <[email protected]> writes:
>>>>
>>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote:
>>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote:
>>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote:
>>>>>>>> Above I have proposed another view on this. I hope, it will work for
>>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be 
>>>>>>>> left
>>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that
>>>>>>>> owners of those refcounts will not try to access the physical PCI
>>>>>>>> device if pci_remove_device() is already finished.
>>>>>>>
>>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an
>>>>>>> opinion.  I will like to also hear Jans opinion, since he is the
>>>>>>> maintainer.
>>>>>>
>>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". 
>>>>>> Whoever
>>>>>> holds a ref is entitled to access the device. As stated before, I see 
>>>>>> only
>>>>>> two ways of getting things consistent: Either pci_remove_device() is
>>>>>> invoked upon dropping of the last ref,
>>>>>
>>>>> With this approach, what would be the implementation of
>>>>> PHYSDEVOP_manage_pci_remove?  Would it just check whether the pdev
>>>>> exist and either return 0 or -EBUSY?
>>>>>
>>>>
>>>> Okay, I am preparing patches with the behavior you proposed. To test it,
>>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove
>>>> returned -EBUSY, which propagated to the linux driver. Problem is that
>>>> Linux driver can't do anything with this. It just displayed an error
>>>> message and removed device anyways. This is because Linux sends
>>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no
>>>> way to prevent the device removal. So, admin is not capable to try this
>>>> again.
>>>
>>> Ideally Linux won't remove the device, and then the admin would get to
>>> retry.  Maybe the way the Linux hook is placed is not the best one?
>>> The hypervisor should be authoritative on whether a device can be
>>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an
>>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux
>>> to continue.
>> 
>> Yes, it would be ideally, but Linux driver/device model is written in a
>> such way, that PCI subsystem tracks all the PCI device usage, so it can
>> be certain that it can remove the device. Thus, functions in the device
>> removal path either return void or 0. Of course, kernel does not know that
>> hypervisor has additional uses for the device, so there is no mechanisms
>> to prevent removal.
>
> Could pciback obtain a reference on behalf of the hypervisor, dropping it
> when device removal is requested (i.e. much closer to the start of that
> operation), and only if it finds that no guests use the device anymore?

Yes, it can, it this indeed will hold a reference to a pci device for a
time, but there are some consideration that made this approach not
feasible.

Basically, when an user writes to /sys/bus/pci/SBDF/remove, the
following happens:

1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the
operation anymore

[unimportant things]

N. pci_stop_dev() function is called. This function unloads a device
driver. Any good behaving driver should drop all additional references
to a device at this point.

[more unimportant things]

M. PCI subsystem drops own reference to a generic device object

So, as you can see, admin can't restart a "failed" attempt to remove a
PCI device.

On other hand, remove() function can sleep. This allows us to pause
removal process a bit and check if hypervisor had finished removing a
PCI device on its side. But, as you pointed out, this can take weeks...

-- 
WBR, Volodymyr

Reply via email to