On 24.04.2023 16:15, Volodymyr Babchuk wrote:
> 
> Hi Jan,
> 
> Jan Beulich <[email protected]> writes:
> 
>> On 21.04.2023 16:13, Volodymyr Babchuk wrote:
>>>
>>> Hi Roger,
>>>
>>> Roger Pau Monné <[email protected]> writes:
>>>
>>>> On Fri, Apr 21, 2023 at 11:00:23AM +0000, Volodymyr Babchuk wrote:
>>>>>
>>>>> Hello Roger,
>>>>>
>>>>> Roger Pau Monné <[email protected]> writes:
>>>>>
>>>>>> On Mon, Apr 17, 2023 at 12:34:31PM +0200, Jan Beulich wrote:
>>>>>>> On 17.04.2023 12:17, Roger Pau Monné wrote:
>>>>>>>> On Fri, Apr 14, 2023 at 01:30:39AM +0000, Volodymyr Babchuk wrote:
>>>>>>>>> Above I have proposed another view on this. I hope, it will work for
>>>>>>>>> you. Just to reiterate, idea is to allow "harmless" refcounts to be 
>>>>>>>>> left
>>>>>>>>> after returning from pci_remove_device(). By "harmless" I mean that
>>>>>>>>> owners of those refcounts will not try to access the physical PCI
>>>>>>>>> device if pci_remove_device() is already finished.
>>>>>>>>
>>>>>>>> I'm not strictly a maintainer of this piece code, albeit I have an
>>>>>>>> opinion.  I will like to also hear Jans opinion, since he is the
>>>>>>>> maintainer.
>>>>>>>
>>>>>>> I'm afraid I can't really appreciate the term "harmless refcounts". 
>>>>>>> Whoever
>>>>>>> holds a ref is entitled to access the device. As stated before, I see 
>>>>>>> only
>>>>>>> two ways of getting things consistent: Either pci_remove_device() is
>>>>>>> invoked upon dropping of the last ref,
>>>>>>
>>>>>> With this approach, what would be the implementation of
>>>>>> PHYSDEVOP_manage_pci_remove?  Would it just check whether the pdev
>>>>>> exist and either return 0 or -EBUSY?
>>>>>>
>>>>>
>>>>> Okay, I am preparing patches with the behavior you proposed. To test it,
>>>>> I artificially set refcount to 2 and indeed PHYSDEVOP_manage_pci_remove
>>>>> returned -EBUSY, which propagated to the linux driver. Problem is that
>>>>> Linux driver can't do anything with this. It just displayed an error
>>>>> message and removed device anyways. This is because Linux sends
>>>>> PHYSDEVOP_manage_pci_remove in device_remove() call path and there is no
>>>>> way to prevent the device removal. So, admin is not capable to try this
>>>>> again.
>>>>
>>>> Ideally Linux won't remove the device, and then the admin would get to
>>>> retry.  Maybe the way the Linux hook is placed is not the best one?
>>>> The hypervisor should be authoritative on whether a device can be
>>>> removed or not, and hence PHYSDEVOP_manage_pci_remove returning an
>>>> error (EBUSY or otherwise) shouldn't allow the device unplug in Linux
>>>> to continue.
>>>
>>> Yes, it would be ideally, but Linux driver/device model is written in a
>>> such way, that PCI subsystem tracks all the PCI device usage, so it can
>>> be certain that it can remove the device. Thus, functions in the device
>>> removal path either return void or 0. Of course, kernel does not know that
>>> hypervisor has additional uses for the device, so there is no mechanisms
>>> to prevent removal.
>>
>> Could pciback obtain a reference on behalf of the hypervisor, dropping it
>> when device removal is requested (i.e. much closer to the start of that
>> operation), and only if it finds that no guests use the device anymore?
> 
> Yes, it can, it this indeed will hold a reference to a pci device for a
> time, but there are some consideration that made this approach not
> feasible.
> 
> Basically, when an user writes to /sys/bus/pci/SBDF/remove, the
> following happens:
> 
> 1. /sys/bus/pci/SBFD/remove entry is removed - we can't retry the
> operation anymore

Looking at the comment ahead of pci_stop_and_remove_bus_device(),
isn't this too late already. The text there says "has been removed",
not e.g. "is about to be removed". (Of course chances are that it is
the comment which is wrong; I know too little about Linux'es hot-
unplug machinery.)

Jan

> [unimportant things]
> 
> N. pci_stop_dev() function is called. This function unloads a device
> driver. Any good behaving driver should drop all additional references
> to a device at this point.
> 
> [more unimportant things]
> 
> M. PCI subsystem drops own reference to a generic device object
> 
> So, as you can see, admin can't restart a "failed" attempt to remove a
> PCI device.
> 
> On other hand, remove() function can sleep. This allows us to pause
> removal process a bit and check if hypervisor had finished removing a
> PCI device on its side. But, as you pointed out, this can take weeks...
> 


Reply via email to