On Mon, Oct 24, 2022 at 01:04:01PM +0200, Jan Beulich wrote:
> On 20.10.2022 11:46, Roger Pau Monne wrote:
> > It's possible for a device to be assigned to a domain but have no
> > vpci structure if vpci_process_pending() failed and called
> > vpci_remove_device() as a result. The unconditional accesses done by
> > vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
> > then trigger a NULL pointer dereference.
> >
> > Add checks for pdev->vpci presence in the affected functions.
> >
> > Fixes: 9c244fdef7 ('vpci: add header handlers')
> > Signed-off-by: Roger Pau Monné <[email protected]>
>
> Reviewed-by: Jan Beulich <[email protected]>
>
> I wonder though whether these changes are enough. Is
> vpci_process_pending() immune to a pdev losing its ->vpci?
I think this is safe so far because the only place where
vpci_remove_device() gets called that doesn't also deassign the device
from the domain is vpci_process_pending(), and in that error path it
also clears any pending work. Since the device no longer has ->vpci
handlers no further calls to vpci_process_pending() can happen.
> Furthermore msix_find() iterates over d->arch.hvm.msix_tables, which
> looks to only ever be added to. Doesn't this list need pruning by
> vpci_remove_device()? I've noticed this only because of looking at
> derefs of ->vpci in msix.c - I don't think I can easily see that all
> of those derefs are once again immune to a pdev losing its ->vpci.
I think you are correct, we are missing a
list_del(&pdev->vpci->msix->next) in vpci_remove_device(). We will
however have locking issues with this, as msix_find() doesn't take any
locks, neither do it's callers. I guess this will be fixed as part of
the lager add vPCI locking series. Will add another patch to the
series with the MSIX table removal.
Thanks, Roger.