On 21.02.2024 03:45, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <[email protected]>
> 
> Use the per-domain PCI read/write lock to protect the presence of the
> pci device vpci field. This lock can be used (and in a few cases is used
> right away) so that vpci removal can be performed while holding the lock
> in write mode. Previously such removal could race with vpci_read for
> example.
> 
> When taking both d->pci_lock and pdev->vpci->lock, they should be
> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> possible deadlock situations.
> 
> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock in write mode instead.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
> 
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
> 
> 4. Do not call process_pending_softirqs with any locks held. For that,
> unlock prior to the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>    possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
> 
> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> while accessing pdevs in vpci code.
> 
> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> do not go away. The vPCI functions call several MSI-related functions
> which already have existing non-vPCI callers. Change those MSI-related
> functions to allow using either pcidevs_lock() or d->pci_lock for
> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> sufficient. Note that this pdev protection mechanism does not protect
> other state or critical sections. These MSI-related functions already
> have other race condition and state protection mechanisms (e.g.
> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> pcidevs_lock() is to ensure that pdevs do not go away.
> 
> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> that pdevs do not go away. The purpose of this wrapper is to aid
> readability and document the intent of the pdev protection mechanism.
> 
> 8. When possible, the existing non-vPCI callers of these MSI-related
> functions haven't been switched to use the newly introduced per-domain
> pci_lock, and will continue to use the global pcidevs_lock(). This is
> done to reduce the risk of the new locking scheme introducing
> regressions. Those users will be adjusted in due time. One exception
> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
> the caller, physdev_map_pirq(): this instance is switched to
> read_lock(&d->pci_lock) right away.
> 
> Suggested-by: Roger Pau Monné <[email protected]>
> Suggested-by: Jan Beulich <[email protected]>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>
> Signed-off-by: Volodymyr Babchuk <[email protected]>
> Signed-off-by: Stewart Hildebrand <[email protected]>

Acked-by: Jan Beulich <[email protected]>
with two small remaining remarks (below) and on the assumption that an
R-b from Roger in particular for the vPCI code is going to turn up
eventually.

> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>  
> +    /*
> +     * Assert that pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
> +     * is not suitable here because it may allow either pcidevs_lock() or
> +     * pci_lock to be held, but here we rely on pci_lock being held, not
> +     * pcidevs_lock().
> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));

As to the comment, I think it's not really "may". I also think referral to
...

> @@ -913,13 +918,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>              struct pci_dev *pdev = msix->pdev;
>  
>              spin_unlock(&msix->pdev->vpci->lock);
> +            read_unlock(&pdev->domain->pci_lock);
>              process_pending_softirqs();
> +
> +            if ( !read_trylock(&pdev->domain->pci_lock) )
> +                return -EBUSY;
> +
>              /* NB: we assume that pdev cannot go away for an alive domain. */
>              if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +            {
> +                read_unlock(&pdev->domain->pci_lock);
>                  return -EBUSY;
> +            }
> +
>              if ( pdev->vpci->msix != msix )
>              {
>                  spin_unlock(&pdev->vpci->lock);
> +                read_unlock(&pdev->domain->pci_lock);
>                  return -EAGAIN;
>              }
>          }

... this machinery would be quite helpful (and iirc you even had such in an
earlier version).

> @@ -313,17 +316,31 @@ void vpci_dump_msi(void)
>                  {
>                      /*
>                       * On error vpci_msix_arch_print will always return without
> -                     * holding the lock.
> +                     * holding the locks.
>                       */
>                      printk("unable to print all MSI-X entries: %d\n", rc);
> -                    process_pending_softirqs();
> -                    continue;
> +                    goto pdev_done;
>                  }
>              }
>  
> +            /*
> +             * Unlock locks to process pending softirqs. This is
> +             * potentially unsafe, as d->pdev_list can be changed in
> +             * meantime.
> +             */
>              spin_unlock(&pdev->vpci->lock);
> +            read_unlock(&d->pci_lock);
> +        pdev_done:
>              process_pending_softirqs();
> +            if ( !read_trylock(&d->pci_lock) )
> +            {
> +                printk("unable to access other devices for the domain\n");
> +                goto domain_done;
> +            }
>          }
> +        read_unlock(&d->pci_lock);
> +    domain_done:
> +        ;

I think a blank line ahead of this label and perhaps also ahead of
"pdev_done" would be quite nice.

I guess respective adjustments could be done while committing, provided
there's not going to be any other reason for yet another revision.

Jan
