On Mon, Jul 28, 2025 at 12:09:34PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Akihiko, Michael,
>
> On 27/7/25 08:50, Akihiko Odaki wrote:
> > Fix issues in PCIe SR-IOV configuration register handling that caused
> > inconsistent internal state due to improper write mask handling and
> > incorrect migration behavior.
> >
> > Two main problems were identified:
> >
> > 1. VF Enable bit write mask handling:
> > pcie_sriov_config_write() incorrectly assumed that its val parameter
> > was already masked, causing it to ignore the actual write mask.
> > This led to the VF Enable bit being processed even when masked,
> > resulting in incorrect VF registration/unregistration. It is
> > identified as CVE-2025-54567.
> >
> > 2. Migration state inconsistency:
> > pcie_sriov_pf_post_load() unconditionally called register_vfs()
> > regardless of the VF Enable bit state, creating inconsistent
> > internal state when VFs should not be enabled. Additionally,
> > it failed to properly update the NumVFs write mask based on
> > the current configuration. It is identified as CVE-2025-54566.
> >
> > Root cause analysis revealed that both functions relied on incorrect
> > special-case assumptions instead of properly reading and consuming
> > the actual configuration values. This change introduces a unified
> > consume_config() function that reads actual configuration values and
> > synchronize the internal state without special-case assumptions.
> >
> > The solution only adds register read overhead in non-hot-path code
> > while ensuring correct SR-IOV state management across configuration
> > writes and migration scenarios.
> >
> > Fixes: 5e7dd17e4348 ("pcie_sriov: Remove num_vfs from PCIESriovPF")
> > Fixes: f9efcd47110d ("pcie_sriov: Register VFs after migration")
> > Fixes: CVE-2025-54566
> > Fixes: CVE-2025-54567
> > Cc: [email protected]
> > Reported-by: Corentin BAYET <[email protected]>
> > Signed-off-by: Akihiko Odaki <[email protected]>
> > ---
> > Changes in v2:
> > - Changed to perform the VFEnable write mask update only when the bit is
> > cleared. It clarifies the intention is to prevent setting the bit
> > (i.e., the bit is currently cleared) when the NumVF holds an invalid
> > value. The code execution when the bit is set will be also a bit
> > shorter.
> > - Added references to the relevant CVEs.
> > - Link to v1:
> > https://lore.kernel.org/qemu-devel/[email protected]
> > ---
> > hw/pci/pcie_sriov.c | 42 +++++++++++++++++++++++-------------------
> > 1 file changed, 23 insertions(+), 19 deletions(-)
> >
> > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> > index
> > 3ad18744f4a8ed2b35144fafcdc8e7e00fec3672..8a4bf0d6f7c0c6e9ec30df2e9bc55967e48cf6c3
> > 100644
> > --- a/hw/pci/pcie_sriov.c
> > +++ b/hw/pci/pcie_sriov.c
> > @@ -64,6 +64,27 @@ static void unregister_vfs(PCIDevice *dev)
> > pci_set_word(dev->wmask + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF,
> > 0xffff);
> > }
> > +static void consume_config(PCIDevice *dev)
> > +{
> > + uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> > +
> > + if (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) {
> > + register_vfs(dev);
> > + } else {
> > + uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> > + uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > + uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > +
> > + unregister_vfs(dev);
> > +
> > + if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > + wmask_val |= PCI_SRIOV_CTRL_VFE;
> > + }
> > +
> > + pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > + }
> > +}
> > +
> > static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset,
> > uint16_t vf_dev_id, uint16_t
> > init_vfs,
> > uint16_t total_vfs, uint16_t
> > vf_offset,
> > @@ -416,30 +437,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t
> > address,
> > trace_sriov_config_write(dev->name, PCI_SLOT(dev->devfn),
> > PCI_FUNC(dev->devfn), off, val, len);
> > - if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > - if (val & PCI_SRIOV_CTRL_VFE) {
> > - register_vfs(dev);
> > - } else {
> > - unregister_vfs(dev);
> > - }
> > - } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
> > - uint8_t *cfg = dev->config + sriov_cap;
> > - uint8_t *wmask = dev->wmask + sriov_cap;
> > - uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
> > - uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
> > -
> > - if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
> > - wmask_val |= PCI_SRIOV_CTRL_VFE;
> > - }
> > -
> > - pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
> > - }
> > + consume_config(dev);
>
> As usual, this would be simpler to review extracting consume_config() in
> a preliminary patch, then the real fix.
>
> > }
> > void pcie_sriov_pf_post_load(PCIDevice *dev)
> > {
> > if (dev->exp.sriov_cap) {
> > - register_vfs(dev);
> > + consume_config(dev);
> > }
> > }
>
> Michael, do we want this for the 10.1 release?
>
> Regards,
>
> Phil.
Think so, I'm testing a pull with this included.