Hi Alex,

Maik Broemme <mbroe...@parallels.com> wrote:
> Hi Alex,
>
> Alex Williamson <alex.william...@redhat.com> wrote:
> > On Fri, 2014-02-14 at 01:01 +0100, Maik Broemme wrote:
> > > Hi Alex,
> > >
> > > Maik Broemme <mbroe...@parallels.com> wrote:
> > > > Hi Alex,
> > > >
> > > > Alex Williamson <alex.william...@redhat.com> wrote:
> > > > > On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote:
> > > > > > The interesting part is the diff between the 1st and 2nd boot,
> > > > > > taken with lspci prior to booting. The only differences between
> > > > > > the 1st start and the 2nd start are:
> > > > > >
> > > > > > --- 001-lspci.290x.before.1st.log 2014-02-07 01:13:41.498827928 +0100
> > > > > > +++ 004-lspci.290x.before.2nd.log 2014-02-07 01:16:50.966611282 +0100
> > > > > > @@ -24,7 +24,7 @@
> > > > > >   ClockPM- Surprise- LLActRep- BwNot-
> > > > > >   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > > > > >   ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > > - LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > > > + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > > >   DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> > > > > >   DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > > > > >   LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > > > @@ -33,13 +33,13 @@
> > > > > >   LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> > > > > >   EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > > > > >   Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> > > > > > - Address: 0000000000000000 Data: 0000
> > > > > > + Address: 00000000fee00000 Data: 0000
> > > > > >   Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> > > > > >   Capabilities: [150 v2] Advanced Error Reporting
> > > > > >   UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > > > > >   UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > > > > >   UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > > > > > - CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > > > > > + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > > > > >   CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > > > > >   AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> > > > > >   Capabilities: [270 v1] #19
> > > > > >
> > > > > > After that, if I do the suspend-to-ram / resume trick, I again get
> > > > > > the lspci output from before the 1st boot.
> > > > >
> > > > > The Link Status change after X is stopped seems the most interesting
> > > > > to me. The MSI change is probably explained by the MSI save/restore
> > > > > of the device, but should be harmless since MSI is disabled. I'm a
> > > > > bit surprised the Correctable Error Status in the AER capability
> > > > > didn't get cleared. I would have thought that a bus reset would have
> > > > > caused the link to retrain back to the original speed/width as well.
> > > > > Let's check that we're actually getting a bus reset, try this in
> > > > > addition to the previous qemu patch. This just enables debug logging
> > > > > for the bus reset function. Thanks,
> > > > >
> > > >
> > > > Below are the outputs from 2 boots: VGA, load fglrx and start X. (The
> > > > 2nd time, X gets killed and an oops happened.)
> > > >
> > > > - 1st boot:
> > > >
> > > > vfio: vfio_pci_hot_reset(0000:01:00.1) multi
> > > > vfio: 0000:01:00.1: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: 0000:01:00.1 group 1
> > > > vfio: 0000:01:00.1 hot reset: Success
> > > > vfio: vfio_pci_hot_reset(0000:01:00.1) one
> > > > vfio: 0000:01:00.1: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: vfio: found another in-use device 0000:01:00.0
> > > > vfio: vfio_pci_hot_reset(0000:01:00.0) one
> > > > vfio: 0000:01:00.0: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: 0000:01:00.1 group 1
> > > > vfio: vfio: found another in-use device 0000:01:00.1
> > > >
> > > > - 2nd boot:
> > > >
> > > > vfio: vfio_pci_hot_reset(0000:01:00.1) multi
> > > > vfio: 0000:01:00.1: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: 0000:01:00.1 group 1
> > > > vfio: 0000:01:00.1 hot reset: Success
> > > > vfio: vfio_pci_hot_reset(0000:01:00.1) one
> > > > vfio: 0000:01:00.1: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: vfio: found another in-use device 0000:01:00.0
> > > > vfio: vfio_pci_hot_reset(0000:01:00.0) one
> > > > vfio: 0000:01:00.0: hot reset dependent devices:
> > > > vfio: 0000:01:00.0 group 1
> > > > vfio: 0000:01:00.1 group 1
> > > > vfio: vfio: found another in-use device 0000:01:00.1
> > > >
> > >
> > > Did you already have a chance to look into it, or is there anything
> > > else I can help with?
> >
> > According to the log we're doing the bus reset on both the first and 2nd
> > boot (it's expected that only the "multi" call gets to success). I'm
> > surprised then that the link doesn't retrain back to the original width.
> > You could try forcing the link to retrain. Look at the root port
> > upstream from the GPU, lspci -t is handy for this. Run lspci on the
> > root port to get the PCI express capability offset, then use setpci to
> > set the link retrain bit. For example:
> >
> > # lspci -tv | grep NVIDIA
> > +-07.0-[03]--+-00.0 NVIDIA Corporation GK106GL [Quadro K4000]
> > |            \-00.1 NVIDIA Corporation GK106 HDMI Audio Controller
> >
> > (upstream root port is 00:07.0)
> >
> > # lspci -v -s 7.0 | grep Capabilities
> > Capabilities: [40] Subsystem: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7
> > Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
> > Capabilities: [90] Express Root Port (Slot+), MSI 00
> > Capabilities: [e0] Power Management version 3
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [150] Access Control Services
> > Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> >
> > (PCI express capability is offset 0x90, Link Control is 0x10 off that)
> >
> > # setpci -s 7.0 a0.w
> > 0040
> >
> > (retrain is bit 5, 0x20, OR'd with read value is 0x60)
> >
> > # setpci -s 7.0 a0.w=60
> >
> > # lspci... did it work?
> >
> > Try doing that after the first boot to see if you can get back to a x16
> > link. If that works, we may need to add something in the kernel to do
> > it automatically around a bus reset. Thanks,
> >
>
> Well, this doesn't help either, and it looks like the VFIO reset is
> already setting it back to the original width. For example:
>
> +-02.0-[01]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
> |            \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
>
> Before 1st run:
>
> root@homer:~# lspci -vvv -s 00:02.0 | grep LnkSta:
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> root@homer:~# lspci -vvv -s 01:00.0 | grep LnkSta:
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> After power down of VM:
>
> root@homer:~# lspci -vvv -s 00:02.0 | grep LnkSta:
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt+
> root@homer:~# lspci -vvv -s 01:00.0 | grep LnkSta:
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> After 2nd start once VFIO did reset:
>
> root@homer:~# lspci -vvv -s 00:02.0 | grep LnkSta:
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt+
> root@homer:~# lspci -vvv -s 01:00.0 | grep LnkSta:
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> The only difference on the bus I see here is ABWMgmt- vs ABWMgmt+, but it
> shouldn't be relevant here, as it is the same if I unload the fglrx module
> before shutting down the VM, which is the only case where I can run
> multiple VM reboot cycles.
>
> So the only difference on the bus is the following:
>
> -60: 10 08 00 00 02 cd 31 00 40 00 02 b1 80 25 14 00
> +60: 10 08 00 00 02 cd 31 00 40 00 11 b0 80 25 14 00
>
> 6a (before 02, after 11)
> 6b (before b1, after b0)
>
> But I cannot write these parameters using setpci. My PCI express capability
> is at offset 0x58, plus 0x10 for link control, which is already set back
> to 40:
>
> root@homer:~# lspci -vvv -s 00:02.0 | grep Capa
> Capabilities: [50] Power Management version 3
> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit-
> Capabilities: [b0] Subsystem: Gigabyte Technology Co., Ltd Device 5000
> Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [190 v1] Access Control Services
>
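
The offsets and values in the quoted text can be cross-checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the register layout it assumes (Link Control at capability + 0x10, Link Status at capability + 0x12, Retrain Link as bit 5, current speed in bits 3:0, negotiated width in bits 9:4) comes from the standard PCI Express capability definition, not from anything stated in this thread.

```python
# Offsets relative to the start of the PCI Express capability (assumed from
# the standard PCIe capability layout, not taken from this thread).
PCI_EXP_LNKCTL = 0x10      # Link Control register
PCI_EXP_LNKSTA = 0x12      # Link Status register
LNKCTL_RETRAIN = 1 << 5    # Retrain Link bit in Link Control

# Current Link Speed encodings (bits 3:0 of Link Status).
SPEEDS = {1: "2.5GT/s", 2: "5GT/s", 3: "8GT/s"}


def retrain_value(lnkctl):
    """Return the Link Control word with the Retrain Link bit set."""
    return lnkctl | LNKCTL_RETRAIN


def decode_lnksta(lo, hi):
    """Decode speed and width from the two little-endian Link Status bytes."""
    word = (hi << 8) | lo
    speed = word & 0xF            # bits 3:0
    width = (word >> 4) & 0x3F    # bits 9:4
    return SPEEDS.get(speed, "unknown"), width


if __name__ == "__main__":
    cap = 0x58  # Express capability offset on the 00:02.0 root port
    print(hex(cap + PCI_EXP_LNKCTL), hex(cap + PCI_EXP_LNKSTA))  # 0x68 0x6a
    print(hex(retrain_value(0x0040)))                            # 0x60
    print(decode_lnksta(0x02, 0xB1))  # bytes 6a/6b before: ('5GT/s', 16)
    print(decode_lnksta(0x11, 0xB0))  # bytes 6a/6b after:  ('2.5GT/s', 1)
```

Read this way, bytes 6a/6b are the Link Status register of the root port, and the two dumps decode to the same 5GT/s x16 and 2.5GT/s x1 values that lspci reports. The speed and width fields of Link Status are read-only, which would be consistent with setpci not being able to change them.
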
Wouldn't a possible solution be to do a D0 -> D3 -> D0 transition for
devices that don't support FLR? The setpci approach doesn't help me at all.

> > Alex
> >
> > > > > diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> > > > > index 8db182f..7fec259 100644
> > > > > --- a/hw/misc/vfio.c
> > > > > +++ b/hw/misc/vfio.c
> > > > > @@ -2927,6 +2927,10 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *hos
> > > > >              host1->slot == host2->slot && host1->function == host2->function);
> > > > >  }
> > > > >
> > > > > +#undef DPRINTF
> > > > > +#define DPRINTF(fmt, ...) \
> > > > > +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> > > > > +
> > > > >  static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
> > > > >  {
> > > > >      VFIOGroup *group;
> > > > > @@ -3104,6 +3108,15 @@ out_single:
> > > > >      return ret;
> > > > >  }
> > > > >
> > > > > +#undef DPRINTF
> > > > > +#ifdef DEBUG_VFIO
> > > > > +#define DPRINTF(fmt, ...) \
> > > > > +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> > > > > +#else
> > > > > +#define DPRINTF(fmt, ...) \
> > > > > +    do { } while (0)
> > > > > +#endif
> > > > > +
> > > > >  /*
> > > > >   * We want to differentiate hot reset of mulitple in-use devices vs hot reset
> > > > >   * of a single in-use device. VFIO_DEVICE_RESET will already handle the case
> > > > >
> > > >
> > > > --Maik
> > >
> > > --Maik
>
> --Maik

--Maik