[fix linux-pci, remove ethan.zhao (bounces)]

From: Bjorn Helgaas <[email protected]>
Date: Tue, May 21, 2019 at 3:02 PM
To: Himanshu Madhani
Cc: [email protected], Andrew Vasquez, Girish Basrur, Giridhar
Malavali, Myron Stowe, <[email protected]>, Linux Kernel Mailing
List, Quinn Tran

> [+cc Myron, Quinn, linux-pci, linux-kernel]
>
> From: Himanshu Madhani <[email protected]>
> Date: Fri, May 17, 2019 at 5:21 PM
> To: [email protected], [email protected]
> Cc: Andrew Vasquez, Girish Basrur, Giridhar Malavali
>
> > Hi Ethan,
> >
> > Our OEM partners reported that VPD access on the latest distros was
> > returning an I/O error for them. They indicated that this is an issue
> > only with newer kernels.
> >
> > One of the distro vendors pointed to a patch you posted as the reason
> > for the I/O error when trying to read VPD. The patch appears to block
> > access to VPD by blacklisting the ISP.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140b
> >
> > I set up a PCIe analyzer in our lab to reproduce and root-cause this,
> > and discovered that after reverting the patch I am able to get VPD data
> > okay with upstream 5.1.0 (I used RHEL8).
> >
> > I also used "lspci" and "cat" to dump out the VPD data and did not see
> > any issue.
> >
> > # lspci -vvv -s 03:00.0
> > 03:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to 
> > PCIe Adapter (rev 01)
> >                 Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre 
> > Channel to PCIe Adapter
> >                 Physical Slot: 15
> >                 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> >                 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >                 Latency: 0, Cache Line Size: 64 bytes
> >                 Interrupt: pin A routed to IRQ 67
> >                 NUMA node: 0
> >                 Region 0: Memory at fbe05000 (64-bit, prefetchable) 
> > [size=4K]
> >                 Region 2: Memory at fbe02000 (64-bit, prefetchable) 
> > [size=8K]
> >                 Region 4: Memory at fbd00000 (64-bit, prefetchable) 
> > [size=1M]
> >                 Expansion ROM at fb540000 [disabled] [size=256K]
> >                 Capabilities: [44] Power Management version 3
> >                                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> > PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >                                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 
> > DScale=0 PME-
> >                 Capabilities: [4c] Express (v2) Endpoint, MSI 00
> >                                 DevCap:                MaxPayload 2048 
> > bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
> >                                                 ExtTag- AttnBtn- AttnInd- 
> > PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> >                                 DevCtl:  Report errors: Correctable+ 
> > Non-Fatal+ Fatal+ Unsupported+
> >                                                 RlxdOrd- ExtTag- PhantFunc- 
> > AuxPwr- NoSnoop+ FLReset-
> >                                                 MaxPayload 256 bytes, 
> > MaxReadReq 4096 bytes
> >                                 DevSta: CorrErr+ UncorrErr- FatalErr- 
> > UnsuppReq+ AuxPwr- TransPend-
> >                                 LnkCap: Port #0, Speed 8GT/s, Width x8, 
> > ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
> >                                                 ClockPM- Surprise- 
> > LLActRep- BwNot- ASPMOptComp+
> >                                 LnkCtl:  ASPM Disabled; RCB 64 bytes 
> > Disabled- CommClk+
> >                                                 ExtSynch- ClockPM- 
> > AutWidDis- BWInt- AutBWInt-
> >                                 LnkSta:  Speed 8GT/s, Width x8, TrErr- 
> > Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >                                 DevCap2: Completion Timeout: Range B, 
> > TimeoutDis+, LTR-, OBFF Not Supported
> >                                                 AtomicOpsCap: 32bit- 64bit- 
> > 128bitCAS-
> >                                 DevCtl2: Completion Timeout: 50us to 50ms, 
> > TimeoutDis-, LTR-, OBFF Disabled
> >                                                 AtomicOpsCtl: ReqEn-
> >                                 LnkCtl2: Target Link Speed: 8GT/s, 
> > EnterCompliance- SpeedDis-
> >                                                 Transmit Margin: Normal 
> > Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >                                                 Compliance De-emphasis: -6dB
> >                                 LnkSta2: Current De-emphasis Level: -6dB, 
> > EqualizationComplete+, EqualizationPhase1+
> >                                                 EqualizationPhase2+, 
> > EqualizationPhase3+, LinkEqualizationRequest-
> >                 Capabilities: [88] Vital Product Data
> >                                 Product Name: QLogic 32Gb 2-port FC to PCIe 
> > Gen3 x8 Adapter
> >                                 Read-only fields:
> >                                                 [PN] Part number: QLE2742
> >                                                 [SN] Serial number: 
> > RFD1706R22611
> >                                                 [EC] Engineering changes: 
> > BK3210408-05 04
> >                                                 [V9] Vendor specific: 010189
> >                                                 [RV] Reserved: checksum 
> > good, 0 byte(s) reserved
> >                                 End
> >                 Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
> >                                 Vector table: BAR=2 offset=00000000
> >                                 PBA: BAR=2 offset=00001000
> >                 Capabilities: [9c] Vendor Specific Information: Len=0c <?>
> >                 Capabilities: [100 v1] Advanced Error Reporting
> >                                 UESta:   DLP- SDES- TLP- FCP- CmpltTO- 
> > CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                                 UEMsk: DLP- SDES- TLP- FCP- CmpltTO- 
> > CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- 
> > CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >                                 CESta:   RxErr- BadTLP- BadDLLP- Rollover- 
> > Timeout- NonFatalErr-
> >                                 CEMsk: RxErr- BadTLP- BadDLLP- Rollover- 
> > Timeout- NonFatalErr+
> >                                 AERCap:               First Error Pointer: 
> > 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> >                                                 MultHdrRecCap- 
> > MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >                                 HeaderLog: 00000000 00000000 00000000 
> > 00000000
> >                 Capabilities: [154 v1] Alternative Routing-ID 
> > Interpretation (ARI)
> >                                 ARICap: MFVC- ACS-, Next Function: 1
> >                                 ARICtl:   MFVC- ACS-, Function Group: 0
> >                 Capabilities: [1c0 v1] #19
> >                 Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 
> > Rev=1 Len=014 <?>
> >                 Kernel driver in use: qla2xxx
> >                 Kernel modules: qla2xxx
> >
> > # cat /sys/bus/pci/devices/0000\:03\:00.0/vpd
> > RFD1706R22611ECBK3210408-05 04V9010189RV�x
> >
> > Can you share some more insight into where you encountered the issue?
> > I am in the process of reverting this patch from the upstream kernel,
> > but wanted to reach out and find out whether you still have the setup
> > to provide more context.
>
> 0d5370d1d852 ("PCI: Prevent VPD access for QLogic ISP2722") prevented
> a panic while reading VPD, so we can't simply revert it.
>
> Since you don't see a panic while reading VPD from that device, it's
> possible that a QLogic firmware change fixed the VPD format so Linux
> no longer reads the area that caused the problem.  Or possibly your
> system doesn't handle the config read error the same way Ethan's HP
> DL380 does.  Unfortunately we don't have an actual PCIe analyzer trace
> from Ethan's system, so we don't know exactly what happened on PCIe.
>
> I suggest that you capture the entire VPD area and hexdump it, e.g.,
> with "xxd", and look at its structure.  pci_vpd_size() parses it and
> computes the valid size based on a PCI_VPD_STIN_END tag, and
> pci_vpd_read() should not read past that size.
>
> And you *do* have an analyzer trace.  If new QLogic firmware fixed the
> VPD format, the trace should show that Linux read only the valid part
> of VPD, and there should be no errors in the trace.  Then it might
> just be a question of tweaking the quirk so it allows VPD reads if the
> firmware is new enough.
>
> But if the trace does show config reads with errors, then it might be
> that your system just tolerates the errors while the DL380 did not.
> Then we'd have to figure out exactly what the error was and how to
> deal with it so things work on both your system and Ethan's.
>
> Bjorn
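
To make the "look at its structure" suggestion above concrete, here is a
minimal sketch of walking a raw VPD dump tag by tag until the End tag. It
is only an illustration of the idea, not the kernel's pci_vpd_size(),
which performs more validation. The function name vpd_valid_size() and the
constant names are made up for this example; the tag layout assumed is the
standard one (bit 7 set marks a large resource with a 2-byte little-endian
length; otherwise the tag name is in bits 6:3 and the length in bits 2:0,
with End being small tag 0x0f, i.e. byte 0x78).

```python
"""Sketch: walk PCI VPD resource tags and report the valid length."""

LARGE_TAG_FLAG = 0x80  # bit 7 set: large resource data type
SMALL_TAG_END = 0x0F   # small resource "End" tag name (header byte 0x78)


def vpd_valid_size(data: bytes) -> int:
    """Return the byte count up to and including the End tag.

    Returns 0 if no End tag is reached inside the buffer, e.g. when the
    dump is all 0xff or a length field runs past the end of the data.
    """
    off = 0
    while off < len(data):
        header = data[off]
        if header & LARGE_TAG_FLAG:
            # Large resource: 1 tag byte + 2-byte little-endian length.
            if off + 3 > len(data):
                return 0
            length = data[off + 1] | (data[off + 2] << 8)
            off += 3 + length
        else:
            # Small resource: tag name in bits 6:3, length in bits 2:0.
            name = (header >> 3) & 0x0F
            length = header & 0x07
            off += 1 + length
            if name == SMALL_TAG_END:
                return off if off <= len(data) else 0
    return 0


# Hand-built sample (not real device data): ID string, VPD-R, End, padding.
sample = (
    b"\x82\x03\x00abc"      # large tag 0x02 (identifier string), length 3
    b"\x90\x04\x00PN\x01X"  # large tag 0x10 (VPD-R), length 4
    b"\x78"                 # small End tag
    + b"\xff" * 8           # bytes past the End tag, should be ignored
)
print(vpd_valid_size(sample))
```

To try it on a real dump, read the sysfs file in binary mode, e.g.
vpd_valid_size(open("/sys/bus/pci/devices/0000:03:00.0/vpd", "rb").read()),
and compare the result with the number of bytes the analyzer trace shows
Linux actually reading.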
