Could someone take a look at this issue? Thanks :)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2044810

Title:
  VF cannot creation with large CPU core systems when RDMA enabled with
  intel ice driver

Status in linux-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Issue Environment:
  ==================

  root@npx:~# cat /etc/os-release
  PRETTY_NAME="Ubuntu 22.04.3 LTS"
  NAME="Ubuntu"
  VERSION_ID="22.04"
  VERSION="22.04.3 LTS (Jammy Jellyfish)"
  VERSION_CODENAME=jammy
  ID=ubuntu
  ID_LIKE=debian
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  UBUNTU_CODENAME=jammy

  
  root@npx:~# uname -r
  5.15.0-88-generic

  
  root@npx:~# lscpu | head -n 5
  Architecture:                       x86_64
  CPU op-mode(s):                     32-bit, 64-bit
  Address sizes:                      52 bits physical, 57 bits virtual
  Byte Order:                         Little Endian
  CPU(s):                             256

  
  root@npx:~# ethtool -i ens2f0
  driver: ice
  version: 5.15.0-88-generic
  firmware-version: 4.40 0x8001c7d5 1.3534.0
  expansion-rom-version:
  bus-info: 0000:16:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: yes

  
  root@npx:~# lspci -s 16:00.0 -vvv
  16:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for SFP (rev 02)
          Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-4
          Physical Slot: 2
          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
          Latency: 0, Cache Line Size: 32 bytes
          Interrupt: pin A routed to IRQ 16
          NUMA node: 0
          IOMMU group: 19
          Region 0: Memory at 201ffa000000 (64-bit, prefetchable) [size=32M]
          Region 3: Memory at 201ffe030000 (64-bit, prefetchable) [size=64K]
          Expansion ROM at 95800000 [disabled] [size=1M]
          Capabilities: [40] Power Management version 3
                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
          Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                  Address: 0000000000000000  Data: 0000
                  Masking: 00000000  Pending: 00000000
          Capabilities: [70] MSI-X: Enable+ Count=512 Masked-
                  Vector table: BAR=3 offset=00000000
                  PBA: BAR=3 offset=00008000
          Capabilities: [a0] Express (v2) Endpoint, MSI 00
                  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                          ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                  DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
                          RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
                          MaxPayload 512 bytes, MaxReadReq 4096 bytes
                  DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                  LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                          ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                  LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                          ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                  LnkSta: Speed 16GT/s (ok), Width x16 (ok)
                          TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                  DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
                           10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                           EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                           FRS- TPHComp- ExtTPHComp-
                           AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                           AtomicOpsCtl: ReqEn-
                  LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                  LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                           Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                           Compliance De-emphasis: -6dB
                  LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                           EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                           Retimer- 2Retimers- CrosslinkRes: unsupported
          Capabilities: [e0] Vital Product Data
                  Product Name: Intel(R) Ethernet Network Adapter E810-XXVDA4
                  Read-only fields:
                          [V1] Vendor specific: Intel(R) Ethernet Network Adapter E810-XXVDA4
                          [PN] Part number: ~PBA-----~
                          [SN] Serial number: ~MAC-------~
                          [V2] Vendor specific: ~WY~
                          [RV] Reserved: checksum good, 0 byte(s) reserved
                  End
          Capabilities: [100 v2] Advanced Error Reporting
                  UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                  UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq+ ACSViol-
                  UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                  CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                  CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                  AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                          MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                  HeaderLog: 00000000 00000000 00000000 00000000
          Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                  ARICap: MFVC- ACS-, Next Function: 1
                  ARICtl: MFVC- ACS-, Function Group: 0
          Capabilities: [150 v1] Device Serial Number 50-7c-6f-ff-ff-3b-78-30
          Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                  IOVCap: Migration-, Interrupt Message Number: 000
                  IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                  IOVSta: Migration-
                  Initial VFs: 64, Total VFs: 64, Number of VFs: 4, Function Dependency Link: 00
                  VF offset: 8, stride: 1, Device ID: 1889
                  Supported Page Size: 00000553, System Page Size: 00000001
                  Region 0: Memory at 0000201ffd800000 (64-bit, prefetchable)
                  Region 3: Memory at 0000201ffe340000 (64-bit, prefetchable)
                  VF Migration: offset: 00000000, BIR: 0
          Capabilities: [1a0 v1] Transaction Processing Hints
                  Device specific mode supported
                  No steering table available
          Capabilities: [1b0 v1] Access Control Services
                  ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                  ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
          Capabilities: [1d0 v1] Secondary PCI Express
                  LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                  LaneErrStat: 0
          Capabilities: [200 v1] Data Link Feature <?>
          Capabilities: [210 v1] Physical Layer 16.0 GT/s <?>
          Capabilities: [250 v1] Lane Margining at the Receiver <?>
          Kernel driver in use: ice
          Kernel modules: ice

  
  Issue Description:
  ==================
  # echo 1 > /sys/class/net/ens2f0/device/sriov_numvfs

  [ 5734.469217] ice 0000:16:00.0: Enabling 1 VFs
  [ 5734.574945] pci 0000:16:01.0: [8086:1889] type 00 class 0x020000
  [ 5734.574970] pci 0000:16:01.0: enabling Extended Tags
  [ 5734.575471] pci 0000:16:01.0: Adding to iommu group 443
  [ 5734.575718] ice 0000:16:00.0: Only 0 MSI-X interrupts available for SR-IOV. Not enough to support minimum of 2 MSI-X interrupts per VF for 1 VFs
  [ 5734.575815] ice 0000:16:00.0: Not enough resources for 1 VFs, try with fewer number of VFs
  [ 5734.576861] pci 0000:16:01.0: Removing from iommu group 443
  [ 5734.623292] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
  [ 5734.623297] Copyright (c) 2013 - 2018 Intel Corporation.
  [ 5735.598871] ice 0000:16:00.0: Failed to enable SR-IOV: -28
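
The failure in the log above can be read as a simple resource check. The following is a minimal Python sketch of that check, not the driver's actual code: the function name `check_vf_msix` is hypothetical, and the only values taken from the log are the minimum of 2 MSI-X interrupts per VF, the 0 interrupts available, and the final error code -28 (on Linux, `ENOSPC` is 28).

```python
import errno
import os

# Minimum per-VF requirement quoted in the kernel log message.
MIN_MSIX_PER_VF = 2


def check_vf_msix(available_msix: int, num_vfs: int) -> int:
    """Hypothetical model: return 0 if the VFs fit, else a negative errno."""
    if available_msix < MIN_MSIX_PER_VF * num_vfs:
        return -errno.ENOSPC
    return 0


# Values from the log: 0 vectors available for SR-IOV, 1 VF requested.
rc = check_vf_msix(available_msix=0, num_vfs=1)
print(rc, os.strerror(-rc))
```

Under this model any request fails as long as the PF has no spare vectors, which is consistent with even a single VF being rejected.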

  
  Issue Found:
  ============
  1> After disabling RDMA, VF creation works fine. According to the kernel
     code, MSI-X vectors are reserved for LAN and RDMA based on the number of
     CPU cores, which exhausts all available MSI-X vectors on systems with many
     cores (some PF ports have only 512 MSI-X vectors in total). This does not
     make sense as a default: the driver should at least ensure that a few VFs
     can be created successfully when the NIC supports them.
  2> Reallocating the MSI-X resources manually still fails with the error
     below. This is somewhat strange behavior; it would be better for the
     kernel to allow such actions by default:
      root@npx:~# devlink resource show pci/0000:16:00.0
      kernel answers: Operation not supported
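
The arithmetic behind point 1 can be illustrated with a rough Python sketch. It is a simplified model under stated assumptions, not the ice driver's real allocation logic: it assumes one MSI-X vector per CPU for LAN and one per CPU for RDMA, and folds any fixed overhead the driver may reserve into a hypothetical `misc` parameter that defaults to 0. The function name `msix_left_for_sriov` is also hypothetical. The inputs (512 vectors per PF, 256 CPUs) come from the report.

```python
def msix_left_for_sriov(total_msix: int, num_cpus: int,
                        rdma_enabled: bool, misc: int = 0) -> int:
    """Simplified model of the vectors left for VFs on one PF.

    Assumption: LAN takes one vector per CPU, and RDMA (when enabled)
    takes another one per CPU. `misc` stands in for any fixed overhead.
    """
    lan = num_cpus
    rdma = num_cpus if rdma_enabled else 0
    return max(total_msix - lan - rdma - misc, 0)


# Figures from the report: 512 MSI-X vectors on the PF, 256 CPUs.
print(msix_left_for_sriov(512, 256, rdma_enabled=True))   # 0
print(msix_left_for_sriov(512, 256, rdma_enabled=False))  # 256
```

In this model the RDMA reservation alone accounts for the difference between no vectors left for SR-IOV and a usable pool, which is consistent with VF creation succeeding once RDMA is disabled. The real driver's constants and fallback behavior may differ.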

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.15/+bug/2044810/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
