This bug was fixed in the package linux - 6.11.0-7.7

---------------
linux (6.11.0-7.7) oracular; urgency=medium

  * oracular/linux: 6.11.0-7.7 -proposed tracker (LP: #2079949)

  * update apparmor and LSM stacking patch set (LP: #2028253)
    - SAUCE: apparmor4.0.0 [1/99]: LSM: Infrastructure management of the sock
      security
    - SAUCE: apparmor4.0.0 [2/99]: LSM: Add the lsmblob data structure.
    - SAUCE: apparmor4.0.0 [3/99]: LSM: Use lsmblob in security_audit_rule_match
    - SAUCE: apparmor4.0.0 [4/99]: LSM: Call only one hook for audit rules
    - SAUCE: apparmor4.0.0 [5/99]: LSM: Add lsmblob_to_secctx hook
    - SAUCE: apparmor4.0.0 [6/99]: Audit: maintain an lsmblob in audit_context
    - SAUCE: apparmor4.0.0 [7/99]: LSM: Use lsmblob in security_ipc_getsecid
    - SAUCE: apparmor4.0.0 [8/99]: Audit: Update shutdown LSM data
    - SAUCE: apparmor4.0.0 [9/99]: LSM: Use lsmblob in security_current_getsecid
    - SAUCE: apparmor4.0.0 [10/99]: LSM: Use lsmblob in security_inode_getsecid
    - SAUCE: apparmor4.0.0 [11/99]: Audit: use an lsmblob in audit_names
    - SAUCE: apparmor4.0.0 [12/99]: LSM: Create new security_cred_getlsmblob LSM
      hook
    - SAUCE: apparmor4.0.0 [13/99]: Audit: Change context data from secid to
      lsmblob
    - SAUCE: apparmor4.0.0 [14/99]: Netlabel: Use lsmblob for audit data
    - SAUCE: apparmor4.0.0 [15/99]: LSM: Ensure the correct LSM context releaser
    - SAUCE: apparmor4.0.0 [16/99]: LSM: Use lsmcontext in
      security_secid_to_secctx
    - SAUCE: apparmor4.0.0 [17/99]: LSM: Use lsmcontext in
      security_lsmblob_to_secctx
    - SAUCE: apparmor4.0.0 [18/99]: LSM: Use lsmcontext in
      security_inode_getsecctx
    - SAUCE: apparmor4.0.0 [19/99]: LSM: lsmcontext in
      security_dentry_init_security
    - SAUCE: apparmor4.0.0 [20/99]: LSM: security_lsmblob_to_secctx module
      selection
    - SAUCE: apparmor4.0.0 [21/99]: Audit: Create audit_stamp structure
    - SAUCE: apparmor4.0.0 [22/99]: Audit: Allow multiple records in an
      audit_buffer
    - SAUCE: apparmor4.0.0 [23/99]: Audit: Add record for multiple task security
      contexts
    - SAUCE: apparmor4.0.0 [24/99]: audit: multiple subject lsm values for
      netlabel
    - SAUCE: apparmor4.0.0 [25/99]: Audit: Add record for multiple object
      contexts
    - SAUCE: apparmor4.0.0 [26/99]: LSM: Remove unused lsmcontext_init()
    - SAUCE: apparmor4.0.0 [27/99]: LSM: Improve logic in security_getprocattr
    - SAUCE: apparmor4.0.0 [28/99]: LSM: secctx provider check on release
    - SAUCE: apparmor4.0.0 [29/99]: LSM: Single calls in socket_getpeersec hooks
    - SAUCE: apparmor4.0.0 [30/99]: LSM: Exclusive secmark usage
    - SAUCE: apparmor4.0.0 [31/99]: LSM: Identify which LSM handles the context
      string
    - SAUCE: apparmor4.0.0 [32/99]: AppArmor: Remove the exclusive flag
    - SAUCE: apparmor4.0.0 [33/99]: LSM: Add mount opts blob size tracking
    - SAUCE: apparmor4.0.0 [34/99]: LSM: allocate mnt_opts blobs instead of
      module specific data
    - SAUCE: apparmor4.0.0 [35/99]: LSM: Infrastructure management of the key
      security blob
    - SAUCE: apparmor4.0.0 [36/99]: LSM: Infrastructure management of the
      mnt_opts security blob
    - SAUCE: apparmor4.0.0 [37/99]: LSM: Remove lsmblob scaffolding
    - SAUCE: apparmor4.0.0 [38/99]: LSM: Allow reservation of netlabel
    - SAUCE: apparmor4.0.0 [39/99]: LSM: restrict security_cred_getsecid() to a
      single LSM
    - SAUCE: apparmor4.0.0 [40/99]: Smack: Remove LSM_FLAG_EXCLUSIVE
    - SAUCE: apparmor4.0.0 [41/99]: LSM stacking v39: UBUNTU: SAUCE:
      apparmor4.0.0 [41/99]: add/use fns to print hash string hex value
    - SAUCE: apparmor4.0.0 [42/99]: patch to provide compatibility with v2.x net
      rules
    - SAUCE: apparmor4.0.0 [43/99]: add unpriviled user ns mediation
    - SAUCE: apparmor4.0.0 [44/99]: Add sysctls for additional controls of
      unpriv userns restrictions
    - SAUCE: apparmor4.0.0 [45/99]: af_unix mediation
    - SAUCE: apparmor4.0.0 [46/99]: Add fine grained mediation of posix mqueues
    - SAUCE: apparmor4.0.0 [47/99] fixup inode_set_attr
    - SAUCE: apparmor4.0.0 [48/99]: setup slab cache for audit data
    - SAUCE: apparmor4.0.0 [49/99]: Improve debug print infrastructure
    - SAUCE: apparmor4.0.0 [50/99]: add the ability for profiles to have a
      learning cache
    - SAUCE: apparmor4.0.0 [51/99]: enable userspace upcall for mediation
    - SAUCE: apparmor4.0.0 [52/99]: prompt - lock down prompt interface
    - SAUCE: apparmor4.0.0 [53/99]: prompt - allow controlling of caching of a
      prompt response
    - SAUCE: apparmor4.0.0 [54/99]: prompt - add refcount to audit_node in prep
      or reuse and delete
    - SAUCE: apparmor4.0.0 [55/99]: prompt - refactor to moving caching to
      uresponse
    - SAUCE: apparmor4.0.0 [56/99]: prompt - Improve debug statements
    - SAUCE: apparmor4.0.0 [57/99]: prompt - fix caching
    - SAUCE: apparmor4.0.0 [58/99]: prompt - rework build to use append fn, to
      simplify adding strings
    - SAUCE: apparmor4.0.0 [59/99]: prompt - refcount notifications
    - SAUCE: apparmor4.0.0 [60/99]: prompt - add the ability to reply with a
      profile name
    - SAUCE: apparmor4.0.0 [61/99]: prompt - fix notification cache when
      updating
    - SAUCE: apparmor4.0.0 [62/99]: prompt - add tailglob on name for cache
      support
    - SAUCE: apparmor4.0.0 [63/99]: prompt - allow profiles to set prompts as
      interruptible
    - SAUCE: apparmor4.0.0 [64/93] v6.8 prompt:fixup interruptible
    - SAUCE: apparmor4.0.0 [65/99]: prompt - add support for advanced filtering
      of notifications
    - SAUCE: apparmor4.0.0 [66/99]: userns - add the ability to reference a
      global variable for a feature value
    - SAUCE: apparmor4.0.0 [67/99]: userns - make it so special unconfined
      profiles can mediate user namespaces
    - SAUCE: apparmor4.0.0 [68/99]: add io_uring mediation
    - SAUCE: apparmor4.0.0 [69/99]: apparmor: fix oops when racing to retrieve
      notification
    - SAUCE: apparmor4.0.0 [70/99]: apparmor: fix notification header size
    - SAUCE: apparmor4.0.0 [71/99]: apparmor: fix request field from a prompt
      reply that denies all access
    - SAUCE: apparmor4.0.0 [72/99]: apparmor: open userns related sysctl so lxc
      can check if restriction are in place
    - SAUCE: apparmor4.0.0 [73/99]: apparmor: cleanup attachment perm lookup to
      use lookup_perms()
    - SAUCE: apparmor4.0.0 [74/99]: apparmor: remove redundant unconfined check.
    - SAUCE: apparmor4.0.0 [75/99]: apparmor: switch signal mediation to using
      RULE_MEDIATES
    - SAUCE: apparmor4.0.0 [76/99]: apparmor: ensure labels with more than one
      entry have correct flags
    - SAUCE: apparmor4.0.0 [77/99]: apparmor: remove explicit restriction that
      unconfined cannot use change_hat
    - SAUCE: apparmor4.0.0 [78/99]: apparmor: cleanup: refactor file_perm() to
      provide semantics of some checks
    - SAUCE: apparmor4.0.0 [79/99]: apparmor: carry mediation check on label
    - SAUCE: apparmor4.0.0 [80/99]: apparmor: convert easy uses of unconfined()
      to label_mediates()
    - SAUCE: apparmor4.0.0 [81/99]: apparmor: add additional flags to extended
      permission.
    - SAUCE: apparmor4.0.0 [82/99]: apparmor: add support for profiles to define
      the kill signal
    - SAUCE: apparmor4.0.0 [83/99]: apparmor: fix x_table_lookup when stacking
      is not the first entry
    - SAUCE: apparmor4.0.0 [84/99]: apparmor: allow profile to be transitioned
      when a user ns is created
    - SAUCE: apparmor4.0.0 [85/99]: apparmor: add ability to mediate caps with
      policy state machine
    - SAUCE: apparmor4.0.0 [86/99]: fixup notify
    - SAUCE: apparmor4.0.0 [87/99]: apparmor: add fine grained ipv4/ipv6
      mediation
    - SAUCE: apparmor4.0.0 [88/99]: apparmor: disable tailglob responses for now
    - SAUCE: apparmor4.0.0 [89/99]: apparmor: Fix notify build warnings
    - SAUCE: apparmor4.0.0 [90/99]: fix reserved mem for when we save ipv6
      addresses
    - SAUCE: apparmor4.0.0 [91/99]: fix address mapping for recvfrom
    - SAUCE: apparmor4.0.0 [92/99]: apparmor: add support for 2^24 states to the
      dfa state machine.
    - SAUCE: apparmor4.0.0 [93/99]: apparmor: advertise to userspace support of
      user upcall for file rules.
    - SAUCE: apparmor4.0.0 [94/99]: apparmor: allocate xmatch for nullpdf inside
      aa_alloc_null
    - SAUCE: apparmor4.0.0 [95/99]: apparmor: properly handle cx/px lookup
      failure for complain
    - SAUCE: apparmor4.0.0 [96/99]: apparmor: fix prompt failing during large
      down loads
    - SAUCE: apparmor4.0.0 [97/99]: apparmor: fix allow field in notification
    - SAUCE: apparmor4.0.0 [98/99]: fix build error with !CONFIG_SECURITY
    - SAUCE: apparmor4.0.0 [99/99]: fix build error with in nfs4xdr

  * Intel Lunar Lake / Battlemage enablement (LP: #2076209)
    - drm/xe/lnl: Drop force_probe requirement
    - drm/xe: Support 'nomodeset' kernel command-line option
    - drm/i915/display: Plane capability for 64k phys alignment
    - drm/xe: Align all VRAM scanout buffers to 64k physical pages when needed.
    - drm/xe: Use separate rpm lockdep map for non-d3cold-capable devices
    - drm/xe: Fix NPD in ggtt_node_remove()
    - drm/xe/bmg: Drop force_probe requirement
    - drm/xe/gsc: Fix FW status if the firmware is already loaded
    - drm/xe/gsc: Track the platform in the compatibility version
    - drm/xe/gsc: Wedge the device if the GSCCS reset fails
    - drm/i915/bios: Update new entries in VBT BDB block definitions
    - drm/xe/hwmon: Treat hwmon as a per-device concept
    - drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue
    - drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops
    - drm/xe: Add xe_exec_queue_last_fence_test_dep
    - drm/xe: Add timeout to preempt fences
    - drm/xe: Convert multiple bind ops into single job
    - drm/xe: Update VM trace events
    - drm/xe: Update PT layer with better error handling
    - drm/xe: Add VM bind IOCTL error injection
    - dma-buf: Split out dma fence array create into alloc and arm functions
    - drm/xe: Invalidate media_gt TLBs in PT code
    - drm/i915/display: Fix BMG CCS modifiers
    - drm/xe: Use xe_pm_runtime_get in xe_bo_move() if reclaim-safe.
    - drm/xe: Remove extra dma_fence_put on xe_sync_entry_add_deps failure

  * [24.10 FEAT] [KRN1911] Vertical CPU Polarization Support Stage 2
    (LP: #2072760)
    - s390/wti: Introduce infrastructure for warning track interrupt
    - s390/wti: Prepare graceful CPU pre-emption on wti reception
    - s390/wti: Add wti accounting for missed grace periods
    - s390/wti: Add debugfs file to display missed grace periods per cpu
    - s390/topology: Add sysctl handler for polarization
    - s390/topology: Add config option to switch to vertical during boot
    - s390/smp: Add cpu capacities
    - s390/hiperdispatch: Introduce hiperdispatch
    - s390/hiperdispatch: Add steal time averaging
    - s390/hiperdispatch: Add trace events
    - s390/hiperdispatch: Add hiperdispatch sysctl interface
    - s390/hiperdispatch: Add hiperdispatch debug attributes
    - s390/hiperdispatch: Add hiperdispatch debug counters
    - [Config] Initial set of new options HIPERDISPATCH_ON and
      SCHED_TOPOLOGY_VERTICAL to yes for s390x

  * Remove non-LPAE kernel flavor (LP: #2025265)
    - [Packaging] Drop control.d/vars.generic-lpae

  * generate and ship vmlinux.h to allow packages to build BPF CO-RE
    (LP: #2050083)
    - [Packaging] Don't call dh_all on linux-bpf-dev unless on master kernel

  * Miscellaneous Ubuntu changes
    - [Config] updateconfigs following v6.11-rc7 rebase

 -- Timo Aaltonen <timo.aalto...@canonical.com>  Mon, 09 Sep 2024 13:38:09 +0300

** Changed in: linux (Ubuntu Oracular)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2071471

Title:
  [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
  throughput degradation for PCI-related network workloads

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [Impact]

   * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
     (upstream since kernel v6.7-rc1) there was a move (on s390x only)
     to a different dma-iommu implementation.

   * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
     (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
     option should now be set to 'yes' by default for s390x.

   * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
     are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
     set to "no" by default, which was done upstream by b2b97a62f055
     "Revert "s390: update defconfigs"".

   * These changes are all upstream, but were not picked up by the Ubuntu
     kernel config.

   * And not having these config options set properly is causing significant
     PCI-related network throughput degradation (up to -72%).

   * This shows for almost all workloads and numbers of connections,
     deteriorating with the number of connections increasing.

   * Especially drastic is the drop for a high number of parallel connections
     (50 and 250) and for small and medium-size transactional workloads.
     However, also for streaming-type workloads the degradation is clearly
     visible (up to 48% degradation).

  [Fix]

   * The (upstream accepted) fix is to set
     IOMMU_DEFAULT_DMA_STRICT=no
     and
     IOMMU_DEFAULT_DMA_LAZY=y
     (which is needed for the changed DMA IOMMU implementation since v6.7).
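
Whether an installed kernel was built with the intended default can be checked from its config file. The following is a minimal sketch, not part of the original report. It assumes the config is installed as /boot/config-<release> (the usual Ubuntu location), and the helper name iommu_default_mode is hypothetical.

```python
import os
import re

# Kconfig symbols that select the default IOMMU DMA mode.
MODES = {
    "CONFIG_IOMMU_DEFAULT_DMA_LAZY": "lazy",
    "CONFIG_IOMMU_DEFAULT_DMA_STRICT": "strict",
    "CONFIG_IOMMU_DEFAULT_PASSTHROUGH": "passthrough",
}


def iommu_default_mode(config_text):
    """Return 'lazy', 'strict', 'passthrough', or None from kernel config text."""
    for line in config_text.splitlines():
        # Only "CONFIG_FOO=y" counts; "# CONFIG_FOO is not set" is ignored.
        match = re.fullmatch(r"(CONFIG_\w+)=y", line.strip())
        if match and match.group(1) in MODES:
            return MODES[match.group(1)]
    return None


def main():
    # Assumption: the running kernel's config lives at /boot/config-<release>.
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as handle:
            print(path, "->", iommu_default_mode(handle.read()))
    except OSError as error:
        print("could not read", path, ":", error)


if __name__ == "__main__":
    main()
```

A result of "lazy" corresponds to the fixed configuration described above. "strict" corresponds to the configuration that shows the degradation.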

  [Test Case]

   * Set up two Ubuntu Server 24.04 systems (with kernel 6.8),
     one acting as server and one as client,
     that have PCIe-attached RoCE Express devices
     and that are connected to each other.

   * Verify whether the iommu_group type of the used PCI device is DMA-FQ:
     cat /sys/bus/pci/devices/<device>\:00\:00.0/iommu_group/type
     DMA-FQ

   * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
     <?xml version="1.0"?>
     <profile name="TCP_RR">
             <group nprocs="250">
                     <transaction iterations="1">
                             <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
                     </transaction>
                     <transaction duration="300">
                             <flowop type="write" options="size=200"/>
                             <flowop type="read" options="size=1000"/>
                     </transaction>
                     <transaction iterations="1">
                             <flowop type="disconnect" />
                     </transaction>
             </group>
     </profile>

   * Install uperf on both systems, client and server.

   * Start uperf at server: uperf -s

   * Start uperf at client: uperf -vai 5 -m uperf-profile.xml

   * Switch from strict to lazy mode
     either using the new kernel (or the test build below)
     or using kernel cmd-line parameter iommu.strict=0.

   * Restart uperf on server and client, like before.

   * Verification will be performed by IBM.
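
The iommu_group check from the test case can also be scripted. This is a minimal sketch under stated assumptions: the function names and the root parameter are hypothetical, and the device address passed in must be that of the actual RoCE device. In sysfs, "DMA-FQ" means DMA translation with a flush queue (lazy invalidation) and plain "DMA" means strict invalidation.

```python
from pathlib import Path


def iommu_group_type(device, root="/sys/bus/pci/devices"):
    """Return the iommu_group type string for a PCI device address."""
    return (Path(root) / device / "iommu_group" / "type").read_text().strip()


def uses_lazy_mode(group_type):
    """True if the group uses a flush queue, i.e. lazy invalidation."""
    return group_type == "DMA-FQ"
```

Calling uses_lazy_mode(iommu_group_type(address)) with the address of the device under test should return True once lazy mode is active.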

  [Regression Potential]

   * There is a certain regression potential, since the behavior with
     the two modified kernel config options will change significantly.

   * This may solve the (network) throughput issue with PCI devices,
     but may also come with side-effects on other PCIe based devices
     (the old compression adapters or the new NVMe carrier cards).

  [Other]

   * CCW devices are not affected.

   * This is s390x-specific only, hence will not affect any other
     architecture.

  __________

  Symptom:
  Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu
  22.04, all of our PCI-related network measurements on LPAR show massive
  throughput degradations (up to -72%). This shows for almost all workloads
  and numbers of connections, deteriorating with the number of connections
  increasing. Especially drastic is the drop for a high number of parallel
  connections (50 and 250) and for small and medium-size transactional
  workloads. However, also for streaming-type workloads the degradation is
  clearly visible (up to 48% degradation).

  Problem:
  With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode
  changed from lazy to strict, causing these massive degradations.
  Behavior can also be changed with a kernel command-line parameter
  (iommu.strict) for easy verification.
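
To confirm which value of iommu.strict a system was booted with, the kernel command line can be inspected. The sketch below is illustrative only. The function name is hypothetical, and it assumes that a later occurrence of the parameter overrides an earlier one. A return value of None means the parameter was not given, so the compiled-in default applies.

```python
def iommu_strict_setting(cmdline):
    """Return True (strict), False (lazy), or None if iommu.strict is absent."""
    setting = None
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "iommu.strict" and value in ("0", "1"):
            # Assumption: the last occurrence on the command line wins.
            setting = value == "1"
    return setting


def main():
    try:
        with open("/proc/cmdline") as handle:
            print("iommu.strict:", iommu_strict_setting(handle.read()))
    except OSError as error:
        print("could not read /proc/cmdline:", error)


if __name__ == "__main__":
    main()
```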

  The issue is known and was quickly fixed upstream in December 2023, after
  being present for a little less than two weeks.
  Upstream fix:
  https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a

  Repro:
  rr1c-200x1000-250 with rr1c-200x1000-250.xml:

  <?xml version="1.0"?>
  <profile name="TCP_RR">
          <group nprocs="250">
                  <transaction iterations="1">
                          <flowop type="connect" options="remotehost=<remote IP> protocol=tcp  tcp_nodelay" />
                  </transaction>
                  <transaction duration="300">
                          <flowop type="write" options="size=200"/>
                          <flowop type="read" options="size=1000"/>
                  </transaction>
                  <transaction iterations="1">
                          <flowop type="disconnect" />
                  </transaction>
          </group>
  </profile>

  0) Install uperf on both systems, client and server.
  1) Start uperf at server: uperf -s
  2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml

  3) Switch from strict to lazy mode using kernel command-line parameter
     iommu.strict=0.
  4) Repeat steps 1) and 2).

  Example:
  For the following example, we chose the workload named above
  (rr1c-200x1000-250):

  iommu.strict=1 (strict): 233464.914 TPS
  iommu.strict=0 (lazy): 835123.193 TPS
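
The two figures above can be related to the percentages quoted in the symptom description with a short calculation. This sketch uses only the two TPS values from the example. The function names are hypothetical.

```python
strict_tps = 233464.914  # iommu.strict=1
lazy_tps = 835123.193    # iommu.strict=0


def degradation_percent(slow, fast):
    """Throughput lost by the slow run relative to the fast run, in percent."""
    return (1.0 - slow / fast) * 100.0


def speedup(slow, fast):
    """Factor by which the fast run exceeds the slow run."""
    return fast / slow


print(f"strict is {degradation_percent(strict_tps, lazy_tps):.1f}% below lazy")
print(f"lazy is {speedup(strict_tps, lazy_tps):.2f}x strict")
```

For this workload the strict run comes out roughly 72% below the lazy run, which matches the "up to -72%" figure reported above.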

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2071471/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
