Public bug reported:

Symptom:
Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04,
all of our PCI-related network measurements on LPAR show massive throughput
degradations (up to -72%). This shows for almost all workloads and numbers of
connections, and it deteriorates as the number of connections increases.
The drop is especially drastic for a high number of parallel connections (50
and 250) and for small and medium-size transactional workloads. However, the
degradation is also clearly visible for streaming-type workloads (up to 48%).

Problem:
With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, the IOMMU DMA
mode changed from lazy to strict, causing these massive degradations.
The behavior can also be changed with the kernel command-line parameter
iommu.strict, which makes verification easy.
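
To check which mode a given system is expected to use, the two inputs named
above (the kernel config and the kernel command line) can be combined as in
the following minimal sketch. The function name iommu_dma_mode and the
/boot/config-<release> location are assumptions for illustration; the result
is a heuristic and does not query the IOMMU driver itself.

```python
import re


def iommu_dma_mode(config_text: str, cmdline: str) -> str:
    """Guess the IOMMU DMA mode: 'strict', 'lazy', or 'unknown'.

    Hypothetical helper: it only looks at the build-time default and at
    iommu.strict= on the command line, nothing else.
    """
    mode = "unknown"

    # Build-time default taken from the kernel config.
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_IOMMU_DEFAULT_DMA_STRICT=y":
            mode = "strict"
        elif line == "CONFIG_IOMMU_DEFAULT_DMA_LAZY=y":
            mode = "lazy"

    # iommu.strict=0|1 on the command line overrides the default;
    # if it is given more than once, the last occurrence is used here.
    for value in re.findall(r"(?:^|\s)iommu\.strict=(\S+)", cmdline):
        if value == "0":
            mode = "lazy"
        elif value == "1":
            mode = "strict"

    return mode


if __name__ == "__main__":
    import platform
    from pathlib import Path

    # Assumed locations; adjust if the config is stored elsewhere.
    config_path = Path(f"/boot/config-{platform.release()}")
    cmdline_path = Path("/proc/cmdline")
    config_text = config_path.read_text() if config_path.exists() else ""
    cmdline = cmdline_path.read_text() if cmdline_path.exists() else ""
    print(iommu_dma_mode(config_text, cmdline))
```

A result of 'unknown' only means that neither of the two config options nor
iommu.strict was found in the given inputs.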

The issue is known and was quickly fixed upstream in December 2023, after being
present for a little less than two weeks.
Upstream fix: 
https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a

Repro:
Workload rr1c-200x1000-250, with uperf profile rr1c-200x1000-250.xml:

<?xml version="1.0"?>
<profile name="TCP_RR">
        <group nprocs="250">
                <transaction iterations="1">
                        <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
                </transaction>
                <transaction duration="300">
                        <flowop type="write" options="size=200"/>
                        <flowop type="read" options="size=1000"/>
                </transaction>
                <transaction iterations="1">
                        <flowop type="disconnect" />
                </transaction>
        </group>
</profile>
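
For runs with other connection counts or message sizes, a profile of the same
shape can be generated instead of edited by hand. The following is a minimal
sketch; the function name make_rr_profile and its parameters are hypothetical,
and the address 192.0.2.10 is a documentation placeholder for the remote IP.

```python
import xml.etree.ElementTree as ET


def make_rr_profile(remote_host, nprocs=250, write_size=200,
                    read_size=1000, duration=300):
    """Build a TCP_RR uperf profile like the one above (hypothetical helper)."""
    profile = ET.Element("profile", {"name": "TCP_RR"})
    group = ET.SubElement(profile, "group", {"nprocs": str(nprocs)})

    # One-time connect to the server.
    connect = ET.SubElement(group, "transaction", {"iterations": "1"})
    ET.SubElement(connect, "flowop", {
        "type": "connect",
        "options": f"remotehost={remote_host} protocol=tcp tcp_nodelay",
    })

    # Request/response loop: write a request, then read the reply.
    work = ET.SubElement(group, "transaction", {"duration": str(duration)})
    ET.SubElement(work, "flowop",
                  {"type": "write", "options": f"size={write_size}"})
    ET.SubElement(work, "flowop",
                  {"type": "read", "options": f"size={read_size}"})

    # One-time disconnect.
    close = ET.SubElement(group, "transaction", {"iterations": "1"})
    ET.SubElement(close, "flowop", {"type": "disconnect"})

    ET.indent(profile, space="        ")  # requires Python 3.9 or later
    body = ET.tostring(profile, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


if __name__ == "__main__":
    # 192.0.2.10 stands in for the remote IP of the uperf server.
    print(make_rr_profile("192.0.2.10"))
```

With the defaults, the output corresponds to the profile shown above.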

0) Install uperf on both systems, client and server.
1) Start uperf at server: uperf -s
2) Start uperf at client: uperf -vai 5 -m rr1c-200x1000-250.xml

3) Switch from strict to lazy mode by adding iommu.strict=0 to the kernel
command line, then reboot.
4) Repeat steps 1) and 2).

Example:
For the following example, we chose the workload named above 
(rr1c-200x1000-250):

iommu.strict=1 (strict): 233464.914 TPS
iommu.strict=0 (lazy): 835123.193 TPS
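
The two figures above can be put in relation to each other with a short
calculation. This sketch only uses the two measured values; the variable
names are chosen for illustration. It gives a degradation of about 72% for
strict mode relative to lazy mode, which matches the worst case named in the
symptom description.

```python
# Measured transactions per second for rr1c-200x1000-250.
strict_tps = 233464.914  # iommu.strict=1
lazy_tps = 835123.193    # iommu.strict=0

# Throughput lost in strict mode, relative to lazy mode, in percent.
degradation_pct = (1 - strict_tps / lazy_tps) * 100

# How many times faster lazy mode is than strict mode.
speedup = lazy_tps / strict_tps

print(f"degradation: {degradation_pct:.1f}%")
print(f"lazy vs. strict: {speedup:.2f}x")
```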

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New


** Tags: architecture-s39064 bugnameltc-207082 severity-high 
targetmilestone-inin---

** Tags added: architecture-s39064 bugnameltc-207082 severity-high
targetmilestone-inin---

** Changed in: ubuntu
     Assignee: (unassigned) => Skipper Bug Screeners (skipper-screen-team)

** Package changed: ubuntu => linux (Ubuntu)

https://bugs.launchpad.net/bugs/2071471

Title:
  [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
  throughput degradation for PCI-related network workloads
