More data points:

RHEL9 (kernel 5.14.0) w/ HT: 22s
RHEL9 (kernel 5.14.0) w/o HT: 22s
Slackware (kernel 5.15.161) w/ HT: 155s
Slackware (kernel 5.15.161) w/o HT: 22s
ubuntu20 (kernel 5.4.0-196-generic) w/ HT: 14s
ubuntu20 (UPGRADED kernel 5.11.0-22-generic) w/ HT: 19s
ubuntu20 (UPGRADED kernel 5.13.0-21-generic) w/ HT: 19s
ubuntu20 (UPGRADED kernel 5.15.0-33-generic) w/ HT: 19s
ubuntu20 (UPGRADED kernel 5.15.0-43-generic) w/ HT: 19s
ubuntu20 (UPGRADED kernel 5.15.0-46-generic) w/ HT: 84s
ubuntu20 (UPGRADED kernel 5.15.0-50-generic) w/ HT: 84s
ubuntu20 (UPGRADED kernel 5.15.0-75-generic) w/ HT: 84s
ubuntu20 (UPGRADED kernel 5.15.0-100-generic) w/ HT: 84s
ubuntu20 (UPGRADED kernel 5.15.0-122-generic) w/ HT: 84s
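For reference, the ubuntu20 data points were gathered by installing each kernel with apt, rebooting into it, and re-running the timeit one-liner from the bug description. Roughly as follows (one kernel shown as an example; the other package names follow the same pattern, and older kernels have to be picked by hand from the grub menu):

  sudo apt install linux-image-5.15.0-46-generic
  sudo reboot
  uname -r    # confirm which kernel is actually running
  python3 -c "import timeit; print(timeit.Timer('for _ in range(0,1000): pass').timeit())"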
This confirms that changing the kernel version while leaving all other packages alone introduces the regression, so I have assigned this bug to the linux package. The various kernels were installed using apt install. At this point, it looks like I may need to work out how to reproduce these kernel builds so that I can start building intermediate commits from source.

** Package changed: ubuntu => linux (Ubuntu)

https://bugs.launchpad.net/bugs/2083077

Title: python3 counting 6x slowdown with ubuntu22 on cisco ucs hardware with hyperthreading

Status in linux package in Ubuntu: New

Bug description:

I suspect this is a kernel bug. With ubuntu <= 21, I find that this runs in about 13 seconds:

python3 -c "import timeit; print(timeit.Timer('for _ in range(0,1000): pass').timeit())"

With ubuntu >= 22, it runs in about 83 seconds. The problem seems to be specific to Cisco UCS hardware and can be mostly mitigated by disabling hyperthreading.

I also tried counting to a million a thousand times instead of counting to 1000 a million times (the latter is how many times timeit repeats the experiment), just in case the time-measuring was the slow part, but it made no difference. Even a straight-up loop without timeit shows about the same difference.

Originally, I encountered this when upgrading from 18 to 24. We went back and isolated the problem to something that changed between 21 and 22. The version I actually care about is 24.

The only Cisco UCS systems we have are a number of Cisco UCS C220 M5SX rack servers and a number of Cisco UCS B200 M5 blades. All of them show the regression. I can confirm that on a variety of similarly-specced Supermicro systems, the regression does not occur.

The problem can be easily reproduced by booting off https://releases.ubuntu.com/24.04.1/ubuntu-24.04.1-live-server-amd64.iso (or various other versions) and dropping into a shell. The installer kernel behaves the same as the installed kernel across the various versions, so anyone with this hardware should be able to reproduce the issue from the installer shell. You may wish to use an old python3 from a version-pinned docker image to get an apples-to-apples comparison.

If I run the experiment inside ubuntu18 containers on ubuntu21 and ubuntu22, I still get the dramatically different runtimes; i.e., the kernel version, not the userland or python version, is what seems to matter (see the container sketch below).

We have tried mitigations=off with no effect. We have tried reverting various kernel scheduler configuration changes back to their ubuntu21 settings with no effect. We have tried disabling hyperthreading in the BIOS, which had an enormous effect: it reduces the runtime from 83 seconds to 17 seconds. 17 is still about 30% slower than 13, but it is obviously far better than 83.

So just to recap:

13s: ubuntu21 with hyperthreading on
83s: ubuntu22 with hyperthreading on
17s: ubuntu22 with hyperthreading off

This machine has 2 sockets with 20 physical cores each, for a total of 80 logical cores once we account for hyperthreading. Ideally I would prefer not to be forced to disable hyperthreading, and even if that is not possible, I am interested in avoiding the remaining 30% slowdown.

sysbench --test=cpu and sysbench --test=memory also both exhibit a slowdown, but it is more like a 30% slowdown instead of 800%, even with hyperthreading turned on.
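The sysbench runs were along these lines (legacy --test= syntax as quoted above; newer sysbench releases spell it "sysbench cpu run" and "sysbench memory run"):

  sysbench --test=cpu run
  sysbench --test=memory run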
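For the container comparison mentioned above, the idea is simply to hold the userland constant while varying the host kernel. A sketch (python:3.6-slim is an arbitrary old pinned image here; we actually used ubuntu18-based containers):

  docker run --rm python:3.6-slim \
    python3 -c "import timeit; print(timeit.Timer('for _ in range(0,1000): pass').timeit())"

Running the same command on the ubuntu21 host and the ubuntu22 host shows the same runtime gap as on bare metal.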
I have used perf to profile python and found the time spread out; I did not see any particular smoking gun. The python process makes < 300 syscalls over its entire lifetime and performs virtually no context switches. I tried running it with realtime priority and affinity for a single core, which seemed to make little difference. The python process uses 100% of a CPU while it runs.

Any ideas?
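For reference, the profiling and the pinned/realtime runs mentioned above were along these lines (a sketch; the core number and FIFO priority are arbitrary):

  # profile the benchmark and look at where the time goes
  perf record -g -- python3 -c "import timeit; print(timeit.Timer('for _ in range(0,1000): pass').timeit())"
  perf report

  # pin to a single core and run under SCHED_FIFO
  sudo taskset -c 3 chrt -f 50 python3 -c "import timeit; print(timeit.Timer('for _ in range(0,1000): pass').timeit())"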