Performing SRU verification.

Kim Naru provided Alan Baghumian with a bare metal system with a high
CPU count.

The lscpu output looks roughly like this:

$ lscpu
Architecture:             x86_64
...
CPU(s):                   512
  On-line CPU(s) list:    0-511
...

They deployed a Jammy system, enabled -proposed and installed
5.15.0-125-generic:

$ uname -a
Linux test 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ sudo turbostat
turbostat version 21.05.04 - Len Brown <l...@kernel.org>
...
current_driver: acpi_idle
current_governor: menu
current_governor_ro: menu
cpu40: POLL: CPUIDLE CORE POLL IDLE
cpu40: C1: ACPI FFH MWAIT 0x0
cpu40: C2: ACPI IOPORT 0x814
cpu40: cpufreq driver: acpi-cpufreq
cpu40: cpufreq governor: schedutil
cpufreq boost: 1
cpu0: MSR_RAPL_PWR_UNIT: 0x000a1000 (1.000000 Watts, 0.000015 Joules, 0.000977 sec.)
cpu128: MSR_RAPL_PWR_UNIT: 0x000a1000 (1.000000 Watts, 0.000015 Joules, 0.000977 sec.)
Package Die     Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IPC     IRQ     POLL    C1      C2      POLL%   C1%     C2%     CorWatt PkgWatt
-       -       -       -       0       0.03    1615    2742    0.51    15320   0       36      14527   0.00    0.00    101.66  0.06    85.53
0       0       0       0       1       0.04    1820    2696    1.50    54      0       0       50      0.00    0.00    99.96   0.00    42.88
...

The full turbostat output works as expected, showing values for all 512
CPUs.

The kernel in -proposed fixes the issue, so I am happy to mark this as
verified for Jammy.

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat fails with too many open files on large systems

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Mantic:
  Won't Fix
Status in linux source package in Noble:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069961

  [Impact]

  On large systems, e.g. with 512 CPUs or more, turbostat fails to run
  because it exceeds the rlimit on the number of open files. 512 CPUs
  require 1028 file descriptors, but the current limit is 1024.

  $ lscpu
  ...
  CPU(s):                  512
    On-line CPU(s) list:   0-511
  ...

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There is no workaround, apart from maybe using powerstat instead.
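
  As a rough illustration of this failure mode (not turbostat itself),
  the following Python sketch lowers the soft RLIMIT_NOFILE in its own
  process and opens /dev/null until the kernel refuses with EMFILE, the
  errno behind "Too many open files". The function name
  open_until_limit and the limit of 64 are made up for the example, and
  it assumes a Linux system where the requested soft limit does not
  exceed the hard limit.

```python
import errno
import os
import resource


def open_until_limit(soft_limit):
    """Lower the soft RLIMIT_NOFILE to soft_limit, then open /dev/null
    until open() fails. Returns (descriptors_opened, errno_value).

    Hypothetical helper for illustration; assumes soft_limit is not
    above the current hard limit.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    fds = []
    err = None
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # EMFILE is reported once the per-process limit is reached.
        err = exc.errno
    finally:
        # Release the descriptors and restore the original limits.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(fds), err


if __name__ == "__main__":
    opened, err = open_until_limit(64)
    print(f"opened {opened} descriptors before failing: {os.strerror(err)}")
```

  The count is lower than the limit itself because descriptors that are
  already open (stdin, stdout, stderr and so on) count against it too.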

  [Fix]

  The fix raises the rlimit on the number of file descriptors that
  turbostat can open to 2^15, which should be plenty for some time to
  come.
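
  The upstream change is C code inside turbostat; the Python sketch
  below only illustrates the general idea of raising RLIMIT_NOFILE to
  2^15 at startup and is not the patch itself. The names
  raise_nofile_limit and MAX_NOFILE are chosen for the example. As an
  assumption made so that it also runs unprivileged, the sketch caps
  the target at the existing hard limit instead of raising the hard
  limit.

```python
import resource

MAX_NOFILE = 2 ** 15  # 32768, the value named in the fix description


def raise_nofile_limit(target=MAX_NOFILE):
    """Raise the soft RLIMIT_NOFILE to target, capped at the hard limit.

    Returns the soft limit in effect afterwards. Illustrative helper,
    not code taken from turbostat.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # An unprivileged process cannot exceed the hard limit.
    if hard != unlimited and target > hard:
        target = hard

    # Nothing to do if the soft limit is already high enough.
    if soft == unlimited or soft >= target:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft RLIMIT_NOFILE is now", raise_nofile_limit())
```

  With a soft limit of 2^15, the 1028 descriptors needed on a 512 CPU
  system fit with a wide margin.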

  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny <wyes.ka...@amd.com>
  Date:   Tue Oct 3 05:07:51 2023 +0000
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd

  This landed in 6.9-rc4, and requires a backport for minor context
  adjustment in the first hunk for jammy. Noble got fixed already
  through upstream stable.

  [Testcase]

  Deploy a bare metal system with 512 or more cpus.

  Install linux-tools:

  $ sudo apt install linux-tools-$(uname -r)

  Run turbostat. On an affected kernel it fails as follows:

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There are test kernels available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test

  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.

  [Where problems can occur]

  We are simply increasing the rlimit for file descriptors that
  turbostat can open. This should have no impact on any existing
  systems.

  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things
  are fixed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
