** Description changed:

  This is a public version of https://bugs.launchpad.net/bugs/2049792
  
  Backport: [SRF] performance: hwmon: (coretemp) Fix core count limitation
  (merged upstream in 6.9) to jammy
  
- [Description]
- coretemp driver supports at most 128 cores per package. Cores higher than 128 will lose their core temperature information.
- Some SRF SKUs have more than 128 cores per package and triggers the issue.
+ [Impact]
+ 
+ In linux 6.8 the coretemp driver supports at most 128 cores per package.
+ Cores higher than 128 will lose their core temperature information.
+ 
+ There is an upstream patch set that allows to support more than 128
+ cores per package, it's applied to linux-next, then to Noble.
+ 
+ We should apply the patch set to the Jammy 5.15 kernel, so that we can
+ properly support systems with a large amount of cores per package.
+ 
+ [Test case]
+ 
+ Read temperature info from /sys/class/hwmon on a system with > 128 cores
+ per package (that means we don't have a proper test case to verify the
+ fix at the moment).
  
  [Fix]
+ A series of patch is part of this improvement:
+ 
  1a793caf6f69 hwmon: (coretemp) Use dynamic allocated memory for core temp_data
  18b24a5f9ca3 hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
  326241f71f3d hwmon: (coretemp) Split package temp_data and core temp_data
  b0b01414a261 hwmon: (coretemp) Abstract core_temp helpers
  87eb801925a0 hwmon: (coretemp) Remove redundant pdata->cpu_map[]
  18d8f5583388 hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
  25f8e01baa05 hwmon: (coretemp) Remove unnecessary dependency of array index
  c8c2074020a8 hwmon: (coretemp) Introduce enum for attr index
  
+ And some patch are required to make the backporting clean:
+ 
  34cf8c657cf03 hwmon: (coretemp) Enlarge per package core count limit
  fdaf0c8629d45 hwmon: (coretemp) Fix bogus core_id to attr name mapping
  4e440abc89458 hwmon: (coretemp) Fix out-of-bounds memory access
  a2930f6dc90f0 hwmon: (coretemp) Delete an obsolete comment
  6c2b659913ad9 hwmon: (coretemp) Delete tjmax debug message
  0f8b916bc5b5d hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
  fae30e3c203e0 hwmon: (coretemp) Add support for dynamic ttarget
  c0c67f8761cec hwmon: (coretemp) Add support for dynamic tjmax
  2bc0e6d07ee50 hwmon: (coretemp) rearrange tjmax handing code
  5c0e64dde80ff hwmon: (coretemp) Remove obsolete temp_data->valid
  
- Only 5c0e64dde80ff has to be modified as it's delete a variable which changed type
+ Only 5c0e64dde80ff has to be modified as it's deleting a variable which changed type
  because of a refactoring.
  
- [Test]
- Verify on specific hardware if we can read temperature accordingly.
+ There is a number of commits, but they are only changing one file.
+ 
+ [Regression potential]
+ 
+ We may experience hwmon-related regressions, either systems reading
+ incorrect temperature information or even bugs/crashes when accessing
+ data from /sys/class/hwmon.

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Thibf (thibf)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2058668

Title:
  [SRF] performance: hwmon: (coretemp) Fix core count limitation

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  In Progress

Bug description:
  This is a public version of https://bugs.launchpad.net/bugs/2049792

  Backport: [SRF] performance: hwmon: (coretemp) Fix core count limitation
  (merged upstream in 6.9) to jammy

  [Impact]

  In Linux 6.8 the coretemp driver supports at most 128 cores per package.
  Cores beyond that limit lose their core temperature information.

  An upstream patch set adds support for more than 128 cores per package;
  it has been applied to linux-next and then to Noble.

  We should apply the patch set to the Jammy 5.15 kernel, so that we can
  properly support systems with a large number of cores per package.

  [Test case]

  Read temperature info from /sys/class/hwmon on a system with > 128 cores
  per package (this means we do not have a proper test case to verify the
  fix at the moment).
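
  The read described in the test case could be scripted roughly as
  follows. This is only a sketch, not part of the proposed fix: it assumes
  the usual hwmon sysfs layout (a `name` file containing "coretemp",
  `tempN_label` files such as "Core 0" or "Package id 0", and `tempN_input`
  values in millidegrees Celsius). The helper names `read_coretemp` and
  `core_count` are made up for illustration.

```python
from pathlib import Path


def read_coretemp(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon_dir_name: {label: degrees_celsius}} for coretemp devices.

    Assumes the standard hwmon sysfs layout; devices whose `name` file is
    not "coretemp" are skipped.
    """
    readings = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        if name_file.read_text().strip() != "coretemp":
            continue
        sensors = {}
        for label_file in dev.glob("temp*_label"):
            # temp2_label pairs with temp2_input, and so on.
            input_file = dev / label_file.name.replace("_label", "_input")
            try:
                label = label_file.read_text().strip()
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor missing or unreadable: skip it
            sensors[label] = millideg / 1000.0
        readings[dev.name] = sensors
    return readings


def core_count(sensors):
    """Count per-core sensors, assuming their labels start with "Core "."""
    return sum(1 for label in sensors if label.startswith("Core "))


if __name__ == "__main__":
    for dev, sensors in read_coretemp().items():
        print(f"{dev}: {core_count(sensors)} core sensors")
        for label, temp in sorted(sensors.items()):
            print(f"  {label}: {temp:.1f} C")
```

  On an affected system, the number of core sensors reported for a package
  would be expected to fall short of the number of cores in that package
  before the fix, and to match it afterwards.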

  [Fix]

  A series of patches is part of this improvement:

  1a793caf6f69 hwmon: (coretemp) Use dynamic allocated memory for core temp_data
  18b24a5f9ca3 hwmon: (coretemp) Remove redundant temp_data->is_pkg_data
  326241f71f3d hwmon: (coretemp) Split package temp_data and core temp_data
  b0b01414a261 hwmon: (coretemp) Abstract core_temp helpers
  87eb801925a0 hwmon: (coretemp) Remove redundant pdata->cpu_map[]
  18d8f5583388 hwmon: (coretemp) Replace sensor_device_attribute with device_attribute
  25f8e01baa05 hwmon: (coretemp) Remove unnecessary dependency of array index
  c8c2074020a8 hwmon: (coretemp) Introduce enum for attr index

  Some additional patches are required to make the backport clean:

  34cf8c657cf03 hwmon: (coretemp) Enlarge per package core count limit
  fdaf0c8629d45 hwmon: (coretemp) Fix bogus core_id to attr name mapping
  4e440abc89458 hwmon: (coretemp) Fix out-of-bounds memory access
  a2930f6dc90f0 hwmon: (coretemp) Delete an obsolete comment
  6c2b659913ad9 hwmon: (coretemp) Delete tjmax debug message
  0f8b916bc5b5d hwmon: (coretemp) avoid RDMSR interrupts to isolated CPUs
  fae30e3c203e0 hwmon: (coretemp) Add support for dynamic ttarget
  c0c67f8761cec hwmon: (coretemp) Add support for dynamic tjmax
  2bc0e6d07ee50 hwmon: (coretemp) rearrange tjmax handing code
  5c0e64dde80ff hwmon: (coretemp) Remove obsolete temp_data->valid

  Only 5c0e64dde80ff has to be modified, because it deletes a variable
  whose type was changed by a refactoring.

  There are a number of commits, but they only change one file.

  [Regression potential]

  We may experience hwmon-related regressions: systems reading incorrect
  temperature information, or even bugs/crashes when accessing data from
  /sys/class/hwmon.