** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Questing)
Importance: Undecided
Status: New
** Also affects: linux-oem-6.14 (Ubuntu Questing)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New
** Also affects: linux-oem-6.14 (Ubuntu Noble)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Noble)
Status: New => Won't Fix
** Changed in: linux-oem-6.14 (Ubuntu Questing)
Status: New => Invalid
** Description changed:
[SRU Justification]
[Impact]
While running PyTorch with some models whose compute jobs take a long
time, hangs can be observed on Strix Halo.
The hangs occur in the MES scheduler. A workaround has been developed
for this situation in the MES scheduler and in the GPU driver.
[Fix]
The MES scheduler change is in MES 0x7f for GFX11 products and MES 0x82
for GFX12 products.
- The kernel change is:
+ * The kernel change is:
https://lore.kernel.org/amd-gfx/[email protected]/
+ * For OEM 6.14 this also has a dependency on
https://git.kernel.org/torvalds/c/15d8c92f107c1 to cleanly backport.
The kernel change is only enabled if sufficiently new MES scheduler
microcode is installed.
[Test Case]
Run PyTorch and ensure that the system doesn't hang.
Run some games in Steam and ensure that the system doesn't hang.
[Where problems can go wrong]
The workaround applies to all jobs sent to the MES scheduler. It is
localized to GFX11 and GFX12 machines.
If this change caused a problem, it could manifest as a system hang.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2125201
Title:
[noble] Fix system hang observed with comfy-ui
Status in linux package in Ubuntu:
New
Status in linux-oem-6.14 package in Ubuntu:
Invalid
Status in linux source package in Noble:
Won't Fix
Status in linux-oem-6.14 source package in Noble:
New
Status in linux source package in Questing:
New
Status in linux-oem-6.14 source package in Questing:
Invalid
Bug description:
[SRU Justification]
[Impact]
While running PyTorch with some models whose compute jobs take a long
time, hangs can be observed on Strix Halo.
The hangs occur in the MES scheduler. A workaround has been developed
for this situation in the MES scheduler and in the GPU driver.
[Fix]
The MES scheduler change is in MES 0x7f for GFX11 products and MES 0x82
for GFX12 products.
* The kernel change is:
https://lore.kernel.org/amd-gfx/[email protected]/
* For OEM 6.14 this also has a dependency on
https://git.kernel.org/torvalds/c/15d8c92f107c1 to cleanly backport.
The kernel change is only enabled if sufficiently new MES scheduler
microcode is installed.
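A minimal sketch of how a tester might check whether the installed MES
microcode meets the versions named above. It is not part of the kernel
change. The thresholds 0x7f (GFX11) and 0x82 (GFX12) come from this
description; the helper names, the "gfx11"/"gfx12" keys, and the assumed
line format "MES feature version: N, firmware version: 0x..." (as printed
by amdgpu's debugfs firmware report) are assumptions and may differ
between kernel versions.

```python
import re

# Minimum MES scheduler firmware versions named in the bug description.
MIN_MES_VERSION = {"gfx11": 0x7F, "gfx12": 0x82}

# Assumed line format of the amdgpu firmware report, for example:
#   "MES feature version: 1, firmware version: 0x0000007f"
# The pattern requires "MES feature", so "MES_KIQ feature ..." lines are skipped.
_MES_LINE = re.compile(
    r"^MES feature version:\s*\d+,\s*firmware version:\s*(0x[0-9a-fA-F]+)\s*$",
    re.MULTILINE,
)


def parse_mes_version(firmware_info):
    """Return the MES firmware version as an int, or None if no line matches."""
    match = _MES_LINE.search(firmware_info)
    return int(match.group(1), 16) if match else None


def mes_is_new_enough(firmware_info, gfx_generation):
    """True if the reported MES version meets the threshold for the generation.

    gfx_generation is a hypothetical key, "gfx11" or "gfx12".
    """
    version = parse_mes_version(firmware_info)
    if version is None:
        return False
    return version >= MIN_MES_VERSION[gfx_generation]
```

To use it, pass in the text of the firmware report, which on many systems
is readable as root at /sys/kernel/debug/dri/0/amdgpu_firmware_info (an
assumed path; the DRI index depends on the machine).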
[Test Case]
Run PyTorch and ensure that the system doesn't hang.
Run some games in Steam and ensure that the system doesn't hang.
[Where problems can go wrong]
The workaround applies to all jobs sent to the MES scheduler. It is
localized to GFX11 and GFX12 machines.
If this change caused a problem, it could manifest as a system hang.