Public bug reported:

[SRU Justification]

[Impact]

When running PyTorch with models whose compute takes a long time,
system hangs can be observed on Strix Halo.

The hangs occur in the MES (Micro Engine Scheduler). A workaround has
been developed for this situation in both the MES firmware and the GPU
kernel driver.

[Fix]
The MES firmware change is included in MES version 0x7f for GFX11
products and 0x82 for GFX12 products.

The kernel change is:
https://lore.kernel.org/amd-gfx/[email protected]/

The kernel change is only enabled when sufficiently new MES microcode
(firmware containing the change above) is installed.
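The firmware gating can be sketched in C as follows. This is a minimal
illustration under stated assumptions, not the actual amdgpu code: the
enum, macro and function names are hypothetical, and it assumes the gate
is a simple per-generation minimum-version comparison using the MES
versions listed under [Fix].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GPU generations; names are illustrative, not amdgpu's. */
enum gfx_family { GFX_OTHER, GFX11, GFX12 };

/* Minimum MES firmware versions carrying the scheduler change
 * (values taken from this report; macro names are hypothetical). */
#define MES_MIN_VER_GFX11 0x7fu
#define MES_MIN_VER_GFX12 0x82u

/* Hypothetical helper: returns true only when the kernel-side workaround
 * should be active, i.e. on a GFX11/GFX12 device whose MES firmware is at
 * least the fixed version. Older firmware leaves the workaround off. */
static bool mes_hang_workaround_enabled(enum gfx_family family,
                                        uint32_t mes_version)
{
    switch (family) {
    case GFX11:
        return mes_version >= MES_MIN_VER_GFX11;
    case GFX12:
        return mes_version >= MES_MIN_VER_GFX12;
    default:
        /* Other generations are not affected by this change. */
        return false;
    }
}
```

For example, a GFX11 device reporting MES version 0x7e would keep the
old behaviour, while 0x7f or later would enable the workaround. Keeping
the check per generation reflects that the two product families received
the firmware change in different MES releases.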

[Test Case]
Run PyTorch workloads and ensure that the system doesn't hang.
Run some games in Steam and ensure that the system doesn't hang.

[Where problems can go wrong]
The workaround applies to all jobs sent to the MES, but it is limited
to GFX11 and GFX12 machines.

If this change introduces a problem, it would most likely manifest as
a system hang.

** Affects: linux-oem-6.14 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oem-6.14 in Ubuntu.
https://bugs.launchpad.net/bugs/2125201

Title:
  [noble] Fix system hang observed with comfy-ui

Status in linux-oem-6.14 package in Ubuntu:
  New
