Public bug reported:

[SRU Justification]

[Impact]
While running PyTorch with models whose compute jobs take a long time, hangs can be observed on Strix Halo. The hangs occur in the MES scheduler. A workaround has been developed for this situation in both the MES scheduler and the GPU driver.

[Fix]
The MES scheduler change is in MES 0x7f for GFX11 products and 0x82 for GFX12 products.
The kernel change is:
https://lore.kernel.org/amd-gfx/[email protected]/
The kernel change will only be enabled if new enough MES scheduler microcode is installed.

[Test Case]
Run PyTorch and ensure that the system doesn't hang.
Run some games in Steam and ensure that the system doesn't hang.

[Where problems can go wrong]
The workaround applies to all jobs sent to the MES scheduler. It is limited to GFX11 and GFX12 machines. If this change caused a problem, it could manifest as a system hang.

** Affects: linux-oem-6.14 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oem-6.14 in Ubuntu.
https://bugs.launchpad.net/bugs/2125201

Title:
  [noble] Fix system hang observed with comfy-ui

Status in linux-oem-6.14 package in Ubuntu:
  New
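
The firmware gating described under [Fix] can be illustrated with a small sketch. This is not the kernel patch itself. The helper name `workaround_enabled`, the table `MIN_MES_VERSION`, and the plain numeric `>=` comparison are assumptions made for illustration. Only the two thresholds (0x7f for GFX11, 0x82 for GFX12) come from the report, and the real driver may extract or mask the version field differently.

```python
# Minimum MES scheduler firmware versions named in the [Fix] section.
# Keys are the GFX major generation; values are the MES version thresholds.
MIN_MES_VERSION = {
    11: 0x7F,  # GFX11 products
    12: 0x82,  # GFX12 products
}


def workaround_enabled(gfx_major: int, mes_version: int) -> bool:
    """Return True if the kernel-side workaround would be switched on.

    Hypothetical helper: assumes a simple numeric comparison against the
    per-generation minimum, and that other generations are never affected.
    """
    minimum = MIN_MES_VERSION.get(gfx_major)
    if minimum is None:
        # Not a GFX11 or GFX12 part: the workaround does not apply.
        return False
    return mes_version >= minimum


if __name__ == "__main__":
    # Example: a GFX11 part with older and newer MES firmware.
    for version in (0x7E, 0x7F):
        state = "enabled" if workaround_enabled(11, version) else "disabled"
        print(f"GFX11, MES {version:#x}: workaround {state}")
```

Under these assumptions, a machine with older MES microcode keeps its existing behaviour. This matches the report's statement that the kernel change only takes effect once new enough firmware is installed.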

