Here are the ROCm fixes you need:

https://github.com/ROCm/rocm-libraries/pull/699
https://github.com/ROCm/rocm-libraries/pull/696

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2117463

Title:
  Update GC 11.5.1 microcode

Status in linux-firmware package in Ubuntu:
  Fix Released
Status in linux-firmware source package in Noble:
  In Progress
Status in linux-firmware source package in Plucky:
  In Progress
Status in linux-firmware source package in Questing:
  Fix Released

Bug description:
  [Impact]
  While running some tasks, stability issues are observed on GC 11.5.1. These
  issues have been root-caused.

  They are solved by the updated microcode for GC 11.5.1.

  amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
  amdgpu 0000:c4:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1006
  amdgpu 0000:c4:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
  amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 4
  amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
  amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 2
  amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 1
  amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 0
  amdgpu 0000:c4:00.0: amdgpu: Dumping IP State
  amdgpu 0000:c4:00.0: amdgpu: Dumping IP State Completed
  amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
  amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
  amdgpu 0000:c4:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B53
  amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
  amdgpu 0000:c4:00.0: amdgpu: MORE_FAULTS: 0x1
  amdgpu 0000:c4:00.0: amdgpu: WALKER_ERROR: 0x1
  amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x5
  amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x1
  amdgpu 0000:c4:00.0: amdgpu: RW: 0x1
  amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
  amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
  amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=SUSPEND
  [drm:amdgpu_mes_suspend [amdgpu]] *ERROR* failed to suspend all gangs

  [Test Plan]
  Boot system and ensure no further exception errors or memory faults
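
  One way to make this check repeatable is to scan the kernel log for the
  failure messages quoted under [Impact]. The following is only a minimal
  sketch: the function name find_amdgpu_failures and the pattern list are
  hypothetical, the patterns are copied from the log excerpt above, and the
  list is not claimed to cover every symptom of this bug.

```python
import re

# Substrings taken from the log excerpt in this bug description.
# Assumption: these are representative of the failure, not an exhaustive list.
FAILURE_PATTERNS = [
    r"MES failed to respond to msg=",
    r"failed to remove hardware queue from MES",
    r"MES might be in unrecoverable state",
    r"Failed to evict queue",
    r"\[gfxhub\] page fault",
    r"failed to suspend all gangs",
]
_FAILURE_RE = re.compile("|".join(FAILURE_PATTERNS))


def find_amdgpu_failures(log_text):
    """Return the amdgpu log lines that match any of the patterns above."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if "amdgpu" in line and _FAILURE_RE.search(line)
    ]
```

  The function takes the log as a string, for example the captured output of
  dmesg or journalctl -k -b after running the workload. An empty result means
  none of the listed messages appeared, which is weaker than proving the
  system is stable.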

  [Where problems could occur]
  These new AMDGPU firmware files are only for gfx1151. They are intended to
  prevent unexpected behavior during intensive modeling tests with gfx1151
  allocation.

  [Other info]
  e2c1b15 amdgpu: Update GC 11.5.1 microcode
