Public bug reported:

[Impact]
While running some tasks, stability issues are observed on GC 11.5.1. They have been root-caused and are solved by the updated microcode for GC 11.5.1.

amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
amdgpu 0000:c4:00.0: amdgpu: failed to remove hardware queue from MES, doorbell=0x1006
amdgpu 0000:c4:00.0: amdgpu: MES might be in unrecoverable state, issue a GPU reset
amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 4
amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 2
amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 1
amdgpu 0000:c4:00.0: amdgpu: Failed to evict queue 0
amdgpu 0000:c4:00.0: amdgpu: Dumping IP State
amdgpu 0000:c4:00.0: amdgpu: Dumping IP State Completed
amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
amdgpu 0000:c4:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B53
amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
amdgpu 0000:c4:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:c4:00.0: amdgpu: WALKER_ERROR: 0x1
amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x5
amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x1
amdgpu 0000:c4:00.0: amdgpu: RW: 0x1
amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=SUSPEND
[drm:amdgpu_mes_suspend [amdgpu]] *ERROR* failed to suspend all gangs

[Test Plan]
Boot the system and ensure that no further exception errors or memory faults occur.

[Where problems could occur]
These new AMDGPU firmware files are only for gfx1151; they can prevent some unexpected behavior during intensive modeling tests with gfx1151 allocation.

[Other info]
e2c1b15 amdgpu: Update GC 11.5.1 microcode

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2117463

Title:
  Update GC 11.5.1 microcode
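
The test plan in this report (boot and confirm no further exceptions or memory faults) could be partly automated by scanning the kernel log for the failure messages quoted under [Impact]. The following is only a rough sketch, not part of the original report: the function name find_amdgpu_failures and the pattern list are illustrative assumptions, and the patterns are taken solely from the log excerpt in this report, so they may not cover every failure mode.

```python
import re

# Failure signatures copied from the log excerpt in this bug report.
# The list is an assumption about what counts as a failure; extend as needed.
FAILURE_PATTERNS = [
    r"MES failed to respond to msg=\w+",
    r"failed to remove hardware queue from MES",
    r"MES might be in unrecoverable state",
    r"Failed to evict queue \d+",
    r"GPU reset begin!",
    r"\[gfxhub\] page fault",
    r"failed to suspend all gangs",
]
_COMBINED = re.compile("|".join(f"(?:{p})" for p in FAILURE_PATTERNS))


def find_amdgpu_failures(log_text):
    """Return log lines that mention amdgpu and match a known failure signature."""
    return [
        line
        for line in log_text.splitlines()
        if "amdgpu" in line and _COMBINED.search(line)
    ]
```

To use it, pass the text of the kernel log captured after boot (for example, the saved output of dmesg) to find_amdgpu_failures; an empty list means none of the listed messages were seen.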