Public bug reported:

[Impact]
Some frameworks attempt to pre-allocate all available GPU memory, which can 
lead to a memory access fault or even cause the GPU to hang.

Both issues are fixed by the updated microcode for GC 11.5.0.

File "/workspace/***/environment/lib/python3.12/site-packages/websocket/_core.py", line 563, in _recv
    return recv(self.sock, bufsize)
           ^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/***/environment/lib/python3.12/site-packages/websocket/_socket.py", line 132, in recv
    raise WebSocketConnectionClosedException("Connection to remote host was lost.")
websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost.
HW Exception by GPU node-1 (Agent handle: 0x36616df0) reason :GPU Hang

[Test Plan]
Boot the system with the updated firmware, run a workload that pre-allocates 
GPU memory, and confirm that no further GPU exceptions or memory access 
faults occur.
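
The check above could be partly automated by scanning captured workload 
output for fault messages. The following is a minimal sketch, not part of 
the original report: the function name find_gpu_faults and the pattern list 
are hypothetical, and only the first pattern is taken from the message 
quoted in [Impact]. The second pattern is an assumption about how a memory 
access fault might be worded.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive list).
# The first matches the "HW Exception ... GPU Hang" line quoted in this report;
# the second is a guess at the wording of a memory access fault message.
FAULT_PATTERNS = [
    re.compile(r"HW Exception by GPU node-\d+ .*reason\s*:\s*GPU Hang"),
    re.compile(r"memory access fault", re.IGNORECASE),
]


def find_gpu_faults(lines):
    """Return every line that matches at least one fault pattern."""
    return [
        line
        for line in lines
        if any(pattern.search(line) for pattern in FAULT_PATTERNS)
    ]
```

An empty result from find_gpu_faults() over the captured log lines would 
suggest that the workload ran without the faults described above. It does 
not replace checking the kernel log by hand.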

[Where problems could occur]
These new AMDGPU firmware files apply only to gfx1150. They are intended to 
prevent unexpected behavior during memory-intensive modeling tests that 
allocate GPU memory on gfx1150.

[Other info]
ba2549fc amdgpu: Update VCN 4.0.5 microcode
61d6a5f9 amdgpu: Update SDMA 6.1.0 microcode
67ee1b80 amdgpu: Update GC 11.5.0 microcode

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: originates-from-2119539

https://bugs.launchpad.net/bugs/2119627

Title:
  Update GC 11.5.0 microcode