Public bug reported:
[Impact]
Some frameworks attempt to pre-allocate all available GPU memory, which can
lead to a memory access fault or even cause the GPU to hang.
Both issues are resolved by the updated microcode for GC 11.5.0.
  File "/workspace/***/environment/lib/python3.12/site-packages/websocket/_core.py", line 563, in _recv
    return recv(self.sock, bufsize)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/***/environment/lib/python3.12/site-packages/websocket/_socket.py", line 132, in recv
    raise WebSocketConnectionClosedException("Connection to remote host was lost.")
websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost.
HW Exception by GPU node-1 (Agent handle: 0x36616df0) reason :GPU Hang
[Test Plan]
Boot the system and verify that no further hardware exceptions or memory
access faults occur.
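One way to make this check repeatable is to scan captured log output for the
failure messages. The following is a minimal sketch, not part of the original
report: the function name find_gpu_faults and the pattern list are
hypothetical, and only the "GPU Hang" wording is taken from the log above. The
"memory access fault" pattern is an assumption about how such a fault would be
reported.

```python
import re

# Hypothetical patterns. The first mirrors the "HW Exception ... GPU Hang"
# line quoted in this report; the second is an assumed wording for a
# memory access fault and may need adjusting to the actual log output.
FAULT_PATTERNS = [
    re.compile(r"HW Exception by GPU node-\d+.*reason\s*:\s*GPU Hang"),
    re.compile(r"memory access fault", re.IGNORECASE),
]


def find_gpu_faults(log_text: str) -> list[str]:
    """Return every log line that matches one of the fault patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FAULT_PATTERNS)
    ]


if __name__ == "__main__":
    sample = (
        "workload started\n"
        "HW Exception by GPU node-1 (Agent handle: 0x36616df0) reason :GPU Hang\n"
    )
    for fault_line in find_gpu_faults(sample):
        print(fault_line)
```

An empty result after running the affected workload on the updated firmware
would be consistent with the fix, assuming the patterns match the real
messages.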
[Where problems could occur]
These new AMDGPU firmware files apply only to gfx1150. They are intended to
prevent unexpected behavior during intensive modeling tests that allocate GPU
memory on gfx1150.
[Other info]
ba2549fc amdgpu: Update VCN 4.0.5 microcode
61d6a5f9 amdgpu: Update SDMA 6.1.0 microcode
67ee1b80 amdgpu: Update GC 11.5.0 microcode
** Affects: linux-firmware (Ubuntu)
Importance: Undecided
Status: New
** Tags: originates-from-2119539
https://bugs.launchpad.net/bugs/2119627
Title:
Update GC 11.5.0 microcode
Status in linux-firmware package in Ubuntu:
New