Fix patches sent to kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2024-December/155871.html.

SRU Justification

[Impact]

On a system with a GV100 GPU using the nouveau driver, the display becomes
unresponsive and a storm of "nouveau 0000:07:00.0: disp: ctrl 00000080"
messages is continuously printed to dmesg once the desktop environment reaches
its idle timeout. This interferes with certification testing for the DGX
Station desktop system, because the system eventually becomes unresponsive
during testing.

[Fix]

This only affects Focal.

Backporting the following patches from kernel 5.6 resolves the issue:
58ae5284f6 ("drm/nouveau/disp/gv100-: halt NV_PDISP_FE_RM_INTR_STAT_CTRL_DISP_ERROR storms")
5bb88d0794 ("drm/nouveau/kms/gv100-: move window ownership setup into modesetting path")
137c4ba716 ("drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset")

[Test Case]

1. Install desktop environment
$ sudo apt install ubuntu-desktop

2. Configure GDM
$ sudo vim /etc/gdm3/custom.conf
  => Uncomment WaylandEnable=false
  => Configure automatic login for the `ubuntu` user by setting
        AutomaticLoginEnable = true
        AutomaticLogin = ubuntu

3. Disable display timeout
$ gsettings set org.gnome.desktop.session idle-delay 0

4. Set graphical as the default target
$ sudo systemctl set-default graphical.target

5. Reboot the system

6. Enable 1 second display timeout and wait ~10 seconds
$ gsettings set org.gnome.desktop.session idle-delay 1

7. Observe that after applying these patches, the display can wake up from idle
and the system continues to be usable without a storm of "nouveau 0000:07:00.0:
disp: ctrl 00000080" messages in dmesg.
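
The check in step 7 can be made less subjective by counting the matching
lines in the kernel log. The following is only a minimal sketch and not part
of the original test case: the function name count_nouveau_storm is
hypothetical, and the pattern deliberately ignores the PCI address, since
0000:07:00.0 is specific to the affected system.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original test case):
# count kernel log lines that match the nouveau display error from this bug.
# Reads log text on stdin and prints the number of matching lines.
count_nouveau_storm() {
    # grep -c prints the match count; it exits 1 when the count is 0,
    # so mask that exit status to keep the helper usable under "set -e".
    grep -c 'nouveau .*disp: ctrl 00000080' || true
}

# Example usage on the test system, after step 6:
#   sudo dmesg | count_nouveau_storm
```

On a patched kernel the count would be expected to stay flat when the command
is repeated after the idle timeout, whereas on an affected kernel it should
keep growing.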

[Where things could go wrong]

These changes affect only the nouveau driver. Issues would appear as
misbehavior of the nouveau driver, most likely on NVIDIA Volta GPUs.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2078011

Title:
  nouveau keeps showing `disp: ctrl 00000080` and crippling the system

Status in linux package in Ubuntu:
  Invalid
Status in xserver-xorg-video-nouveau package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  In Progress
Status in xserver-xorg-video-nouveau source package in Focal:
  Invalid

Bug description:
  During kernel SRU testing, I found that the DGX A100 station kept
  showing error messages from nouveau in dmesg as follows. These
  numerous kernel error messages crippled the system and made it
  unresponsive.

  [ 2265.721452] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721457] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721463] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721474] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721480] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721485] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721491] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721496] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721507] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721514] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721519] nouveau 0000:07:00.0: disp: ctrl 00000080
  [ 2265.721525] nouveau 0000:07:00.0: disp: ctrl 00000080

  When the system reaches the idle delay, I guess it tries to turn off the
  monitor and then something goes wrong.
  I can quickly reproduce this by setting idle-delay to 1 second after the
  system boots into the desktop:
  `gsettings set org.gnome.desktop.session idle-delay 1`

  The impacted system is https://ubuntu.com/certified/201711-25989

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: xserver-xorg-video-nouveau 1:1.0.16-1
  ProcVersionSignature: Ubuntu 5.4.0-193.213-generic 5.4.278
  Uname: Linux 5.4.0-193-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.4
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Tue Aug 27 20:31:20 2024
  DistUpgraded: Fresh install
  DistroCodename: focal
  DistroVariant: ubuntu
  ExtraDebuggingInterest: Yes
  InstallationDate: Installed on 2020-08-03 (1485 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: NVIDIA DGX Station
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-193-generic root=UUID=88df95a6-4fd9-475a-8b59-ad14df1ada5a ro
  SourcePackage: xserver-xorg-video-nouveau
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/27/2018
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 0406
  dmi.board.asset.tag: Default string
  dmi.board.name: X99-E-10G WS
  dmi.board.vendor: EMPTY
  dmi.board.version: Rev 1.xx
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: EMPTY
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0406:bd08/27/2018:svnNVIDIA:pnDGXStation:pvrSystemVersion:rvnEMPTY:rnX99-E-10GWS:rvrRev1.xx:cvnEMPTY:ct3:cvrDefaultstring:
  dmi.product.family: DGX
  dmi.product.name: DGX Station
  dmi.product.sku: 920-22587-2510-000
  dmi.product.version: System Version
  dmi.sys.vendor: NVIDIA
  version.compiz: compiz N/A
  version.libdrm2: libdrm2 2.4.101-2
  version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
  version.libgl1-mesa-glx: libgl1-mesa-glx N/A
  version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.2
  version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
  version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
  version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
  version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1


