Based on a discussion with ~albertomilone, powering down the NVIDIA GPU
while keeping the modules loaded is the preferred long-term approach,
rather than blacklisting the modules.

The dynamic power management feature is described here (it requires a Turing or newer GPU):
http://us.download.nvidia.com/XFree86/Linux-x86_64/440.44/README/dynamicpowermanagement.html
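To check whether the GPU is actually powered down while its modules stay loaded, you can read the kernel's runtime PM state from sysfs. Below is a minimal Python sketch. It assumes the standard `power/control` ("auto" or "on") and `power/runtime_status` (e.g. "active" or "suspended") attributes exist for the device. `read_runtime_pm` and its `sysfs_root` parameter are hypothetical names, added only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_runtime_pm(pci_addr: str, sysfs_root: str = "/sys") -> dict:
    """Return the runtime PM policy and current state of a PCI device.

    Hypothetical helper: it only reads the generic kernel attributes
    power/control and power/runtime_status; nothing NVIDIA-specific.
    """
    power_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "power"
    return {
        # "auto" lets the kernel runtime-suspend the device, "on" keeps it powered
        "control": (power_dir / "control").read_text().strip(),
        # e.g. "active", "suspended", "suspending", "resuming"
        "runtime_status": (power_dir / "runtime_status").read_text().strip(),
    }
```

For example, `read_runtime_pm("0000:01:00.0")` would query the device at the address that appears in the systemd-udevd messages later in this report. That address is an assumption for other machines, and `lspci -D` shows the real one.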

My GPU is pre-Turing (Pascal, 1060M); however, powering off is not where
the problem lies.

Running `prime-select intel` creates /lib/udev/rules.d/80-pm-nvidia.rules,
which contains the following line to unbind the NVIDIA GPU device from
its driver:

https://github.com/tseliot/nvidia-prime/blob/cf757cc9585dfc032930379fc81effb3a3d59606/prime-select#L164-L165
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{remove}="1"
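The rule above uses `==` to match PCI devices whose sysfs `vendor` is 0x10de (NVIDIA) and whose `class` is 0x030000 (VGA-compatible display controller). It then uses `=` to write "1" to the device's `remove` attribute, which hot-removes the device from the PCI bus and, as a consequence, detaches its driver. The rough Python equivalent below is for illustration only. `find_matching_devices`, `remove_devices` and the `dry_run` flag are hypothetical, and udev itself applies the rule per "add" uevent rather than by scanning sysfs like this.

```python
from pathlib import Path

NVIDIA_VENDOR = "0x10de"  # ATTR{vendor}=="0x10de"
VGA_CLASS = "0x030000"    # ATTR{class}=="0x030000"


def find_matching_devices(sysfs_root: str = "/sys") -> list[str]:
    """Return PCI addresses whose vendor and class match the rule."""
    devices = Path(sysfs_root) / "bus" / "pci" / "devices"
    matches = []
    for dev in sorted(devices.iterdir()):
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == NVIDIA_VENDOR and pci_class == VGA_CLASS:
            matches.append(dev.name)
    return matches


def remove_devices(addresses, sysfs_root="/sys", dry_run=True):
    """Mimic ATTR{remove}="1" by writing "1" to each device's remove file.

    Writing for real requires root. With dry_run=True (the default)
    nothing is written; the target paths are only returned.
    """
    devices = Path(sysfs_root) / "bus" / "pci" / "devices"
    targets = [devices / addr / "remove" for addr in addresses]
    if not dry_run:
        for target in targets:
            target.write_text("1")
    return targets
```

The dry-run default is deliberate. On an affected machine, the real write is exactly the operation this report describes as hanging.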

If I comment it out, I can boot just fine with my iGPU after running
`prime-select intel`. The resulting 80-pm-nvidia.rules file looks like
this: https://paste.ubuntu.com/p/HX6t9y8BPg/

Commenting out only the power management lines while leaving the
unbinding rule in place reproduces the same issue (80-pm-nvidia.rules:
https://paste.ubuntu.com/p/mTdXbZZk8H/).

The unbinding operation hangs, producing output like the following even
before any attempt is made to start X11 or gdm3:

[   15.683190] nvidia-uvm: Loaded the UVM driver, major device number 511.
[   15.824882] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[   15.824903] ------------[ cut here ]------------
[   15.825082] WARNING: CPU: 0 PID: 759 at /var/lib/dkms/nvidia/440.59/build/nvidia/nv-pci.c:577 nv_pci_remove+0x338/0x360 [nvidia]
# ...
[   15.825330] ---[ end trace 353e142c2126a8a0 ]---
# ...
[  242.649248] INFO: task nvidia-persiste:1876 blocked for more than 120 seconds.
[  242.649931]       Tainted: P        W  O      5.4.0-12-generic #15-Ubuntu
[  242.650618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.651319] nvidia-persiste D    0  1876      1 0x00000004

Eventually it fails with a timeout:
systemd[1]: nvidia-persistenced.service: start operation timed out. Terminating.
systemd[1]: nvidia-persistenced.service: Failed with result 'timeout'.
systemd[1]: Failed to start NVIDIA Persistence Daemon.

Masking nvidia-persistenced via `sudo systemctl mask nvidia-persistenced`
and rebooting shows that systemd-udevd and rmmod hang as well:

Feb  9 17:18:43 blade systemd-udevd[717]: 0000:01:00.0: Worker [756] processing SEQNUM=4430 is taking a long time
Feb  9 17:18:43 blade systemd-udevd[717]: 0000:01:00.1: Worker [746] processing SEQNUM=4440 is taking a long time
Feb  9 17:20:43 blade systemd-udevd[717]: 0000:01:00.1: Worker [746] processing SEQNUM=4440 killed
Feb  9 17:20:43 blade systemd-udevd[717]: 0000:01:00.0: Worker [756] processing SEQNUM=4430 killed
Feb  9 17:21:31 blade kernel: [  242.818665] INFO: task systemd-udevd:746 blocked for more than 120 seconds.
Feb  9 17:21:31 blade kernel: [  242.819381]       Tainted: P        W  O      5.4.0-12-generic #15-Ubuntu
Feb  9 17:21:31 blade kernel: [  242.820075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 17:21:31 blade kernel: [  242.820797] systemd-udevd   D    0   746    717 0x00000324
# ...
Feb  9 17:21:31 blade kernel: [  242.823033] rmmod           D    0  1939   1937 0x00004000
Feb  9 17:21:31 blade kernel: [  242.823034] Call Trace:
# ...
Feb  9 17:21:31 blade kernel: [  242.823783]  nvkms_close_gpu+0x50/0x80 [nvidia_modeset]
Feb  9 17:21:31 blade kernel: [  242.823793]  _nv002598kms+0x14d/0x170 [nvidia_modeset]
# ...
Feb  9 17:21:31 blade kernel: [  242.823893]  ? nv_linux_drm_exit+0x9/0x768 [nvidia_drm]
Feb  9 17:21:31 blade kernel: [  242.823897]  ? __x64_sys_delete_module+0x147/0x290
# ...
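To see which devices' uevents are stuck, the systemd-udevd worker messages above can be grouped by PCI address. Below is a small Python sketch. `summarize_udev_workers` is a hypothetical name. The regular expression is written against the exact message format quoted in this report, assumes each message sits on a single line (as syslog writes it, not wrapped as in an email), and may need adjusting for other systemd versions.

```python
import re

# Matches e.g. "systemd-udevd[717]: 0000:01:00.0: Worker [756] processing
# SEQNUM=4430 is taking a long time" and the corresponding "... killed" line.
WORKER_RE = re.compile(
    r"systemd-udevd\[\d+\]: "
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Worker \[(?P<worker>\d+)\] processing SEQNUM=(?P<seqnum>\d+) "
    r"(?P<state>is taking a long time|killed)"
)


def summarize_udev_workers(log_lines):
    """Map each PCI address to (worker PID, SEQNUM, last reported state)."""
    summary = {}
    for line in log_lines:
        m = WORKER_RE.search(line)
        if m:
            # Later lines overwrite earlier ones, so the last state wins.
            summary[m["dev"]] = (int(m["worker"]), int(m["seqnum"]), m["state"])
    return summary
```

Applied to the worker lines quoted above, it reports both 0000:01:00.0 (worker 756, SEQNUM 4430) and 0000:01:00.1 (worker 746, SEQNUM 4440) as killed.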



** Attachment added: "syslog-nvidia-persistenced-hang.txt"
   https://bugs.launchpad.net/ubuntu/+source/nvidia-prime/+bug/1848326/+attachment/5326678/+files/syslog-nvidia-persistenced-hang.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1848326

Title:
  [cosmic+] error booting with prime-select intel: prime-select does not
  update initramfs to blacklist nvidia modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-prime/+bug/1848326/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
