Based on a discussion with ~albertomilone, powering down the NVIDIA GPU
while keeping the modules loaded is the way to go long-term as opposed
to blacklisting the modules.

The power management feature is described here (requires Turing GPUs and above):
http://us.download.nvidia.com/XFree86/Linux-x86_64/440.44/README/dynamicpowermanagement.html

My GPU is pre-Turing (Pascal, 1060m); however, powering off is not where
the problem is.

Running `prime-select intel` creates
/lib/udev/rules.d/80-pm-nvidia.rules, which contains the following line
to unbind an NVIDIA GPU device from its driver:

https://github.com/tseliot/nvidia-prime/blob/cf757cc9585dfc032930379fc81effb3a3d59606/prime-select#L164-L165
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{remove}="1"

If I comment it out, I can boot just fine with my iGPU after running
`prime-select intel`. The resulting 80-pm-nvidia.rules file looks like
this: https://paste.ubuntu.com/p/HX6t9y8BPg/
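
For anyone who wants to reproduce the workaround, here is a minimal sketch
of commenting the rule out with sed. The helper name
comment_out_remove_rule is made up for illustration and is not part of
nvidia-prime. It comments out every line that sets ATTR{remove}="1", which
may be more than the single rule quoted above, so narrow the pattern if
needed. Since `prime-select intel` creates this file, assume the change is
lost the next time that command runs.

```shell
#!/bin/sh
# Sketch only: comment out udev rules that remove the NVIDIA PCI device.
# comment_out_remove_rule is a hypothetical helper, not part of nvidia-prime.
comment_out_remove_rule() {
    rules_file="$1"
    # Prefix each not-yet-commented line that sets ATTR{remove}="1" with "# ".
    # sed keeps a copy of the original file as "<rules_file>.bak".
    sed -i.bak '/^[^#].*ATTR{remove}="1"/ s/^/# /' "$rules_file"
}

# Usage (the real file needs root):
#   comment_out_remove_rule /lib/udev/rules.d/80-pm-nvidia.rules
```

Lines that are already commented are left alone, so running it twice does
not stack up "# " prefixes.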

Commenting out only the power management lines while leaving the
unbinding in place results in the same issue (80-pm-nvidia.rules:
https://paste.ubuntu.com/p/mTdXbZZk8H/).

The unbinding operation hangs, which results in something like this even
before any attempt is made to start X11 or gdm3:

[   15.683190] nvidia-uvm: Loaded the UVM driver, major device number 511.
[   15.824882] NVRM: Attempting to remove minor device 0 with non-zero usage count!
[   15.824903] ------------[ cut here ]------------
[   15.825082] WARNING: CPU: 0 PID: 759 at /var/lib/dkms/nvidia/440.59/build/nvidia/nv-pci.c:577 nv_pci_remove+0x338/0x360 [nvidia]
# ...
[   15.825330] ---[ end trace 353e142c2126a8a0 ]---
# ...
[  242.649248] INFO: task nvidia-persiste:1876 blocked for more than 120 seconds.
[  242.649931]       Tainted: P        W  O      5.4.0-12-generic #15-Ubuntu
[  242.650618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.651319] nvidia-persiste D    0  1876      1 0x00000004

Eventually it fails with a timeout:
systemd[1]: nvidia-persistenced.service: start operation timed out. Terminating.
systemd[1]: nvidia-persistenced.service: Failed with result 'timeout'.
systemd[1]: Failed to start NVIDIA Persistence Daemon.

Masking nvidia-persistenced via `sudo systemctl mask nvidia-persistenced`
and rebooting shows that systemd-udevd and rmmod hang as well:

Feb  9 17:18:43 blade systemd-udevd[717]: 0000:01:00.0: Worker [756] processing SEQNUM=4430 is taking a long time
Feb  9 17:18:43 blade systemd-udevd[717]: 0000:01:00.1: Worker [746] processing SEQNUM=4440 is taking a long time
Feb  9 17:20:43 blade systemd-udevd[717]: 0000:01:00.1: Worker [746] processing SEQNUM=4440 killed
Feb  9 17:20:43 blade systemd-udevd[717]: 0000:01:00.0: Worker [756] processing SEQNUM=4430 killed
Feb  9 17:21:31 blade kernel: [  242.818665] INFO: task systemd-udevd:746 blocked for more than 120 seconds.
Feb  9 17:21:31 blade kernel: [  242.819381]       Tainted: P        W  O      5.4.0-12-generic #15-Ubuntu
Feb  9 17:21:31 blade kernel: [  242.820075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  9 17:21:31 blade kernel: [  242.820797] systemd-udevd   D    0   746    717 0x00000324
# ...
Feb  9 17:21:31 blade kernel: [  242.823033] rmmod           D    0  1939   1937 0x00004000
Feb  9 17:21:31 blade kernel: [  242.823034] Call Trace:
# ...
Feb  9 17:21:31 blade kernel: [  242.823783]  nvkms_close_gpu+0x50/0x80 [nvidia_modeset]
Feb  9 17:21:31 blade kernel: [  242.823793]  _nv002598kms+0x14d/0x170 [nvidia_modeset]
# ...
Feb  9 17:21:31 blade kernel: [  242.823893]  ? nv_linux_drm_exit+0x9/0x768 [nvidia_drm]
Feb  9 17:21:31 blade kernel: [  242.823897]  ? __x64_sys_delete_module+0x147/0x290
# ...
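
To see at a glance which tasks were reported as hung in a log like the
ones above, something along these lines may help. It is only a sketch:
hung_tasks is a made-up helper name, and it assumes each kernel message
sits on a single line, as it does in the raw syslog or dmesg output.

```shell
#!/bin/sh
# Sketch only: list tasks the kernel reported as hung, with a report count.
# hung_tasks is a hypothetical helper; it reads log text on stdin.
hung_tasks() {
    # Hung-task reports look like:
    #   INFO: task systemd-udevd:746 blocked for more than 120 seconds.
    # Keep the "name:pid" part of each report, then count the repeats.
    sed -n 's/.*INFO: task \(.*\) blocked for more than.*/\1/p' \
        | sort | uniq -c
}

# Usage:
#   hung_tasks < /var/log/syslog
#   dmesg | hung_tasks
```

Each output line is a count followed by the task name and PID as the
kernel printed them (the name is truncated by the kernel, e.g.
nvidia-persiste).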



** Attachment added: "syslog-nvidia-persistenced-hang.txt"
   
https://bugs.launchpad.net/ubuntu/+source/nvidia-prime/+bug/1848326/+attachment/5326678/+files/syslog-nvidia-persistenced-hang.txt

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to nvidia-prime in Ubuntu.
https://bugs.launchpad.net/bugs/1848326

Title:
  [cosmic+] error booting with prime-select intel: prime-select does not
  update initramfs to blacklist nvidia modules

Status in nvidia-prime package in Ubuntu:
  Confirmed

Bug description:
  When I try to boot with the iGPU selected, the DE won't boot; with nvidia
  selected, everything is fine.
  I tried uninstalling the nvidia driver and it allowed me to access the
  desktop without any problems, and intel is working fine.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: nvidia-prime 0.8.13
  ProcVersionSignature: Ubuntu 5.3.0-18.19-generic 5.3.1
  Uname: Linux 5.3.0-18-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu8
  Architecture: amd64
  CurrentDesktop: KDE
  Date: Wed Oct 16 12:41:26 2019
  Dependencies:
   
  InstallationDate: Installed on 2019-09-25 (20 days ago)
  InstallationMedia: Kubuntu 19.10 "Eoan Ermine" - Beta amd64 (20190925)
  PackageArchitecture: all
  SourcePackage: nvidia-prime
  UpgradeStatus: No upgrade log present (probably fresh install)
