ubuntu@bobone:~$ uname -a
Linux bobone 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:40:40 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@bobone:~$ dmesg | grep nouveau | grep error
[ 3.020472] nouveau: probe of 0004:04:00.0 failed with error -12
[ 3.020667] nouveau: probe of 0004:05:00.0 failed with error -12
[ 3.021693] nouveau: probe of 0035:03:00.0 failed with error -12
[ 3.022595] nouveau: probe of 0035:04:00.0 failed with error -12
ubuntu@bobone:~$ find /lib/firmware/nvidia/gv100/ -name sw_nonctx.bin
find: ‘/lib/firmware/nvidia/gv100/’: No such file or directory
ubuntu@bobone:~$ apt-cache policy linux-firmware
linux-firmware:
  Installed: 1.173.3
  Candidate: 1.173.3
  Version table:
 *** 1.173.3 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main ppc64el Packages
        500 http://ports.ubuntu.com/ubuntu-ports bionic-security/main ppc64el Packages
        100 /var/lib/dpkg/status
     1.173 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic/main ppc64el Packages
ubuntu@bobone:~$
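
The firmware presence check in this session could also be scripted. The following is only a minimal sketch, not part of the original report: the helper name find_firmware is hypothetical, and the directory and file name are simply the ones used in the find command above.

```python
from pathlib import Path


def find_firmware(root, name):
    """Return sorted paths under `root` whose file name equals `name`.

    Roughly mirrors `find ROOT -name NAME`, except that a missing root
    directory yields an empty list instead of an error message.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(name) if p.is_file())


if __name__ == "__main__":
    # Directory and file name taken from the find command in this session.
    hits = find_firmware("/lib/firmware/nvidia/gv100", "sw_nonctx.bin")
    print("present" if hits else "missing", *hits)
```

An empty result corresponds to the "No such file or directory" case with linux-firmware 1.173.3 installed.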

== install linux-firmware from proposed ==

The nouveau driver in the 4.15 kernel does not seem to identify the NVIDIA chipset. However, it does seem to load the firmware from linux-firmware, and the driver module loads.

ubuntu@bobone:~$ apt-cache policy linux-firmware
linux-firmware:
  Installed: 1.173.5
  Candidate: 1.173.5
  Version table:
 *** 1.173.5 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-proposed/main ppc64el Packages
        100 /var/lib/dpkg/status
     1.173.3 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main ppc64el Packages
        500 http://ports.ubuntu.com/ubuntu-ports bionic-security/main ppc64el Packages
     1.173 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic/main ppc64el Packages
ubuntu@bobone:~$
ubuntu@bobone:~$ find /lib/firmware/nvidia/gv100/ -name sw_nonctx.bin
/lib/firmware/nvidia/gv100/gr/sw_nonctx.bin
ubuntu@bobone:~$
ubuntu@bobone:~$ tree /lib/firmware/nvidia/gv100/
/lib/firmware/nvidia/gv100/
├── acr
│   ├── bl.bin
│   ├── ucode_load.bin
│   ├── ucode_unload.bin
│   └── unload_bl.bin
├── gr
│   ├── fecs_bl.bin
│   ├── fecs_data.bin
│   ├── fecs_inst.bin
│   ├── fecs_sig.bin
│   ├── gpccs_bl.bin
│   ├── gpccs_data.bin
│   ├── gpccs_inst.bin
│   ├── gpccs_sig.bin
│   ├── sw_bundle_init.bin
│   ├── sw_ctx.bin
│   ├── sw_method_init.bin
│   └── sw_nonctx.bin
├── nvdec
│   └── scrubber.bin
└── sec2
    ├── desc.bin
    ├── image.bin
    └── sig.bin

4 directories, 20 files
ubuntu@bobone:~$
ubuntu@bobone:~$ dmesg | grep nouveau | grep error
[ 157.554971] nouveau: probe of 0004:04:00.0 failed with error -12
[ 157.555041] nouveau: probe of 0004:05:00.0 failed with error -12
[ 157.555957] nouveau: probe of 0035:03:00.0 failed with error -12
[ 157.556078] nouveau: probe of 0035:04:00.0 failed with error -12
ubuntu@bobone:~$
[ 157.554928] nouveau 0004:04:00.0: unknown chipset (140000a1)
[ 157.554994] nouveau 0004:05:00.0: unknown chipset (140000a1)
[ 157.555880] nouveau 0035:03:00.0: unknown chipset (140000a1)
[ 157.556002] nouveau 0035:04:00.0: unknown chipset (140000a1)
ubuntu@bobone:~$ lsmod | grep nouveau
nouveau 2150398 0
i2c_algo_bit 8831 2 ast,nouveau
ttm 247484 2 ast,nouveau
drm_kms_helper 209562 2 ast,nouveau
drm 503197 5 drm_kms_helper,ast,ttm,nouveau
ubuntu@bobone:~$

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/1794055

Title: [Witherspoon-DD2.2][Ubu 18.10] [4.18.0-7-generic ] OS booting thrown with nouveau errors; OS booted successfully

Status in The Ubuntu-power-systems project: Fix Committed
Status in linux package in Ubuntu: Won't Fix
Status in linux-firmware package in Ubuntu: Fix Released
Status in linux source package in Bionic: Won't Fix
Status in linux-firmware source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Won't Fix
Status in linux-firmware source package in Cosmic: Fix Released

Bug description:

SRU Justification

Impact: Missing firmware for nouveau is causing errors to appear in dmesg.

Fix: Add missing firmware files from upstream linux-firmware.

Test Case: Confirm that errors in dmesg are gone once new firmware files are present.

Regression Potential: New and updated firmware always has potential to cause regressions; however, this firmware has been in disco for several months with no reported issues.

---

== Comment: #0 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-15 23:55:13 ==

---Problem Description---
[Witherspoon-DD2.2][Ubu 18.10] [4.18.0-7-generic ] OS booting thrown with nouveau errors

Contact Information = kalsh...@in.ibm.com, preeti.tha...@in.ibm.com

---uname output---
root@ltc-wcwsp3:~# uname -a
Linux ltc-wcwsp3 4.18.0-7-generic #8-Ubuntu SMP Tue Aug 28 18:20:56 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Witherspoon DD2.2 LC

Steps:
1. Netinstall Ubu 18.10 on Witherspoon-LC-DD2.2 6GPU system ------> PASS
2. Boot the OS ---> PASS, but errors related to the open source NVIDIA driver are thrown on the console.

[Disk: sdb2 / c0302064-c5a3-49a7-8bd4-402283e6fcbe]
  Ubuntu, with Linux 4.18.0-7-generic (recovery mode)
  Ubuntu, with Linux 4.18.0-7-generic
  Ubuntu
[Disk: nvme0n1p2 / c5d042f1-812e-49e0-94b2-ade477084061]
  Ubuntu, with Linux 4.18.0-7-generic (recovery mode)
* Ubuntu, with Linux 4.18.0-7-generic
  Ubuntu
System information
System configuration
System status log
Language
Rescan devices
Retrieve config from URL
Plugins (0)
Exit to shell
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help

The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[ 57.513329] kexec_core: Starting new kernel
[ 149.358703978,5] OPAL: Switch to big-endian OS
[ 153.355498935,5] OPAL: Switch to little-endian OS
[ 2.943735] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 2.943738] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 3.132733] vio vio: uevent: failed to send synthetic uevent
[ 4.058698] nouveau 0004:04:00.0: gr: failed to load gr/sw_nonctx
[ 4.129215] nouveau 0004:04:00.0: DRM: failed to create kernel channel, -22
[ 19.126509] nouveau 0004:04:00.0: DRM: failed to idle channel 0 [DRM]
[ 19.281450] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
[ 19.351322] nouveau 0004:05:00.0: DRM: failed to create kernel channel, -22
[ 34.350509] nouveau 0004:05:00.0: DRM: failed to idle channel 0 [DRM]
[ 34.502063] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
[ 34.572144] nouveau 0004:06:00.0: DRM: failed to create kernel channel, -22
[ 49.570509] nouveau 0004:06:00.0: DRM: failed to idle channel 0 [DRM]
[ 49.734754] nouveau 0035:03:00.0: gr: failed to load gr/sw_nonctx
[ 49.805057] nouveau 0035:03:00.0: DRM: failed to create kernel channel, -22
[ 64.802510] nouveau 0035:03:00.0: DRM: failed to idle channel 0 [DRM]
[ 64.955442] nouveau 0035:04:00.0: gr: failed to load gr/sw_nonctx
[ 65.025537] nouveau 0035:04:00.0: DRM: failed to create kernel channel, -22
[ 80.022509] nouveau 0035:04:00.0: DRM: failed to idle channel 0 [DRM]
[ 80.181169] nouveau 0035:05:00.0: gr: failed to load gr/sw_nonctx
[ 80.251481] nouveau 0035:05:00.0: DRM: failed to create kernel channel, -22
[ 95.250509] nouveau 0035:05:00.0: DRM: failed to idle channel 0 [DRM]
/dev/nvme0n1p2: recovering journal
/dev/nvme0n1p2: clean, 72569/97681408 files, 7384418/390701312 blocks
-.mount kmod-static-nodes.service dev-hugepages.mount dev-mqueue.mount
sys-kernel-debug.mount ufw.service lvm2-lvmetad.service
systemd-remount-fs.service systemd-random-seed.service
systemd-sysusers.service keyboard-setup.service
systemd-tmpfiles-setup-dev.service lvm2-monitor.service finalrd.service
console-setup.service swapfile.swap ebtables.service systemd-udevd.service
systemd-journald.service systemd-journal-flush.service
systemd-tmpfiles-setup.service systemd-update-utmp.service
[ 100.997765] vio vio: uevent: failed to send synthetic uevent
systemd-udev-trigger.service systemd-timesyncd.service apparmor.service
lvm2-pvscan@8:3.service systemd-modules-load.service
sys-kernel-config.mount sys-fs-fuse-connections.mount
systemd-sysctl.service ondemand.service dbus.service irqbalance.service
opal-prd.service lxcfs.service atd.service cron.service iprdump.service
iprinit.service systemd-logind.service iprupdate.service
systemd-networkd.service rsyslog.service polkit.service
accounts-daemon.service lxd-containers.service networkd-dispatcher.service
var-lib-lxcfs.mount tmp-selftest\x2dmountpoint\x2d039055037.mount
snapd.service snapd.seeded.service systemd-resolved.service
systemd-networkd-wait-online.service blk-availability.service
systemd-user-sessions.service apport.service

Ubuntu Cosmic Cuttlefish (development branch) ltc-wcwsp3 hvc0

ltc-wcwsp3 login:

== Comment: #2 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-16 00:07:26 ==

sosreport -> http://9.114.13.132/repo/bugs/ubu/sosreport-BZ171506.171506-20180915235600.tar.xz

== Comment: #3 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-16 00:33:02 ==

== Comment: #4 - Praveen K. Pandey <praveen.pan...@in.ibm.com> - 2018-09-19 05:52:23 ==

Facing nouveau-related errors on a POWER8 system as well:

[ 4.764818] nouveau 0002:01:00.0: fifo: fault 00 [READ] at 0000000000020000 engine 0c [HOST6] client 06 [GPC0/L1_2] reason 02 [PTE] on channel 0 [03ffb18000 DRM]
[ 4.942169] nouveau 000a:01:00.0: fifo: fault 00 [READ] at 0000000000020000 engine 0c [HOST6] client 06 [GPC0/L1_2] reason 02 [PTE] on channel 0 [03ffb18000 DRM]
/dev/sdb2: clean, 132397/61054976 files, 5995714/244188416 blocks
[ 11.206278] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started Show Plymouth Boot Screen.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
plymouth-start.service
[ OK ] Started ebtables ruleset management.

== Comment: #5 - Chandni Verma <chand...@in.ibm.com> - 2018-09-20 16:41:49 ==

--- screening ---

From the provided dmesg, I notice:

1294 [ 19.281478] nouveau 0004:05:00.0: bios: version 88.00.13.00.02
1295 [ 19.282753] nouveau 0004:05:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
1296 [ 19.282755] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
1297 [ 19.282813] nouveau 0004:05:00.0: Using 32-bit DMA via iommu

..

1322 [ 34.367713] nouveau 0004:06:00.0: NVIDIA GV100 (140000a1)
1323 [ 34.497152] nouveau 0004:06:00.0: bios: version 88.00.13.00.02
1324 [ 34.502736] nouveau 0004:06:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
1325 [ 34.502738] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
1326 [ 34.502797] nouveau 0004:06:00.0: Using 32-bit DMA via iommu

..

up to 6 instances of the above...

Looks like an NVIDIA firmware issue.
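
The screening step above (picking the "Direct firmware load ... failed" lines out of dmesg) can be sketched in a few lines. This is only an illustration under stated assumptions, not a tool used in the bug: the function name firmware_failures and the regular expression are hypothetical, written against the message format quoted in comment #5. The error number is mapped to its symbolic name with Python's standard errno table, where 2 is ENOENT (no such file or directory).

```python
import errno
import re

# Assumed message shape, taken from the lines quoted in comment #5:
#   nouveau 0004:05:00.0: Direct firmware load for <path> failed with error -2
FW_FAIL = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Direct firmware load for (?P<path>\S+) failed with error -(?P<err>\d+)"
)


def firmware_failures(dmesg_text):
    """Return (pci_device, firmware_path, errno_name) for each failed load."""
    failures = []
    for match in FW_FAIL.finditer(dmesg_text):
        code = int(match.group("err"))
        # Fall back to the raw number if the code has no symbolic name.
        name = errno.errorcode.get(code, str(code))
        failures.append((match.group("dev"), match.group("path"), name))
    return failures


if __name__ == "__main__":
    import sys

    # Example use: dmesg | python3 this_script.py
    for dev, path, name in firmware_failures(sys.stdin.read()):
        print(f"{dev}: {path} ({name})")
```

For the lines quoted above, each match reports nvidia/gv100/gr/sw_nonctx.bin with ENOENT, meaning the kernel could not find the file at all.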

== Comment: #6 - Luciano Chavez <cha...@us.ibm.com> - 2018-09-20 17:03:31 ==

(In reply to comment #5)
> --- screening ---
>
> From provided dmesg, I notice:
>
>
> 1294 [ 19.281478] nouveau 0004:05:00.0: bios: version 88.00.13.00.02
> 1295 [ 19.282753] nouveau 0004:05:00.0: Direct firmware load for
> nvidia/gv100/gr/sw_nonctx.bin failed with error -2
> 1296 [ 19.282755] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
> 1297 [ 19.282813] nouveau 0004:05:00.0: Using 32-bit DMA via iommu
>
> ..
>
> 1322 [ 34.367713] nouveau 0004:06:00.0: NVIDIA GV100 (140000a1)
> 1323 [ 34.497152] nouveau 0004:06:00.0: bios: version 88.00.13.00.02
> 1324 [ 34.502736] nouveau 0004:06:00.0: Direct firmware load for
> nvidia/gv100/gr/sw_nonctx.bin failed with error -2
> 1325 [ 34.502738] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
> 1326 [ 34.502797] nouveau 0004:06:00.0: Using 32-bit DMA via iommu
>
> ..
>
> upto 6 instances of the above...
>
>
> Looks like an NVIDIA firmware issue.

Well, I think those messages mean that the nouveau module can't find the firmware file, as opposed to it being a FW issue. It might be a packaging issue if this is not actually causing any real problems. Probably best to mirror this to Canonical for their comment.

== Comment: #10 - Chandni Verma <chand...@in.ibm.com> - 2018-09-24 03:25:35 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1794055/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp