[Kernel-packages] [Bug 1836635] [NEW] Bionic: support for Solarflare X2542 network adapter (sfc driver)
Public bug reported:

[Impact]

 * Support for the Solarflare X2542 network adapter
   (Medford2 / SFC9250) in the sfc driver.

 * This network adapter is present on recent hardware,
   at least HP 2019 and Dell PowerEdge R740xd systems.

 * On recent-hardware deployments that would rather use
   the Bionic LTS / GA supported kernel and cannot move
   to HWE kernels, this adapter is not functional at all.

[Test Case]

 * The X2542 adapter has been exercised with iperf3 and nc
   across 2 hosts at 25G link speed with MTUs 1400/1500/9000
   in both directions, for 1 week.

   Its performance is on par with the Cosmic 4.18 kernel
   (which contains all these patches) and the out-of-tree
   driver from the vendor.

 * The 7000 series adapter (for regression testing an old model,
   supported previously) has been exercised with iperf and netperf
   (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
   host (client/server in different adapter ports isolated with
   network namespaces, so traffic goes through the network switch),
   at 10G link speed with MTUs 1500/9000, for 1 weekend.

   No regressions observed between the original and test kernels.

[Regression Potential]

 * The patchset touches a lot of the sfc driver, so the potential
   for regression definitely exists. It has been tested on another
   adapter which uses the old code, and no regressions were found.
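The X2542 test case above could be scripted along these lines. This is only a dry-run sketch, not the script used for the reported tests: it prints the commands for one sweep over the three MTUs in both directions. IFACE, SERVER and DURATION are hypothetical placeholders; iperf3's -R flag reverses the direction so that the server sends.

```shell
# Dry-run sketch: print the commands for one MTU sweep with iperf3.
# IFACE, SERVER and DURATION are hypothetical placeholders, not values
# taken from the bug report.
IFACE=${IFACE:-eth0}
SERVER=${SERVER:-192.0.2.1}
DURATION=${DURATION:-60}

mtu_sweep_plan() {
    for mtu in 1400 1500 9000; do
        echo "ip link set dev $IFACE mtu $mtu"
        # client -> server
        echo "iperf3 -c $SERVER -t $DURATION"
        # server -> client (-R reverses the test direction)
        echo "iperf3 -c $SERVER -t $DURATION -R"
    done
}

mtu_sweep_plan
```

The MTU has to match on the peer host (and be allowed by the switch) for the larger sizes to be meaningful; that part is left out of the sketch.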
** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Bionic)
     Importance: Undecided
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: In Progress

** Affects: linux (Ubuntu Cosmic)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Disco)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Eoan)
     Importance: Undecided
         Status: Invalid

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Cosmic)
       Status: New => Invalid

** Changed in: linux (Ubuntu Disco)
       Status: New => Invalid

** Changed in: linux (Ubuntu Eoan)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1836635

Title:
  Bionic: support for Solarflare X2542 network adapter (sfc driver)

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Invalid
Status in linux source package in Eoan:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)
** Description changed:

  [Impact]

- * Support for Solarflare X2542 network adapter
-(Medford2 / SFC9250) in the sfc driver.
+ * Support for Solarflare X2542 network adapter
+ (Medford2 / SFC9250) in the sfc driver.

- * This network adapter is present on recent hardware,
-at least HP 2019 and Dell PowerEdge R740xd systems.
+ * This network adapter is present on recent hardware,
+ at least HP 2019 and Dell PowerEdge R740xd systems.

- * On recent-hardware deployments that would rather use
-the Bionic LTS / GA supported kernel and cannot move
-to HWE kernels this adapter is non functional at all.
+ * On recent-hardware deployments that would rather use
+ the Bionic LTS / GA supported kernel and cannot move
+ to HWE kernels this adapter is non functional at all.

  [Test Case]

- * The X2542 adapter has been exercised with iperf3 and nc
-across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
-on both directions, for 1 week.
+ * The X2542 adapter has been exercised with iperf3 and nc
+ across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
+ on both directions, for 1 week.

-Its performance is on par with the Cosmic 4.18 kernel
-(which contains all these patches) and the out-of-tree
-driver from the vendor.
+ Its performance is on par with the Cosmic 4.18 kernel
+ (which contains all these patches) and the out-of-tree
+ driver from the vendor.

- * The 7000 series adapter (for regression testing an old model,
-supported previously) has been exercised with iperf and netperf
-(TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
-host (client/server in different adapter ports isolated with
-network namespaces, so traffic goes through the network switch),
-on 10G link speed on MTUs 1500/9000, for 1 weekend.
+ * The 7000 series adapter (for regression testing an old model,
+ supported previously) has been exercised with iperf and netperf
+ (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
+ host (client/server in different adapter ports isolated with
+ network namespaces, so traffic goes through the network switch),
+ on 10G link speed on MTUs 1500/9000, for 1 weekend.

-No regressions observed between the original and test kernels.
+ No regressions observed between the original and test kernels.

  [Regression Potential]

  * The patchset touches a lot of the sfc driver, so the potential
-for regression definitely exists. It has been tested on other
-adapter which uses the old code, and no regressions were found.
+for regression definitely exists. Thus, a lot of consideration
+and testing happened:
+
+ * It has been tested on other adapter which uses the old code,
+and no regressions were found so far (see 7000 series above).
+
+ * The patchset essentially moves the driver in Bionic up in the
+upstream 'git log':
+- since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
+- until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
+- except for 2 commits (not needed / unrelated)
+ - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
+ - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
+- plus 2 more recent commits (fixes)
+ - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
+ - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")
+
+ * The patchset is exclusively cherry-picks, no single backport.

** Description changed:

  [Impact]

  * Support for Solarflare X2542 network adapter
  (Medford2 / SFC9250) in the sfc driver.

  * This network adapter is present on recent hardware,
  at least HP 2019 and Dell PowerEdge R740xd systems.

  * On recent-hardware deployments that would rather use
  the Bionic LTS / GA supported kernel and cannot move
  to HWE kernels this adapter is non functional at all.

  [Test Case]

  * The X2542 adapter has been exercised with iperf3 and nc
  across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
  on both directions, for 1 week.

  Its performance is on par with the Cosmic 4.18 kernel
  (which contains all these patches) and the out-of-tree
  driver from the vendor.

  * The 7000 series adapter (for regression testing an old model,
  supported previously) has been exercised with iperf and netperf
  (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
  host (client/server in different adapter ports isolated with
  network namespaces, so traffic goes through the network switch),
  on 10G link speed on MTUs 1500/9000, for 1 weekend.

  No regressions observed between the original and test kernels.

  [Regression Potential]

- * The patchset touches a lot of the sfc driver, so the potential
-for regression definitely exists. Thus, a l
[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)
[Bionic][PULL] sfc: patches for LP#1836635
https://lists.ubuntu.com/archives/kernel-team/2019-July/102196.html

Bug description:
  [Impact]

   * Support for the Solarflare X2542 network adapter
     (Medford2 / SFC9250) in the Bionic sfc driver.

   * This network adapter is present on recent hardware,
     at least HP 2019 and Dell PowerEdge R740xd systems.

   * On recent-hardware deployments that would rather use
     the Bionic LTS / GA supported kernel and cannot move
     to HWE kernels, this adapter is not functional at all.

  [Test Case]

   * The X2542 adapter has been exercised with iperf3 and nc
     across 2 hosts at 25G link speed with MTUs 1400/1500/9000
     in both directions, for 1 week.

     Its performance is on par with the Cosmic 4.18 kernel
     (which contains all these patches) and the out-of-tree
     driver from the vendor.

   * The 7000 series adapter (for regression testing an old model,
     supported previously) has been exercised with iperf and netperf
     (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
     host (client/server in different adapter ports isolated with
     network namespaces, so traffic goes through the network switch),
     at 10G link speed with MTUs 1500/9000, for 1 weekend.

     No regressions observed between the original and test kernels.

  [Regression Potential]

   * The patchset touches a lot of the sfc driver, so the potential
     for regression definitely exists. Thus, a lot of consideration
     and testing happened:

   * It has been tested on another adapter which uses the old code,
     and no regressions were found so far (see 7000 series above).

   * The patchset consists exclusively of cherry-picks; not a single
     backport was needed.

   * The patchset essentially moves the Bionic driver up in the
     upstream 'git log --oneline -- drivers/net/ethernet/sfc/':
     - since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
     - until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
     - except for 2 commits (not needed / unrelated)
       - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
       - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
     - plus 2 more recent commits (fixes)
       - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
       - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")
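The commit selection described above could be reproduced roughly as follows in an upstream Linux checkout. This is a sketch, not the procedure used for the pull request. The description does not say whether the "since" commit is itself included, so the trailing "^" on the range start is an assumption, and the function names are made up for illustration.

```shell
# Sketch: list the upstream sfc commits in the range described above,
# oldest first, dropping the two commits the description excludes.
# Assumption: the "since" commit is itself part of the set, hence the
# trailing "^" on the range start.
SINCE=d4a7a8893d4c
UNTIL=7f61e6c6279b
EXCLUDE='42356d9a137b 9baeb5eb1f83'

# Read "hash subject" lines on stdin and drop the excluded hashes.
# Hashes are matched by prefix, in case git prints longer abbreviations.
drop_excluded() {
    while read -r hash subject; do
        skip=no
        for ex in $EXCLUDE; do
            case "$hash" in "$ex"*) skip=yes ;; esac
        done
        [ "$skip" = yes ] || printf '%s %s\n' "$hash" "$subject"
    done
}

# Must be run inside a Linux kernel git checkout.
list_sfc_commits() {
    git log --oneline --reverse --no-merges --abbrev=12 \
        "$SINCE^..$UNTIL" -- drivers/net/ethernet/sfc/ | drop_excluded
}
```

Running list_sfc_commits prints one candidate commit per line; the two later fixes (458bd99e4974 and 0c235113b3c4) lie outside the range and would have to be added to the list separately.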
[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)
Regression test results/log/script, for documentation purposes.

** Attachment added: "lp1836635-test-regression.tar.xz"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+attachment/5277232/+files/lp1836635-test-regression.tar.xz
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
The documentation about the feature/parameters has been merged [1]
into Server Guide -> Installation -> Advanced Installation -> iSCSI [2].

It is not published yet, so the HTML might be updated later.

cheers,
Mauricio

[1] https://code.launchpad.net/~mfo/serverguide/ibft/+merge/370264
[2] https://help.ubuntu.com/lts/serverguide/advanced-installation.html#iscsi

https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Fix Released
Status in hw-detect package in Ubuntu: Fix Released
Status in linux package in Ubuntu: Fix Released
Status in partman-iscsi package in Ubuntu: Fix Released
Status in debian-installer source package in Bionic: Fix Released
Status in hw-detect source package in Bionic: Fix Released
Status in linux source package in Bionic: Fix Released
Status in partman-iscsi source package in Bionic: Fix Released
Status in debian-installer source package in Cosmic: Fix Released
Status in hw-detect source package in Cosmic: Fix Released
Status in linux source package in Cosmic: Fix Released
Status in partman-iscsi source package in Cosmic: Fix Released
Status in debian-installer source package in Disco: Fix Released
Status in hw-detect source package in Disco: Fix Released
Status in linux source package in Disco: Fix Released
Status in partman-iscsi source package in Disco: Fix Released
Status in debian-installer source package in Eoan: Fix Released
Status in hw-detect source package in Eoan: Fix Released
Status in linux source package in Eoan: Fix Released
Status in partman-iscsi source package in Eoan: Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table)
     information (settings for network interface, initiator, and
     target) in the installer because the 'iscsi_ibft' module is
     not present in udeb packages.

   * Even if it were, the installer does not handle iBFT information
     at all, thus any settings are ignored, and iSCSI-related
     configuration has to be done manually or with workarounds.

   * This impacts user experience and automatic installation on
     systems and deployments which actually do provide the iBFT
     feature and information, but cannot use it in practice.

   * With proper iBFT support in the installer (kernel module in
     udeb package and automatic iSCSI-related configuration) users
     will be able to rely on iBFT to install/deploy Ubuntu on their
     servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module to the
     scsi-modules udeb, and configure network/iSCSI according to
     iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected
     as disks (useful when there are no other disks in the system, so
     the installer neither complains nor waits too long) and so that
     partman-related preseed options are not required and remain
     available to the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:

     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:

     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week monitoring
       bug reports -- it should be OK. Tested on amd64/i386/arm64/ppc64el
       on QEMU, plus amd64 on baremetal -- see comment 11.

     - hw-detect: low, the changes are enabled by a preseed option.
       See comment 12.

     - partman-iscsi: low, simple changes, plus one fix that has been
       tested in detail, and falls back to previous behavior if it
       fails. See comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple
     iSCSI iBFT environment (2 VMs: iSCSI target & initiator with
     UEFI+iPXE) and by a user with system/firmware that supports
     iBFT for iSCSI.
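The kernel-module part of the test case can be wrapped in two small helpers. This is a minimal sketch with made-up function names: has_module reads a 'dpkg-deb -c' listing on stdin, and ibft_present checks for the /sys/firmware/ibft directory that the iscsi_ibft module creates when the firmware provides a table. The optional sysfs-root argument is a hypothetical convenience for testing outside a real system.

```shell
# Sketch: check that a package listing contains a given kernel module.
# Reads the output of 'dpkg-deb -c <package>' on stdin.
# Assumes uncompressed modules (file names ending in .ko).
has_module() {
    grep -q "/$1\.ko\$"
}

# Sketch: report whether iBFT data is exposed in sysfs.
# $1 is an optional sysfs root (hypothetical, defaults to /sys).
ibft_present() {
    [ -d "${1:-/sys}/firmware/ibft" ]
}
```

For example, 'dpkg-deb -c scsi-modules_*.udeb | has_module iscsi_ibft && echo present' covers the udeb check, and 'modprobe iscsi_ibft && ibft_present' could be run from a shell in the installer environment.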
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
The verification for bionic/cosmic -proposed is expected to finish by
tomorrow (Apr 12).

https://bugs.launchpad.net/bugs/1821395

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu: Invalid
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Fix Committed

Bug description:
  [Impact]

   * fscache issue where jobs get hung when the fscache disk is full.

   * trivial upstream fix; already applied in X/D, required in B/C:
     commit c5a94f434c82 ("fscache: fix race between enablement and
     dropping of object").

  [Test Case]

   * Test kernel verified / regression-tested by reporter.

   * Apparently there's no simple test case,
     but these are the conditions to hit the problem:

     1) The active dataset size is equal to the cache disk size.
        The application reads the data over and over again.
     2) Disk is near full (90%+)
     3) cachefilesd in userspace is trying to cull the old objects
        while new objects are being looked up.
     4) New cachefiles are created and some fail with no disk space.
     5) A race between the dropping-object state machine and the
        deferred-lookup state machine causes the hang.
     6) The task hangs in fscache_wait_for_deferred_lookup, waiting
        for the FSCACHE_COOKIE_LOOKING_UP bit in cookie->flags to
        be cleared.

  [Regression Potential]

   * Low; contained in fscache; no further fixes applied upstream.

   * This patch is applied in a stable tree (linux-4.4.y).

  [Original Description]

  A user reported an fscache issue where jobs get hung when the
  fscache disk is full.

  After investigation, it's been found to be an issue already
  reported/fixed upstream, by commit c5a94f434c82 ("fscache: fix race
  between enablement and dropping of object").

  This patch is required in Bionic and Cosmic, and it's applied in
  Xenial (via stable) and Disco.

  Apparently there's no simple test case, but the conditions to hit
  the problem are the six listed under [Test Case] above.
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
Verification successful with xfstests on nfs+fscache.
No regression in cosmic-proposed from cosmic-updates.

cosmic-updates / 4.18.0-17:

Failures: generic/035 generic/258 generic/294 generic/448 generic/467 generic/477 generic/484 generic/490 generic/495
Failed 9 of 437 tests

cosmic-proposed / 4.18.0-18:

Failures: generic/035 generic/258 generic/294 generic/448 generic/467 generic/477 generic/484 generic/490 generic/495
Failed 9 of 437 tests
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
Verification successful with xfstests on nfs+fscache.
No regression in bionic-proposed from bionic-updates.

bionic-updates / 4.15.0-47:

Failures: generic/035 generic/075 generic/091 generic/112 generic/263 generic/294 generic/306 generic/307 generic/430 generic/431 generic/434 generic/469 generic/484 generic/495
Failed 14 of 437 tests

bionic-proposed / 4.15.0-48:

Failures: generic/035 generic/075 generic/091 generic/112 generic/263 generic/294 generic/306 generic/307 generic/430 generic/431 generic/434 generic/469 generic/484 generic/495
Failed 14 of 437 tests

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
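The "no regression" comparison of the two failure lists can be automated. The following is a minimal sketch, assuming each xfstests log carries a summary line starting with "Failures:" as in the results quoted in these messages; the function names are made up for illustration.

```shell
# Sketch: extract the failed test names from an xfstests log, one per
# line, sorted. Assumes the summary line starts with "Failures:".
failed_tests() {
    sed -n 's/^Failures: *//p' "$1" | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Print the tests that fail only in the second log (candidate
# regressions); prints nothing when the second run adds no failures.
new_failures() {
    a=$(mktemp) b=$(mktemp)
    failed_tests "$1" > "$a"
    failed_tests "$2" > "$b"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

Called as 'new_failures LOG_UPDATES LOG_PROPOSED' with the logs of the -updates and -proposed kernels, empty output corresponds to the result reported here.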
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
Regression testing setup/steps === fscache --- sudo apt-get -y install cachefilesd echo 'RUN=yes' | sudo tee -a /etc/default/cachefilesd sudo modprobe fscache sudo systemctl start cachefilesd nfs --- sudo apt-get -y install nfs-kernel-server sudo systemctl start nfs-kernel-server sudo mkdir -p /{srv,mnt}/nfs-{test,scratch} # different fsid if in the same local filesystem echo '/srv/nfs-test127.0.0.1(rw,no_subtree_check,no_root_squash,fsid=0)' | sudo tee -a /etc/exports echo '/srv/nfs-scratch 127.0.0.1(rw,no_subtree_check,no_root_squash,fsid=1)' | sudo tee -a /etc/exports sudo exportfs -ra xfs-tests - sudo apt-get -y install automake gcc make git xfsprogs xfslibs-dev \ uuid-dev uuid-runtime libtool-bin e2fsprogs libuuid1 attr libattr1-dev \ libacl1-dev libaio-dev libgdbm-dev quota gawk fio dbench python sqlite3 git clone https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git cd xfstests-dev git log --oneline -1 HEAD f3c1bca generic: Test that SEEK_HOLE can find a punched hole make -j$(nproc); echo $? 
# must be 0 sudo useradd fsgqa sudo groupadd fsgqa sudo useradd 123456-fsgqa export TEST_DEV=127.0.0.1:/srv/nfs-test export TEST_DIR=/mnt/nfs-test export SCRATCH_DEV=127.0.0.1:/srv/nfs-scratch export SCRATCH_MNT=/mnt/nfs-scratch export TEST_FS_MOUNT_OPTS="-o fsc" # for fscache / test dev export NFS_MOUNT_OPTIONS="-o fsc" # for fscache / scratch dev cd ~/xfstests-dev sudo -E ./check -nfs -g quick 2>&1 | tee ~/xfs-tests.nfs.log.$(uname -r) <...> --- In another terminal, check the NFS mounts are indeed with the 'fsc' (fscache) attribute: $ mount | grep nfs | grep fsc 127.0.0.1:/srv/nfs-test on /mnt/nfs-test type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1) 127.0.0.1:/srv/nfs-scratch on /mnt/nfs-scratch type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1) And compare fscache stats before/after run: $ cat /proc/fs/fscache/stats FS-Cache statistics Cookies: idx=0 dat=0 spc=0 Objects: alc=0 nal=0 avl=0 ded=0 ChkAux : non=0 ok=0 upd=0 obs=0 Pages : mrk=0 unc=0 Acquire: n=0 nul=0 noc=0 ok=0 nbf=0 oom=0 Lookups: n=0 neg=0 pos=0 crt=0 tmo=0 Invals : n=0 run=0 Updates: n=0 nul=0 run=0 Relinqs: n=0 nul=0 wcr=0 rtr=0 AttrChg: n=0 ok=0 nbf=0 oom=0 run=0 Allocs : n=0 ok=0 wt=0 nbf=0 int=0 Allocs : ops=0 owt=0 abt=0 Retrvls: n=0 ok=0 wt=0 nod=0 nbf=0 int=0 oom=0 Retrvls: ops=0 owt=0 abt=0 Stores : n=0 ok=0 agn=0 nbf=0 oom=0 Stores : ops=0 run=0 pgs=0 rxd=0 olm=0 VmScan : nos=0 gon=0 bsy=0 can=0 wt=0 Ops: pend=0 run=0 enq=0 can=0 rej=0 Ops: ini=0 dfr=0 rel=0 gc=0 CacheOp: alo=0 luo=0 luc=0 gro=0 CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0 CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0 ... 
$ cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=412 dat=2441632 spc=0
Objects: alc=8929 nal=0 avl=8741 ded=8928
ChkAux : non=0 ok=86 upd=0 obs=1123
Pages : mrk=371441 unc=371441
Acquire: n=2442044 nul=0 noc=0 ok=2442044 nbf=0 oom=0
Lookups: n=8929 neg=8817 pos=112 crt=8817 tmo=0
Invals : n=152 run=152
Updates: n=0 nul=0 run=152
Relinqs: n=2442044 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=1498 ok=0 wt=195 nod=1498 nbf=0 int=0 oom=0
Retrvls: ops=1498 owt=575 abt=0
Stores : n=371145 ok=371145 agn=0 nbf=0 oom=0
Stores : ops=1117 run=372234 pgs=371118 rxd=371118 olm=0
VmScan : nos=49 gon=0 bsy=0 can=0 wt=0
Ops: pend=575 run=2767 enq=372387 can=0 rej=0
Ops: ini=372795 dfr=37 rel=372795 gc=37
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=1123 stl=0 rtr=0 cul=0

---

Note: in 4.15.0 kernels, some tests apparently run forever
(generic/430, 431 and 434; same behavior in nfs+fscache, ext4, xfs).
They were killed with 'sudo kill -TERM $(pidof xfs_io)'.

# ref: https://wiki.linux-nfs.org/wiki/index.php/Xfstests

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821395

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed

Bug description:
  [Impact]

   * fscache issue where jobs get hung when fscache disk is full.

   * trivial upstream fix; already applied in X/D, required in B/C:
     commit c5a94f434c82 ("fscache: fix race between enablement and
     dropping of object").

  [Test Case]

   * Test kernel verified / regression-tested by reporter.
* Apparently there's no simple test case,
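The before/after comparison of /proc/fs/fscache/stats used in the regression testing above can be done by eye, but a small helper makes the changed counters easier to spot. The following is only a sketch: the function name `fscache_stats_delta` and the idea of saving the two dumps to files are assumptions for illustration, not part of the original procedure.

```shell
# fscache_stats_delta BEFORE AFTER
# Hypothetical helper: print every counter whose value differs between two
# saved copies of /proc/fs/fscache/stats, as
#   <section> <counter>: <before> -> <after>
# Counters that only exist in AFTER are shown with "?" as the old value.
fscache_stats_delta() {
    awk '
        {
            # Lines look like "Section : k=v k=v ..."; skip lines without a colon
            colon = index($0, ":")
            if (colon == 0) next
            section = substr($0, 1, colon - 1)
            gsub(/[ \t]+/, "", section)
            n = split(substr($0, colon + 1), pair, " ")
            for (i = 1; i <= n; i++) {
                eq = index(pair[i], "=")
                if (eq == 0) continue
                key = section " " substr(pair[i], 1, eq - 1)
                val = substr(pair[i], eq + 1)
                if (NR == FNR) {
                    # First file: remember the old value
                    before[key] = val
                } else {
                    # Second file: report only counters that changed
                    old = (key in before) ? before[key] : "?"
                    if (old != val) print key ": " old " -> " val
                }
            }
        }
    ' "$1" "$2"
}
```

A possible use: save `cat /proc/fs/fscache/stats` to one file before the xfstests run and to another afterwards, then call `fscache_stats_delta` with the two file names. A run that actually exercised fscache should list non-zero changes such as the Cookies, Pages and Stores counters.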
[Kernel-packages] [Bug 1824827] [NEW] tasks doing write()/fsync() hit deadlock in write_cache_pages()
Public bug reported:

[Impact]

 * Tasks of a multi-threaded workload doing write() and fsync()
   might deadlock in write_cache_pages(), preventing progress.

 * The fix addresses a corner case in write_cache_pages() on
   the range_cyclic implementation which allows the deadlock.

 * Patch:
   - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7 ("mm/page-writeback.c:
     fix range_cyclic writeback vs writepages deadlock"),
     present in v4.20-rc1~92^2~19.

[Test Case]

 * This issue was originally hit by the 'perforce' (p4d) tool
   on an XFS filesystem, but it is difficult/rare to reproduce.

 * We've written a userspace program + kernel module (kprobes-based)
   to reproduce this problem and verify the test kernel/patch.

 * The kprobes are strictly tied to particular kernel versions
   because of the assembly instruction offsets. We'll provide
   updated versions for -updates and -proposed for verification.

 * Steps (see output examples in comments):

   - Userspace part:

     $ gcc -o test test.c -pthread

   - Kernel part:

     $ touch Makefile
     $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
     $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

   - Shorter hung task timeout and higher console logging level
     to notice the deadlocked tasks sooner, and watch progress:

     $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
     $ echo 9 | sudo tee /proc/sys/kernel/printk

   - Load module / Run userspace part (logging to kernel log) in XFS:

     $ sudo insmod kprobe-test.ko
     $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
     $ sudo rmmod kprobe-test

     You may need to ctrl-z with the original kernel as 'test' doesn't finish.

   - Check kernel log or watch the system console:

     $ dmesg

     Check threads in D state.

     $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker

[Regression Potential]

 * The patch is small but changes core writeback infrastructure,
   so there's a chance this may _affect_ some or other behavior
   that has not been validated with our regression testing;
   not exactly _break_ it. Please note our regression testing.

 * This has been verified with 'xfstests' (not only for the XFS fs,
   despite its original name), used by major Linux filesystems
   for regression testing during development. It's been tested
   on systems with 24 and 4 CPUs (to exercise differences in
   scalability, parallelism, and workload) and on XFS and ext4
   (reporter's environment + Ubuntu's default). No regressions
   were observed (the set of failed tests is the same in each
   system and tests failed in the same way).

 * This has also been verified with 'iozone' for write-intensive
   tests, to exercise the writeback mechanism, and no errors
   were observed.

 * The reporter has been running the test kernel with the patch
   for weeks and has not observed any other issues/regressions.

[Other Info]

 * This is only required in Cosmic (for the Bionic HWE kernel),
   and is already applied in Disco.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: linux (Ubuntu Cosmic)
     Importance: Undecided
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: Confirmed

** Affects: linux (Ubuntu Disco)
     Importance: Undecided
         Status: Invalid

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Disco)
       Status: New => Invalid

** Changed in: linux (Ubuntu Cosmic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
     Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827
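A note on the "threads in D state" check from the test case steps: `grep D` matches a capital D anywhere on the line, so a thread whose command name merely contains a "D" would also pass that first filter. A stricter filter on the state column could look like the sketch below. The function name `d_state_threads` is made up for illustration, and the column positions assume the `pid,tid,state,comm` output format used in the steps.

```shell
# d_state_threads: read `ps -eLo pid,tid,state,comm` output on stdin and print
# only the threads that are in uninterruptible sleep (state column is "D")
# and whose command name is "test" or starts with "kworker".
# Hypothetical helper; assumes state is column 3 and comm starts at column 4.
d_state_threads() {
    awk '$3 ~ /^D/ && ($4 == "test" || $4 ~ /^kworker/)'
}
```

It would be used as `ps -eLo pid,tid,state,comm | d_state_threads`. On the original kernel the deadlocked kworker and 'test' threads are expected to show up; on the patched kernel the output should be empty once the test finishes.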
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
testcase, kernel part.

** Attachment added: "kprobe-test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255994/+files/kprobe-test.c

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
testcase, userspace part.

** Attachment added: "test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255995/+files/test.c
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
testcase original kernel :: latest cosmic version:

$ uname -rv
4.18.0-18-generic #19-Ubuntu SMP Tue Apr 2 18:13:16 UTC 2019

[ 654.491029] kprobe_test: loading out-of-tree module taints kernel.
[ 654.493322] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel
[ 654.497033] mod_init():158 :: hello
[ 654.497976] mod_init():183 :: kernel version: orig/-18/cosmic
[ 694.254271] Program running, TID = 3292
[ 694.256600] kp1_pre_handler():070 :: state 0 :: pid = 3292, mapping = 0x962333263730, comm = 'test'
[ 694.260870] kp1_pre_handler():079 :: state 0 -> 1 :: pid = 3292, mapping = 0x962333263730
[ 694.262710] kp2_pre_handler():119 :: state 1 :: pid = 3292, page index = 1
[ 694.264264] kp3_pre_handler():144 :: state 1 :: pid = 3292, page index = 1, calling writepage()
[ 694.266641] kp2_pre_handler():119 :: state 1 :: pid = 3292, page index = 2
[ 694.268456] kp3_pre_handler():144 :: state 1 :: pid = 3292, page index = 2, calling writepage()
[ 695.276320] Thread 0 running, TID = 3293!
[ 695.281210] kp1_pre_handler():070 :: state 1 :: pid = 1165, mapping = 0x962333263730, comm = 'kworker/u4:2'
[ 695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 1165, mapping = 0x962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 695.314808] kp2_pre_handler():119 :: state 2 :: pid = 1165, page index = 2
[ 695.322822] kp3_pre_handler():144 :: state 2 :: pid = 1165, page index = 2, calling writepage()
[ 695.330308] kp2_pre_handler():119 :: state 2 :: pid = 1165, page index = 1
[ 695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 1165, page index = 1, spin 5 seconds before lock_page()...
[ 696.283747] Thread 1 running, TID = 3295!
[ 696.284623] kp1_pre_handler():070 :: state 3 :: pid = 3295, mapping = 0x962333263730, comm = 'test'
[ 696.286726] kp2_pre_handler():119 :: state 3 :: pid = 3295, page index = 1
[ 696.288392] kp3_pre_handler():144 :: state 3 :: pid = 3295, page index = 1, calling writepage()
[ 696.290018] kp2_pre_handler():119 :: state 3 :: pid = 3295, page index = 2
[ 697.283941] Thread 2 running, TID = 3296!
[ 697.284859] kp1_pre_handler():070 :: state 3 :: pid = 3296, mapping = 0x962333263730, comm = 'test'
[ 697.287246] kp2_pre_handler():119 :: state 3 :: pid = 3296, page index = 1
[ 700.302756] kp2_pre_handler():127 :: state 3 -> 4 :: pid = 1165, page index = 1, spun 5 seconds before lock_page().
[ 715.716717] INFO: task kworker/u4:2:1165 blocked for more than 10 seconds.
[ 715.725486] Tainted: G OE 4.18.0-18-generic #19-Ubuntu
[ 715.732832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 715.740500] kworker/u4:2 D 0 1165 2 0x8000
[ 715.745615] Workqueue: writeback wb_workfn (flush-7:1)
[ 715.750736] Call Trace:
[ 715.753270] __schedule+0x29e/0x840
[ 715.756493] schedule+0x2c/0x80
[ 715.759369] io_schedule+0x16/0x40
[ 715.762044] __lock_page+0x101/0x150
[ 715.764729] ? page_cache_tree_insert+0xe0/0xe0
[ 715.773625] write_cache_pages+0x283/0x4e0
[ 715.782547] ? xfs_vm_readpage+0x80/0x80 [xfs]
[ 715.792525] ? xfs_vm_readpage+0x80/0x80 [xfs]
[ 715.798175] ? write_cache_pages+0x5/0x4e0
[ 715.803180] xfs_vm_writepages+0x6b/0xa0 [xfs]
[ 715.807087] do_writepages+0x41/0xd0
[ 715.810416] __writeback_single_inode+0x40/0x360
[ 715.813588] ? fprop_fraction_percpu+0x26/0x80
[ 715.816686] writeback_sb_inodes+0x211/0x520
[ 715.819584] __writeback_inodes_wb+0x67/0xb0
[ 715.822661] wb_writeback+0x25f/0x2f0
[ 715.824963] ? get_nr_dirty_inodes+0x46/0x70
[ 715.827180] wb_workfn+0x175/0x3f0
[ 715.829225] process_one_work+0x20f/0x410
[ 715.830964] worker_thread+0x34/0x400
[ 715.832646] kthread+0x120/0x140
[ 715.834551] ? pwq_unbound_release_workfn+0xd0/0xd0
[ 715.836902] ? kthread_bind+0x40/0x40
[ 715.838772] ret_from_fork+0x35/0x40
[ 715.840579] INFO: task test:3293 blocked for more than 10 seconds.
[ 715.842927] Tainted: G OE 4.18.0-18-generic #19-Ubuntu
[ 715.845279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 715.847771] test D 0 3293 3291 0x
[ 715.849826] Call Trace:
[ 715.851266] __schedule+0x29e/0x840
[ 715.853289] schedule+0x2c/0x80
[ 715.855069] wb_wait_for_completion+0x64/0x90
[ 715.857179] ? wait_woken+0x80/0x80
[ 715.858973] sync_inodes_sb+0xc7/0x290
[ 715.860754] sync_inodes_one_sb+0x15/0x20
[ 715.862383] iterate_supers+0xaa/0x100
[ 715.863938] ? default_file_splice_write+0x30/0x30
[ 715.865666] ksys_sync+0x42/0xb0
[ 715.867082] __ia32_sys_sync+0xe/0x20
[ 715.869122] do_syscall_64+0x5a/0x110
[ 715.871041] entry_SYSCALL_64_after_hw
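When scanning a long kernel log like the one above, the hung-task reports are the lines that matter first. As a rough sketch (the function name `hung_tasks` is an assumption for illustration, and it relies only on the "INFO: task ... blocked for more than N seconds." wording seen in this log), they can be extracted like this:

```shell
# hung_tasks: read kernel log text on stdin and print one line per hung-task
# report, as "<comm> <pid> <seconds>".
# Hypothetical helper; it matches the message format shown in the log above.
hung_tasks() {
    # The comm may itself contain colons (e.g. kworker/u4:2), so the greedy
    # first group takes everything up to the last ":<digits> blocked".
    sed -n 's/^.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

For example, `dmesg | hung_tasks` on the original kernel would be expected to list the kworker and 'test' threads from the log above, while a kernel with the fix should print nothing.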
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
testcase test kernel :: latest cosmic version + patch:

$ uname -rv
4.18.0-18-generic #19+test20190415b1 SMP Mon Apr 15 15:43:20 UTC 2019

[ 169.145212] kprobe_test: loading out-of-tree module taints kernel.
[ 169.149144] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel
[ 169.153539] mod_init():158 :: hello
[ 169.154744] mod_init():190 :: kernel version: test/-18/cosmic
[ 169.177027] Program running, TID = 2497
[ 169.177978] kp1_pre_handler():070 :: state 0 :: pid = 2497, mapping = 0x993df9136fb0, comm = 'test'
[ 169.181080] kp1_pre_handler():079 :: state 0 -> 1 :: pid = 2497, mapping = 0x993df9136fb0
[ 169.183355] kp2_pre_handler():119 :: state 1 :: pid = 2497, page index = 1
[ 169.185616] kp3_pre_handler():144 :: state 1 :: pid = 2497, page index = 1, calling writepage()
[ 169.187779] kp2_pre_handler():119 :: state 1 :: pid = 2497, page index = 2
[ 169.189186] kp3_pre_handler():144 :: state 1 :: pid = 2497, page index = 2, calling writepage()
[ 170.194880] Thread 0 running, TID = 2498!
[ 170.200011] kp1_pre_handler():070 :: state 1 :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 7, mapping = 0x993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 170.238633] kp2_pre_handler():119 :: state 2 :: pid = 7, page index = 2
[ 170.248024] kp3_pre_handler():144 :: state 2 :: pid = 7, page index = 2, calling writepage()
[ 170.261141] kp1_pre_handler():070 :: state 2 :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.272150] kp2_pre_handler():119 :: state 2 :: pid = 7, page index = 1
[ 170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...
[ 171.195090] Thread 1 running, TID = 2499!
[ 171.196182] kp1_pre_handler():070 :: state 3 :: pid = 2499, mapping = 0x993df9136fb0, comm = 'test'
[ 171.198609] kp2_pre_handler():119 :: state 3 :: pid = 2499, page index = 1
[ 171.200358] kp3_pre_handler():144 :: state 3 :: pid = 2499, page index = 1, calling writepage()
[ 171.203717] kp2_pre_handler():119 :: state 3 :: pid = 2499, page index = 2
[ 172.195297] Thread 2 running, TID = 2500!
[ 172.196387] kp1_pre_handler():070 :: state 3 :: pid = 2500, mapping = 0x993df9136fb0, comm = 'test'
[ 172.198673] kp2_pre_handler():119 :: state 3 :: pid = 2500, page index = 1
[ 175.252161] kp2_pre_handler():127 :: state 3 -> 4 :: pid = 7, page index = 1, spun 5 seconds before lock_page().
[ 175.254922] kp3_pre_handler():144 :: state 4 :: pid = 2499, page index = 2, calling writepage()
[ 175.256849] kp3_pre_handler():144 :: state 4 :: pid = 2500, page index = 1, calling writepage()
[ 175.259166] kp2_pre_handler():119 :: state 4 :: pid = 2500, page index = 2
[ 175.273178] mod_exit():213 :: bye
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
Patch posted for SRU:

[C][PATCH 0/1] Fix write()/fsync() deadlock in write_cache_pages()
https://lists.ubuntu.com/archives/kernel-team/2019-April/100084.html
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
The change introduced by the patch is evident in the kernel message
log for Thread 0: between page indexes 2 and 1 there's now another
function call to write_cache_pages() instead of just another
iteration of the for-loop inside one call.

Original kernel:

[ 695.276320] Thread 0 running, TID = 3293!
[ 695.281210] kp1_pre_handler():070 :: state 1 :: pid = 1165, mapping = 0x962333263730, comm = 'kworker/u4:2'
[ 695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 1165, mapping = 0x962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 695.314808] kp2_pre_handler():119 :: state 2 :: pid = 1165, page index = 2
[ 695.322822] kp3_pre_handler():144 :: state 2 :: pid = 1165, page index = 2, calling writepage()
<< ... HERE ... >>
[ 695.330308] kp2_pre_handler():119 :: state 2 :: pid = 1165, page index = 1
[ 695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 1165, page index = 1, spin 5 seconds before lock_page()...

Test kernel:

[ 170.194880] Thread 0 running, TID = 2498!
[ 170.200011] kp1_pre_handler():070 :: state 1 :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 7, mapping = 0x993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 170.238633] kp2_pre_handler():119 :: state 2 :: pid = 7, page index = 2
[ 170.248024] kp3_pre_handler():144 :: state 2 :: pid = 7, page index = 2, calling writepage()
[ 170.261141] kp1_pre_handler():070 :: state 2 :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.272150] kp2_pre_handler():119 :: state 2 :: pid = 7, page index = 1
[ 170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...
** Changed in: linux (Ubuntu Cosmic)
       Status: Confirmed => In Progress
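The difference pointed out in this comment (one versus two write_cache_pages() calls by the kworker) can also be checked mechanically in a saved log. The sketch below rests on an inference from the logs in this bug, not on the kprobe-test.c source: it assumes the `kp1_pre_handler():070` message is printed once per write_cache_pages() call, and that the `070` source-line tag is the same in the module version being used. The function name `kworker_wcp_entries` is hypothetical.

```shell
# kworker_wcp_entries: read a kprobe-test kernel log on stdin and print how
# many kp1_pre_handler():070 messages were logged for kworker threads.
# Hypothetical helper; assumes one such message per write_cache_pages() call.
kworker_wcp_entries() {
    # "." stands in for the single quote that precedes the comm name.
    awk '/kp1_pre_handler\(\):070 :: / && /comm = .kworker/ { n++ } END { print n + 0 }'
}
```

With the two excerpts quoted in this comment, the original kernel log should give 1 and the test kernel log 2, matching the extra call visible between page indexes 2 and 1.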
[Kernel-packages] [Bug 1839521] Re: Xenial: ZFS deadlock in shrinker path with xattrs
Hi @mathew-hodson,

This fix is needed in the Linux kernel package for Xenial as well
(it duplicates the zfs-linux/dkms source).

Shouldn't the 'linux' source package still be tracked?

Thanks,
Mauricio

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1839521

Title:
  Xenial: ZFS deadlock in shrinker path with xattrs

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Xenial:
  Fix Committed

Bug description:
  [Impact]

   * Xenial's ZFS can deadlock in the memory shrinker path
     after removing files with extended attributes (xattr).

   * Extended attributes are enabled by default, but are
     _not_ used by default, which reduces the likelihood.

   * It's very difficult/rare to reproduce this problem,
     due to the file/xattr/remove/shrinker/lru order/timing
     circumstances required (it took weeks for the reporting
     user), but a synthetic test-case has been found for tests.

  [Test Case]

   * A synthetic reproducer is available for this LP,
     with a few steps to touch/setfattr/rm/drop_caches
     plus a kernel module to massage the disposal list.
     (comment #8)

   * In the original ZFS module:
     the xattr dir inode is not purged immediately on
     file removal, but possibly purged _two_ shrinker
     invocations later. This allows another thread,
     started before the file removal, to call zfs_zget()
     on the xattr child inode and iput() it, so it makes
     it to the same disposal list as the xattr dir inode.
     (comment #3)

   * In the modified ZFS module:
     the xattr dir inode is purged immediately on file
     removal, not possibly later on shrinker invocation,
     so the problem window above doesn't exist anymore.
     (comment #12)

  [Regression Potential]

   * Low. The patches are confined to extended attributes
     in ZFS, specifically node removal/purge, and a change
     to how an xattr child inode tracks its xattr dir
     (parent) inode, so that it can be purged immediately
     on removal.
* The ZFS test-suite has been run on original/modified zfs-dkms package/kernel modules, with no regressions. (comment #11) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1839521/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device
Verification successful with disco-proposed. No warning nor oops anymore. # rmadison -s disco-proposed linux linux | 5.0.0-26.27 | disco-proposed | source # uname -rv 5.0.0-26-generic #27-Ubuntu SMP Tue Aug 13 17:47:39 UTC 2019 # ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1 [ 171.879953] bcache: register_bdev() registered backing device loop0 [ 171.920116] bcache: run_cache_set() invalidating existing data [ 171.931843] bcache: register_cache() registered cache device loop1 [ 175.906911] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 18fa9221-da9c-4e69-8b23-6eb093030c30 # reboot # # comment last line in script. # ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1 [ 91.990987] bcache: register_bdev() registered backing device loop0 [ 94.001825] bcache: run_cache_set() invalidating existing data [ 94.018920] bcache: register_cache() registered cache device loop1 # sleep 10 # ** Tags removed: verification-needed-disco ** Tags added: verification-done-disco -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1837788 Title: bcache kernel warning when attaching device Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: In Progress Status in linux source package in Disco: Fix Committed Status in linux source package in Eoan: Invalid Bug description: [Impact] * Users can get a Warning or even Oops the kernel if bcache/writeback_percent is set before attaching a caching device to the bcache device. * The fix is trivial, upstream, and consists of just checking whether the caching device is attached in order to set flags and schedule thread (which oops). [Test Case] * See attachment 'setup-bcache-wb_percent-before-attach.sh' used in comment #5 and #6 to reproduce the problem(s). 
* for 'Warning': # make-bcache -B # make-bcache -C # echo 11 > /sys/block//bcache/writeback_percent # sleep 1 # echo > /sys/block//bcache/attach * for 'Oops': (steps above, but don't run last command / 'attach'). [Regression Potential] * Low. The fix is trivial, contained, and exclusive to bcache sysfs handler. * The modified path has been exercised with synthetic testing (script). [Original Bug Description] See attached dmesg, each time this server is rebooted it emits a concerning bcache warning. ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-54-generic 4.15.0-54.58 ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18 Uname: Linux 4.15.0-54-generic x86_64 AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic. AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.7 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer' Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer' Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer' Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer' Date: Wed Jul 24 12:28:06 2019 InstallationDate: Installed on 2013-10-04 (2119 days ago) InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 (20130925.1) MachineType: Supermicro X9DAi ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 EFI VGA 
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb RelatedPackageVersions: linux-restricted-modules-4.15.0-54-generic N/A linux-backports-modules-4.15.0-54-generic N/A linux-firmware 1.173.9 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago) dmi.bios.date: 05/09/2015 dmi.bios.vendor: American Megatrends Inc. dmi.bios.version: 3.2 dmi.board.asset.tag: To be filled by O.E.M. dmi.board.name: X9DAi dmi.board.vendor: Supermicro dmi.board.version: 0123456789 dmi.chassis.asset.tag: To Be Filled By O.E.M. dmi.chassis.type: 3 dmi.chassis.vendor: Supermicro dmi.chassis.version: 0123456789 dmi.modalias: dmi:bvnAmeric
[Kernel-packages] [Bug 1839521] Re: Xenial: ZFS deadlock in shrinker path with xattrs
Verification done for linux on xenial-proposed. The inodes for file, xattr dir, and xattr child are all evicted at file removal time, not making it to any disposal list after file removal. So the window/scenario for the problem to occur is not present anymore. Log --- $ uname -rv 4.4.0-160-generic #188-Ubuntu SMP Wed Aug 14 04:21:43 UTC 2019 $ modinfo zfs | head filename: /lib/modules/4.4.0-160-generic/kernel/zfs/zfs/zfs.ko version:0.6.5.6-0ubuntu28 ... srcversion: 99F1D0FED2F291CA7AED0C6 $ sudo apt-get install zfsutils-linux attr $ sudo ./zfs-mount.sh $ echo 2 | sudo tee /proc/sys/vm/drop_caches 2 $ sudo ./zfs-kprobes.sh $ sudo cat /sys/kernel/debug/tracing/trace_pipe & $ touch /zfs/file touch-10656 [001] d... 359.615887: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x0 dzp=0x8800b9875940 touch-10656 [001] d... 359.616184: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xa touch-10656 [001] d... 359.616339: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880036f48440 $ setfattr -n user.debug -v 1 /zfs/file setfattr-10657 [000] d... 361.507063: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x2 dzp=0x880036f48440 setfattr-10657 [000] d... 361.507265: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xb setfattr-10657 [000] d... 361.507402: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880139d09980 setfattr-10657 [000] d... 361.507665: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa setfattr-10657 [000] d... 361.507792: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget) setfattr-10657 [000] d... 361.507981: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa setfattr-10657 [000] d... 361.508104: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget) setfattr-10657 [000] d... 361.508692: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa setfattr-10657 [000] d... 
361.508821: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget) setfattr-10657 [000] d... 361.509022: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x0 dzp=0x880139d09980 setfattr-10657 [000] d... 361.509170: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xc setfattr-10657 [000] d... 361.509302: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880139d09100 $ rm /zfs/file rm-10658 [001] d... 363.216716: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa rm-10658 [001] d... 363.216882: r_zfs_zget_0: (zfs_dirent_lock+0x56c/0x6c0 [zfs] <- zfs_zget) rm-10658 [001] d... 363.217130: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xb rm-10658 [001] d... 363.217271: r_zfs_zget_0: (zfs_remove+0x22b/0x4c0 [zfs] <- zfs_zget) rm-10658 [001] d... 363.217567: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0x880036f48650 rm-10658 [001] d... 363.217715: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0x880036f48650 rm-10658 [001] d... 363.217835: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0x880036f48440 obj=0xa rm-10658 [001] d... 363.217963: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x360 [zfs]) znode=0x880036f48440 rm-10658 [001] d... 363.218102: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xb rm-10658 [001] d... 363.218232: r_zfs_zget_0: (zfs_rmnode+0x25b/0x360 [zfs] <- zfs_zget) rm-10658 [001] d... 363.218464: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0x880139d09b90 obj=0x0 <...>-10308 [003] d... 363.218496: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0x880139d09b90 z_iput-10308 [003] d... 363.218503: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0x880139d09b90 z_iput-10308 [003] d... 363.218505: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0x880139d09980 obj=0xb z_iput-10308 [003] d... 363.218509: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x360 [zfs]) znode=0x880139d09980 z_iput-10308 [003] d... 
363.218512: p_zfs_purgedir_0: (zfs_purgedir+0x0/0x230 [zfs]) znode=0x880139d09980 z_iput-10308 [003] d... 363.218560: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xc z_iput-10308 [003] d... 363.218566: r_zfs_zget_0: (zfs_purgedir+0xb4/0x230 [zfs] <- zfs_zget) z_iput-10308 [003] d... 363.218606: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0x880139d09310 obj=0x0 z_iput-10308 [003] d... 363.218626: r_zfs_purgedir_0: (zfs_rmnode+
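For readers who want to cross-check a trace like the one above, the following is a minimal sketch, assuming trace_pipe lines in the format shown in this log (probe names p_zfs_znode_alloc_0 and p_zfs_zinactive_0 followed by obj=0x...). The function name is illustrative. It lists which object numbers were allocated and which reached zfs_zinactive(); note the log in this message is cut off, so the last object may not show up here.

```python
import re

# Matches the two probe events of interest and captures the object number.
OBJ_RE = re.compile(
    r"p_(zfs_znode_alloc|zfs_zinactive)_0: \([^)]*\)"
    r"(?: znode=0x[0-9a-f]+)? obj=(0x[0-9a-f]+)"
)


def allocated_vs_inactivated(trace_text):
    """Return (allocated, inactivated) object numbers, in order of appearance."""
    allocated, inactivated = [], []
    for match in OBJ_RE.finditer(trace_text):
        obj = int(match.group(2), 16)
        if match.group(1) == "zfs_znode_alloc":
            allocated.append(obj)
        else:
            inactivated.append(obj)
    return allocated, inactivated
```

Objects present in the first list but missing from the second are the ones whose inactivation was not seen in the captured window.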
[Kernel-packages] [Bug 1840704] [NEW] ZFS kernel modules lack debug symbols
Public bug reported: The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support. Patches will be sent soon for linux and zfs/spl-linux, covering X/B/D/E/Unstable. ** Affects: linux (Ubuntu) Importance: Undecided Assignee: Mauricio Faria de Oliveira (mfo) Status: In Progress ** Changed in: linux (Ubuntu) Status: New => In Progress ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840704 Title: ZFS kernel modules lack debug symbols Status in linux package in Ubuntu: In Progress Bug description: The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support. Patches will be sent soon for linux and zfs/spl-linux, covering X/B/D/E/Unstable. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions
[Kernel-packages] [Bug 1840789] [NEW] bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
Public bug reported: Description/patches to be provided this week. ** Affects: linux (Ubuntu) Importance: Undecided Assignee: Mauricio Faria de Oliveira (mfo) Status: In Progress ** Changed in: linux (Ubuntu) Status: New => In Progress ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840789 Title: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled Status in linux package in Ubuntu: In Progress Bug description: Description/patches to be provided this week. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions
[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device
Verification successful with bionic-proposed. No warning nor oops anymore. # uname -rv 4.15.0-59-generic #66-Ubuntu SMP Wed Aug 14 10:56:44 UTC 2019 # ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1 [ 105.696881] bcache: register_bdev() registered backing device loop0 [ 105.703809] bcache: run_cache_set() invalidating existing data [ 105.714280] bcache: register_cache() registered cache device loop1 [ 109.677765] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 3fd195b5-7334-4759-81d9-0faadc042f59 # # reboot # # comment last line in script. # ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1 [ 21.645209] bcache: register_bdev() registered backing device loop0 [ 21.697858] bcache: run_cache_set() invalidating existing data [ 21.709142] bcache: register_cache() registered cache device loop1 # sleep 10 # ** Tags removed: verification-needed-bionic ** Tags added: verification-done-bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1837788 Title: bcache kernel warning when attaching device Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: In Progress Status in linux source package in Disco: Fix Committed Status in linux source package in Eoan: Invalid Bug description: [Impact] * Users can get a Warning or even Oops the kernel if bcache/writeback_percent is set before attaching a caching device to the bcache device. * The fix is trivial, upstream, and consists of just checking whether the caching device is attached in order to set flags and schedule thread (which oops). [Test Case] * See attachment 'setup-bcache-wb_percent-before-attach.sh' used in comment #5 and #6 to reproduce the problem(s). 
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
This fix is already present in Eoan and Unstable: ~/git/ubuntu-eoan$ git log --oneline origin/master-next -- drivers/net/ethernet/broadcom/bnx2x/ | head | grep cos 1c41d7b7cf60 bnx2x: Disable multi-cos feature. ~/git/ubuntu-eoan$ git describe --contains 1c41d7b7cf60 Ubuntu-5.2.0-12.13~51 ~/git/ubuntu-unstable$ git log --oneline origin/master -- drivers/net/ethernet/broadcom/bnx2x/ | head | grep cos d1f0b5dce8fd bnx2x: Disable multi-cos feature. ~/git/ubuntu-unstable$ git describe --contains d1f0b5dce8fd Ubuntu-5.3.0-4.5~313^2~91 ** Description changed: - Description/patches to be provided this week. + [Impact] + + * The bnx2x driver may cause hardware faults (leading to +panic/reboot) and other behaviors as transmit timeouts, +after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is +introduced. + + * This issue has been observed by an user shortly +after starting docker & kubelet, with adapters: +- Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] +- Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] + + * If options to ignore hardware faults are used +(erst_disable=1 hest_disable=1 ghes.disable=1) +the system doesn't panic/reboot and continues +on to timeout on adapter stats, then transmit +timeouts, spewing some adapter firmware dumps, +but the network interface is non-functional. + + * The issue only happened when LLDP is enabled +on the network switches, and crashdump shows +the bnx2x driver is stuck/waits for firmware +to complete the stop traffic command in LLDP +handling. Workaround used is to disable LLDP +in the network switches/ports. + + * Analysis of the driver and firmware dumps +didn't help significantly towards finding +the root cause. + + * Upstream/mainline recently just reverted the +patch, due to similar problem reports, while +looking for the root cause/proper fix. + + [Test Case] + + * No reproducible test case found outside +the user's systems/cluster, where it is +enough to start docker & kubelet & wait. 
+ + * The user verified test kernels for Xenial +and Bionic - the problem does not happen. + + [Regression Potential] + + * Users who significantly use/apply the non-default +traffic class (tc) / class of service (cos) might +possibly see performance changes (if any at all) +in such applications, however that's unclear now. + + * This is a recent revert upstream (v5.3-rc'ish), +so there's chance things might change in this area. + + * Nonetheless, the patch is authored by the driver +vendor, and made its way into stable kernels +(e.g., v5.2.8 which made Eoan/19.10 recently). ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Xenial) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Eoan) Importance: Undecided Assignee: Mauricio Faria de Oliveira (mfo) Status: In Progress ** Also affects: linux (Ubuntu Disco) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Eoan) Status: In Progress => Fix Released ** Changed in: linux (Ubuntu Disco) Status: New => In Progress ** Changed in: linux (Ubuntu Bionic) Status: New => In Progress ** Changed in: linux (Ubuntu Xenial) Status: New => In Progress ** Changed in: linux (Ubuntu Disco) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: linux (Ubuntu Xenial) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Description changed: [Impact] - * The bnx2x driver may cause hardware faults (leading to -panic/reboot) and other behaviors as transmit timeouts, -after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is -introduced. + * The bnx2x driver may cause hardware faults (leading to + panic/reboot) and other behaviors as transmit timeouts, + after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is + introduced. 
- * This issue has been observed by an user shortly -after starting docker & kubelet, with adapters: -- Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] -- Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] + * This issue has been observed by an user shortly + after starting docker & kubelet, with adapters: + - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] + - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] - * If options to ignore hardware faults are used -(erst_disable=1 hest_disable=1 ghes.disable=1) -the system doesn't panic/reboot and continues -on to timeout on adapter stats, then transmit -timeouts, spewing some adapter f
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
For documentation purposes, in a recent Xenial/4.4 kernel, this kernel error log is seen (with options to ignore the hardware error/fault that panics/reboots the system). [ 113.658876] bnx2x: [bnx2x_stats_comp:205(eno1)]timeout waiting for stats finished [ 123.648066] bnx2x: [bnx2x_state_wait:310(eno1)]timeout waiting for state 6 [ 123.730345] bnx2x: [bnx2x_dcbx_stop_hw_tx:443(eno1)]Unable to hold traffic for HW configuration [ 123.834443] bnx2x: [bnx2x_dcbx_stop_hw_tx:444(eno1)]driver assert [ 123.907439] bnx2x: [bnx2x_panic_dump:919(eno1)]begin crash dump - ... [ 123.907662] bnx2x :19:00.0 eno1: bc 7.14.11 [ 123.907666] begin fw dump (mark 0x3c65c8) [ 123.908033] end of fw dump [ 123.908048] bnx2x: [bnx2x_mc_assert:751(eno1)]Chip Revision: everest3, FW Version: 7_12_30 [ 123.908049] bnx2x: [bnx2x_panic_dump:1182(eno1)]end crash dump - [ 128.701944] bnx2x: [bnx2x_func_state_change:6306(eno1)]timeout waiting for previous ramrod completion [ 128.701946] bnx2x: [bnx2x_dcbx_resume_hw_tx:469(eno1)]Unable to resume traffic after HW configuration [ 128.701946] bnx2x: [bnx2x_dcbx_resume_hw_tx:470(eno1)]driver assert [ 128.701948] bnx2x: [bnx2x_panic_dump:919(eno1)]begin crash dump - ... [ 128.702170] bnx2x :19:00.0 eno1: bc 7.14.11 [ 128.702173] begin fw dump (mark 0x3c65c8) [ 128.702542] end of fw dump [ 128.702557] bnx2x: [bnx2x_mc_assert:751(eno1)]Chip Revision: everest3, FW Version: 7_12_30 [ 128.702558] bnx2x: [bnx2x_panic_dump:1182(eno1)]end crash dump - [ 128.702565] bnx2x: [bnx2x_sp_rtnl_task:10229(eno1)]Indicating link is down due to Tx-timeout [ 130.704628] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[0]: txdata->tx_pkt_prod(4) != txdata->tx_pkt_cons(3) [ 132.706968] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[8]: txdata->tx_pkt_prod(445) != txdata->tx_pkt_cons(443) [ 134.710090] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[16]: txdata->tx_pkt_prod(29) != txdata->tx_pkt_cons(25) ... 
[ 202.648543] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[7]: txdata->tx_pkt_prod(25) != txdata->tx_pkt_cons(24) [ 204.792441] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[23]: txdata->tx_pkt_prod(51) != txdata->tx_pkt_cons(46) [ 204.940151] bnx2x: [bnx2x_del_all_macs:8499(eno1)]Failed to delete MACs: -5 [ 205.023453] bnx2x: [bnx2x_chip_cleanup:9319(eno1)]Failed to schedule DEL commands for UC MACs list: -5 [ 206.351810] bnx2x: [bnx2x_func_stop:9078(eno1)]FUNC_STOP ramrod failed. Running a dry transaction [ 206.778590] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout! [ 206.856735] bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE returned failure -1 [ 207.134674] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout! [ 207.212785] bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE returned failure -1 [ 207.490725] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout! ... -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840789 Title: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Disco: In Progress Status in linux source package in Eoan: Fix Released Bug description: [Impact] * The bnx2x driver may cause hardware faults (leading to panic/reboot) and other behaviors as transmit timeouts, after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is introduced. 
* This issue has been observed by an user shortly after starting docker & kubelet, with adapters: - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] * If options to ignore hardware faults are used (erst_disable=1 hest_disable=1 ghes.disable=1) the system doesn't panic/reboot and continues on to timeout on adapter stats, then transmit timeouts, spewing some adapter firmware dumps, but the network interface is non-functional. * The issue only happened when LLDP is enabled on the network switches, and crashdump shows the bnx2x driver is stuck/waits for firmware to complete the stop traffic command in LLDP handling. Workaround used is to disable LLDP in the network switches/ports. * Analysis of the driver and firmware dumps didn't help significantly towards finding the root cause. * Upstream/mainline recently just reverted the patch, due to similar problem reports, while looking for the root cause/proper fix. [Test Case] * No reproducible test case found outside the user's systems/cluster, where it is enough to
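When going through kernel logs like the one in this comment, it can help to summarize the 'timeout waiting for queue[N]' messages per queue. The following is a minimal sketch, assuming the message format shown above; the function name is illustrative, and the plain subtraction ignores any counter wraparound.

```python
import re

# Matches the bnx2x_clean_tx_queue timeout messages as they appear in the log.
TXQ_RE = re.compile(
    r"bnx2x_clean_tx_queue:\d+\((?P<ifname>[^)]+)\)\]timeout waiting for "
    r"queue\[(?P<queue>\d+)\]: txdata->tx_pkt_prod\((?P<prod>\d+)\) != "
    r"txdata->tx_pkt_cons\((?P<cons>\d+)\)"
)


def stuck_tx_queues(log_text):
    """Map (ifname, queue) to the count of packets produced but not consumed."""
    stuck = {}
    for match in TXQ_RE.finditer(log_text):
        key = (match["ifname"], int(match["queue"]))
        # Difference between producer and consumer indices at timeout time.
        stuck[key] = int(match["prod"]) - int(match["cons"])
    return stuck
```

Feeding it the output of dmesg (or a saved kern.log) gives one entry per interface/queue pair that timed out.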
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
Somewhat similarly on recent 5.2 kernel without the fix. (again with options to ignore hardware errors/faults) Aug 19 17:15:15 HOSTNAME kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0. Aug 19 17:15:15 HOSTNAME kernel: perf interrupt took too long (3222 > 2500), lowering kernel.perf_event_max_sample_rate to 5 Aug 19 17:15:15 HOSTNAME kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9300. Sending cookies. Check SNMP counters. Aug 19 17:15:15 HOSTNAME kernel: Do you have a strange power saving mode enabled? Aug 19 17:15:15 HOSTNAME kernel: Dazed and confused, but trying to continue ... Aug 19 17:15:21 HOSTNAME kernel: NETDEV WATCHDOG: eno1 (bnx2x): transmit queue 0 timed out ... Aug 19 17:15:21 HOSTNAME kernel: bnx2x: [bnx2x_sp_rtnl_task:10229(eno1)]Indicating link is down due to Tx-timeout Aug 19 17:15:21 HOSTNAME kernel: bond0: link status down for interface eno1, disabling it in 200 ms Aug 19 17:15:21 HOSTNAME kernel: bnx2x: [bnx2x_stats_comp:205(eno1)]timeout waiting for stats finished Aug 19 17:15:21 HOSTNAME kernel: bnx2x: [bnx2x_stats_comp:205(eno1)]timeout waiting for stats finished Aug 19 17:15:23 HOSTNAME kernel: bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[0]: txdata->tx_pkt_prod(4) != txdata->tx_pkt_cons(2) Aug 19 17:15:25 HOSTNAME kernel: bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[8]: txdata->tx_pkt_prod(1) != txdata->tx_pkt_cons(0) ... 
Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout waiting for state 0 Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_del_all_macs:8499(eno1)]Failed to delete MACs: -16 Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_chip_cleanup:9319(eno1)]Failed to schedule DEL commands for UC MACs list: -16 Aug 19 17:17:24 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout waiting for state 9 Aug 19 17:17:34 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout waiting for state 2 Aug 19 17:17:34 HOSTNAME kernel: bnx2x: [bnx2x_func_stop:9078(eno1)]FUNC_STOP ramrod failed. Running a dry transaction Aug 19 17:17:35 HOSTNAME kernel: bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout! Aug 19 17:17:35 HOSTNAME kernel: bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE returned failure -1 Aug 19 17:17:35 HOSTNAME kernel: bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout! ... -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840789 Title: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Disco: In Progress Status in linux source package in Eoan: Fix Released Bug description: [Impact] * The bnx2x driver may cause hardware faults (leading to panic/reboot) and other behaviors as transmit timeouts, after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is introduced. 
* This issue has been observed by an user shortly after starting docker & kubelet, with adapters: - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] * If options to ignore hardware faults are used (erst_disable=1 hest_disable=1 ghes.disable=1) the system doesn't panic/reboot and continues on to timeout on adapter stats, then transmit timeouts, spewing some adapter firmware dumps, but the network interface is non-functional. * The issue only happened when LLDP is enabled on the network switches, and crashdump shows the bnx2x driver is stuck/waits for firmware to complete the stop traffic command in LLDP handling. Workaround used is to disable LLDP in the network switches/ports. * Analysis of the driver and firmware dumps didn't help significantly towards finding the root cause. * Upstream/mainline recently just reverted the patch, due to similar problem reports, while looking for the root cause/proper fix. [Test Case] * No reproducible test case found outside the user's systems/cluster, where it is enough to start docker & kubelet & wait. * The user verified test kernels for Xenial and Bionic - the problem does not happen; build-tested on Disco. [Regression Potential] * Users who significantly use/apply the non-default traffic class (tc) / class of service (cos) might possibly see performance changes (if any at all) in such applications, however that's unclear now. * This is a recent revert upstream (v5.3-rc'ish), so there's chance things might change in this area. * Nonetheless, the patch is authored by the driver vendor, and made its way into stable kernels (e.g., v5.2.8 which made Eoan/19.10
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
Older crashdump analysis confirmed the bnx2x driver/status being in traffic class setup / stop hardware in LLDP path.

PID: 3936 TASK: 883fdc9b1c00 CPU: 11 COMMAND: "kworker/11:0"
 #0 [883fec593ce0] __schedule at 81850bae
 #1 [883fec593d30] schedule at 818510f5
 #2 [883fec593d48] schedule_preempt_disabled at 8185139e
 #3 [883fec593d58] __mutex_lock_slowpath at 81852fd9
 #4 [883fec593db0] mutex_lock at 8185306f
 #5 [883fec593dc8] rtnl_lock at 81756e15
 #6 [883fec593dd8] bnx2x_sp_rtnl_task at c025d8c4 [bnx2x]
 #7 [883fec593e20] process_one_work at 8109e68b
 #8 [883fec593e60] worker_thread at 8109e9fb
 #9 [883fec593ec0] kthread at 810a4dc7
#10 [883fec593f50] ret_from_fork at 81855735

Check this stack frame:

 #6 [883fec593dd8] bnx2x_sp_rtnl_task at c025d8c4 [bnx2x]

Which is 9 x 8-byte/64-bit values long:

 #7 [883fec593e20]

883fec593e20 - 883fec593dd8 = 0x48 bytes = 72 bytes = 9 x 8 bytes.

crash> rd 883fec593dd8 9
883fec593dd8: c025d8c4 883feaa0a178 ..%.x...?...
883fec593de8: 6199482b89f76272 883fe9571080 rb..+H.a..W.?...
883fec593df8: 883ffdf56b40 883ffdf5b400 @k..?...?...
883fec593e08: 02c0 881fe93f0dd8 ..?.
883fec593e18: 883fec593e58 X>Y.?...

The top of the stack has the RIP/next-instruction contents, which matches what's in the stack frame line.

c025d8c4

Looking at the disassembly, it's right after the 'callq rtnl_lock', as expected.

static void bnx2x_sp_rtnl_task(struct work_struct *work)
{

rdi = work

0xc025d890 : nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xc025d895 : push %rbp
0xc025d896 : mov %rsp,%rbp
0xc025d899 : push %r15
0xc025d89b : push %r14
0xc025d89d : push %r13
0xc025d89f : push %r12
0xc025d8a1 : lea -0x598(%rdi),%r12
^ struct bnx2x *bp = container_of(work, struct bnx2x, sp_rtnl_task.work);

r12 = bp

0xc025d8a8 : push %rbx
0xc025d8a9 : mov %rdi,%rbx

rbx = rdi = work

work = rbx = 0x881fe93f0dd8

crash> struct work_struct 881fe93f0dd8
struct work_struct {
  data = {
    counter = 704
  },
  entry = {
    next = 0x881fe93f0de0,
    prev = 0x881fe93f0de0
  },
  func = 0xc025d890
}

bp = 0x881fe93f0840 (offset in asm above)

crash> eval 0x881fe93f0dd8 - 0x598
hexadecimal: 881fe93f0840
    decimal: 18446612269371426880 (-131804338124736)
      octal: 174201775117604100
     binary: 10001000100100111100

crash> struct bnx2x 881fe93f0840
struct bnx2x {
  fp = 0x881fe95c4000,
  sp_objs = 0x881fe9fb,
  fp_stats = 0x881fe935c000,
  bnx2x_txq = 0x881fe87ef000,
  regview = 0xc9001d00,
  doorbells = 0xc90019878000,
  ...
  dev = 0x881fe93f,
  pdev = 0x881fef03b000,
  iro_arr = 0x881ff0756000,
  recovery_state = BNX2X_RECOVERY_DONE,
  ...
  cnic_support = 1 '\001',
  cnic_enabled = false,
  cnic_loaded = false,
  cnic_probe = 0xc0243280 ,
  fcoe_init = false,
  ...
  sp_task = {
    ...
    func = 0xc0251960
  ...
  sp_rtnl_task = {
    work = {
      data = {
        counter = 704
      },
      entry = {
        next = 0x881fe93f0de0,
        prev = 0x881fe93f0de0
      },
      func = 0xc025d890
    },
  ...
  fw_ver = "FFV14.04.18 \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  ...
  dcb_state = 1,
  dcbx_enabled = 2,
  dcbx_mode_uset = false,
  dcbx_config_params = {
    overwrite_settings = 1,
    admin_dcbx_version = 0,
    admin_ets_enable = 1,
    admin_pfc_enable = 1,
    admin_tc_supported_tx_enable = 1,
    admin_ets_configuration_tx_enable = 1,
    admin_ets_recommendation_tx_enable = 0,
    admin_pfc_tx_enable = 1,
    admin_application_priority_tx_enable = 1,
    admin_ets_willing = 1,
    admin_ets_reco_valid = 1,
    admin_pfc_willing = 1,
    admin_app_priority_willing = 1,
    admin_configuration_bw_precentage = {100, 0, 0, 0, 0, 0, 0, 0},
    admin_configurati
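
The address arithmetic used in the crash analysis above (the size of stack frame #6, and recovering `bp` from the `work` pointer) can be double-checked outside of `crash`. The following is only a sketch in Python: the addresses are copied as they are printed in this message, the 0x598 offset is the one read from the `lea -0x598(%rdi),%r12` instruction, and the helper names `frame_words` and `container_of` are illustrative (the latter only mimics what the kernel's `container_of()` macro computes).

```python
# Sketch: re-check the address arithmetic from the crash analysis.
# All addresses are copied as printed in the message; helper names are illustrative.

WORD = 8  # bytes per 64-bit stack slot

FRAME_6 = 0x883fec593dd8  # frame #6, bnx2x_sp_rtnl_task
FRAME_7 = 0x883fec593e20  # frame #7, process_one_work


def frame_words(start, end, word=WORD):
    """Return how many word-sized slots lie between two frame addresses."""
    size = end - start
    if size < 0 or size % word:
        raise ValueError("frame addresses are not word-aligned or are out of order")
    return size // word


def container_of(member_addr, member_offset):
    """Mimic container_of(): address of the enclosing struct, given the
    address of a member and that member's offset inside the struct."""
    return member_addr - member_offset


size = FRAME_7 - FRAME_6
print(hex(size), size, frame_words(FRAME_6, FRAME_7))

work = 0x881fe93f0dd8              # value of rbx/rdi in the analysis
bp = container_of(work, 0x598)     # offset taken from the lea instruction
print(hex(bp))
```

The word count is the argument passed to `rd` (the number of 64-bit values to dump), and the second result is the address passed to `struct bnx2x`.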
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
[X/B][PATCH] bnx2x: Disable multi-cos feature. https://lists.ubuntu.com/archives/kernel-team/2019-August/103282.html [D][PATCH] bnx2x: Disable multi-cos feature. https://lists.ubuntu.com/archives/kernel-team/2019-August/103283.html -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840789 Title: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Disco: In Progress Status in linux source package in Eoan: Fix Released To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io
Hi Drew,

There's an mpt3sas fix in v5.3-rc3 for a problem that may cause an adapter firmware fault (although I'm not sure of the exact fault state code; but it should cause a reset anyway).

If you could please test either

1) v5.3-rc2 [1], to confirm the issue happens with v5.3-rc2 but not with v5.3-rc3; or
2) 4.15.0-60.67 (in bionic-proposed), which has the fix (to check that the issue no longer happens),

that would be great.

If that doesn't help, please continue with the great regression tip provided by Kai-Heng Feng.

Thanks!
Mauricio

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc2/

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841132

Title: mpt3sas - storage controller resets under heavy disk io

Status in linux package in Ubuntu: Incomplete

Bug description:

[summary]
when a server running ubuntu 18.04 with an lsi sas controller experiences high disk io there is a chance the storage controller will reset
this can take weeks or months, but once the controller resets it will keep resetting every few seconds or few minutes, dramatically degrading disk io
the server must be rebooted to restore the controller to a normal state

[hardware configuration]
server: dell poweredge r7415, purchased 2019-02
cpu/chipset: amd epyc naples
storage controller: "dell hba330 mini" with chipset "lsi sas3008"
drives: 4x samsung 860 pro 2TB ssd

[software configuration]
ubuntu 18.04 server
mdadm raid6
all firmware is fully updated (bios 1.9.3) (hba330 16.17.00.03) (ssd rvm01b6q)

[what happened]
server was operating as a vm host for months without issue
one day the syslog was flooded with messages like "mpt3sas_cm0: sending diag reset !!" and "Power-on or device reset occurred", along with unusably-slow disk io
the server was removed from production and I looked for a way to reproduce the issue

[how to reproduce the issue]
there are probably many ways to produce this issue, the hackish way I found to reliably reproduce it was:
have the four ssds in a mdadm raid6 with ext4 filesystem
create three 500GB files containing random data
open three terminals. one calculates md5sum of file1 in a loop, another does the same for file2, the third does a copy of file3 to file3-temp in a loop
the number of files is arbitrary, the goal is just to generate a lot of disk io on files too large to be cached in memory
then initiate an array check with "/usr/share/mdadm/checkarray -a" to cause even more drive thrashing
within 1-15min the controller will enter the broken state. the longest I ever saw it take was 30min. I reproduced this several times
rebooting the server restores the controller to a normal state
if the server is not rebooted and the controller is left in this broken state eventually drives will fall out of the array, and sometimes array/filesystem corruption will occur

[why this is being reported here]
It's unlikely I am exceeding limits of the hardware since this server chassis can hold 24 drives and I am only using 4. The controller specs indicate I should not hit pcie bandwidth limits until at least 16 drives.
My first thought was that the lsi controller firmware was at fault since they have been historically buggy, however I reproduced this with the newest firmware "16.17.00.03" and the previous version "15.17.09.06" (versions may be dell-specific).
I then tried the most recent motherboard bios "1.9.3", and downgraded to "1.9.2", no change.
I then wanted to eliminate the possibility of a bad drive. swapped out all 4 drives with different ones of the same model, no change.
I then upgraded from the standard 18.04 kernel to the newer backported hwe kernel, which also came with a newer mpt3sas driver, no change.
I then ran the same test on the same array but with rhel 8, to my surprise I could no longer reproduce the issue.
-
tl;dr version:
ubuntu 18.04 (kernel 4.15.0) (mpt3sas driver 17.100.00.00) storage controller breaks in 1-10min
ubuntu 18.04 hwe (kernel 5.0.0) (mpt3sas driver 27.101.00.00) storage controller breaks in 1-15min, max 30min
rhel 8 (kernel 4.18.0) (mpt3sas driver 27.101.00.00) same stress test on same array for 19h, no errors

[caveats]
Server os misconfiguration is possible, however this is a rather basic vm host running kvm and no 3rd-party packages.
I can't conclusively prove this isn't a hardware fault since I don't have a second unused identical server to test on right now, however the fact that the problem can be easily reproduced under ubuntu but not under rhel seems noteworthy.
There is another bug (LP: #1810781) similar to this, I didn't post there because it's already marked as fixed.
There is also a debian bug (Debian #926202) that encountered this on kernel 4.19.0, but I'm unable to tell if it's the same issue.

To manage notifications about this bug go
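
The load described under "[how to reproduce the issue]" in the bug description (two loops hashing large files, one loop copying a file) could be scripted roughly as below. This is a minimal sketch, not the reporter's actual procedure: the function names, the 1 MiB chunk size, and the bounded `rounds` argument (the reporter's loops run until interrupted) are assumptions made for illustration, and the mdadm array check still has to be started separately.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1 << 20  # assumed 1 MiB reads, so large files are never held fully in memory


def md5_of(path, chunk=CHUNK):
    """Stream a file through MD5, similar in effect to running md5sum on it."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def hash_loop(path, rounds):
    """Terminals one and two: hash the same file repeatedly, return the digests."""
    return [md5_of(path) for _ in range(rounds)]


def copy_loop(src, dst, rounds):
    """Terminal three: copy src over dst repeatedly, return the final size of dst."""
    for _ in range(rounds):
        shutil.copyfile(src, dst)
    return Path(dst).stat().st_size
```

Each function would be run in its own process or terminal against one of the large files, for example `hash_loop("file1", 1000)`; the file names here follow the description, and the round count is arbitrary.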
[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io
Mentioned upstream candidate fix is:

commit df9a606184bfdb5ae3ca9d226184e9489f5c24f7
Author: Suganath Prabu
Date: Tue Jul 30 03:43:57 2019 -0400

    scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA

    Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as per hardware design, if DMA-able range contains all 64-bits set (0x-) then it results in a firmware fault.

    E.g. SGE's start address is 0x-000 and data length is 0x1000 bytes. when HBA tries to DMA the data at 0x- location then HBA will fault the firmware.

    Driver will set 63-bit DMA mask to ensure the above address will not be used.

    Cc: # 5.1.20+
    Signed-off-by: Suganath Prabu
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Martin K. Petersen

git/linux $ git describe --contains df9a606184bfdb5ae3ca9d226184e9489f5c24f7
v5.3-rc3~21^2~1

git/ubuntu-bionic $ git log --oneline Ubuntu-4.15.0-60.67 -- drivers/scsi/mpt3sas/
395f1e3037b8 scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
...

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
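
The reasoning behind the 63-bit DMA mask in the upstream mpt3sas fix quoted in this thread can be illustrated with plain integer arithmetic. The sketch below is not driver code: `dma_bit_mask` reproduces the arithmetic of the kernel's `DMA_BIT_MASK(n)` macro, while `last_byte`, `allowed_by_mask`, and the example buffer are hypothetical, chosen so that a 0x1000-byte transfer ends at the address with all 64 bits set.

```python
ALL_ONES_64 = (1 << 64) - 1  # address with all 64 bits set


def dma_bit_mask(bits):
    """Same arithmetic as the kernel's DMA_BIT_MASK(n): the n low bits set."""
    return (1 << bits) - 1


def last_byte(start, length):
    """Address of the final byte touched by a transfer."""
    return start + length - 1


def allowed_by_mask(start, length, mask):
    """True if every byte of the transfer is addressable under the mask."""
    return last_byte(start, length) <= mask


# Hypothetical buffer: 0x1000 bytes ending exactly at the all-ones address.
length = 0x1000
start = ALL_ONES_64 - length + 1

print(hex(start), hex(last_byte(start, length)))
print(allowed_by_mask(start, length, dma_bit_mask(64)))  # permitted with a 64-bit mask
print(allowed_by_mask(start, length, dma_bit_mask(63)))  # excluded with a 63-bit mask
```

With a 63-bit mask, no DMA-able address can have the top bit set, so a range that reaches the all-ones address can never be handed to the HBA.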
[Kernel-packages] [Bug 1841148] Re: Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver
Hi Martin,

There's a potential fix for this upstream, in v5.3-rc1 mainline build (thus not in v5.2), which is also applied to bionic-proposed (4.15.0-60.67).

Could you please test whether the issue is resolved with bionic-proposed [1] ?

If that doesn't help, further regression/bisect test steps will be needed.

Thank you!

[1] https://wiki.ubuntu.com/Testing/EnableProposed

The mentioned potential fix is:

commit d17ba0f616a08f597d9348c372d89b8c0405ccf3
Author: Konstantin Khlebnikov
Date: Wed Apr 17 11:13:20 2019 +0300

    e1000e: start network tx queue only when link is up

    Driver does not want to keep packets in Tx queue when link is lost. But present code only reset NIC to flush them, but does not prevent queuing new packets. Moreover reset sequence itself could generate new packets via netconsole and NIC falls into endless reset loop.

    This patch wakes Tx queue only when NIC is ready to send packets.

    This is proper fix for problem addressed by commit 0f9e980bf5ee ("e1000e: fix cyclic resets at link up with active tx").

    Signed-off-by: Konstantin Khlebnikov
    Suggested-by: Alexander Duyck
    Tested-by: Joseph Yasi
    Tested-by: Aaron Brown
    Tested-by: Oleksandr Natalenko
    Signed-off-by: Jeff Kirsher

git/linux $ git describe --contains d17ba0f616a08f597d9348c372d89b8c0405ccf3
v5.3-rc1~140^2~410^2~2

git/ubuntu-bionic $ git log --oneline origin/master-next -- drivers/net/ethernet/intel/e1000e/
02f5b7ea8c79 e1000e: start network tx queue only when link is up
...

git/ubuntu-bionic $ git describe --contains 02f5b7ea8c79
Ubuntu-4.15.0-59.66~576

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841148

Title: Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver

Status in linux package in Ubuntu: Confirmed

Bug description:

Since linux-image-4.15.0-58-generic my ethernet connection fails to get a connection. The network connection constantly goes up and down.

The issue has been reported by another user: https://bugzilla.kernel.org/show_bug.cgi?id=204591

Snippet from kern.log showing that the connection constantly goes up and down:

Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134651] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134830] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134836] /dev/vmnet: hub 0 does not exist, allocating memory.
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134858] /dev/vmnet: port on hub 0 successfully opened
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134868] bridge-enp0s31f6: up
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134872] bridge-enp0s31f6: attached
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334794] userif-2: sent link down event.
Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334801] userif-2: sent link up event.
Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.156471] bridge-enp0s31f6: disabling the bridge on dev down
Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158580] bridge-enp0s31f6: down
Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158599] bridge-enp0s31f6: detached
Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356790] userif-2: sent link down event.
Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356795] userif-2: sent link up event.
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295365] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295729] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295741] /dev/vmnet: hub 0 does not exist, allocating memory.
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295785] /dev/vmnet: port on hub 0 successfully opened
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295804] bridge-enp0s31f6: up
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295810] bridge-enp0s31f6: attached
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495615] userif-2: sent link down event.
Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495620] userif-2: sent link up event.
Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316505] bridge-enp0s31f6: disabling the bridge on dev down
Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316593] bridge-enp0s31f6: down
Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316607] bridge-enp0s31f6: detached
Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516761] userif-2: sent link down event.
Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516767] userif-2: sent link up event.
Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441
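
When checking whether a test kernel still shows the link flapping seen in the kern.log snippet above, it may help to count the e1000e "NIC Link is Up" messages and the time between them. The following is a rough sketch under the assumption that log lines have the same shape as in the snippet; the regular expression and function names are illustrative, and the three sample lines are copied from the snippet.

```python
import re

# Matches the kernel timestamp and interface name in an e1000e link-up message.
LINK_UP = re.compile(r"\[\s*(\d+\.\d+)\]\s+e1000e: (\S+) NIC Link is Up")


def link_up_events(lines):
    """Return (interface, kernel_timestamp) for every e1000e link-up line."""
    events = []
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            events.append((match.group(2), float(match.group(1))))
    return events


def gaps(events):
    """Seconds between consecutive link-up events."""
    stamps = [stamp for _, stamp in events]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


sample = [
    "Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134651] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
    "Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158580] bridge-enp0s31f6: down",
    "Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295365] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
]

events = link_up_events(sample)
print(len(events), gaps(events))
```

A link that stays up should log a single link-up message after boot or cable plug-in; repeated messages a few seconds apart, as in the sample, indicate the flapping described in the report.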
[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io
Hi Drew, That's very good news! So it looks like that patch resolves the problem. Could you please test the kernel in bionic-proposed [1] (4.15.0-60-generic) which has that patch to confirm it's also working correctly? Thanks! Mauricio [1] https://wiki.ubuntu.com/Testing/EnableProposed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1841132 Title: mpt3sas - storage controller resets under heavy disk io Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io
Drew, Thanks for testing bionic-proposed! So it will be resolved for bionic kernels shortly, when it hits bionic-updates. Disco/19.04 will get this patch via stable updates in the near future [1]. Eoan has it applied (LP: #1839588). So this is all good. Thanks again, Mauricio [1] https://lists.ubuntu.com/archives/kernel-team/2019-August/103416.html ** Also affects: linux (Ubuntu Eoan) Importance: Undecided Status: Incomplete ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Disco) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1841132 Title: mpt3sas - storage controller resets under heavy disk io Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: New Status in linux source package in Disco: New Status in linux source package in Eoan: Incomplete
[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io
** Changed in: linux (Ubuntu Eoan)
       Status: Incomplete => Fix Released

** Changed in: linux (Ubuntu Disco)
       Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
       Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841132

Title:
  mpt3sas - storage controller resets under heavy disk io

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Disco: In Progress
Status in linux source package in Eoan: Fix Released

Bug description:
  [summary]
  when a server running ubuntu 18.04 with an lsi sas controller experiences high disk io there is a chance the storage controller will reset
  this can take weeks or months, but once the controller resets it will keep resetting every few seconds or few minutes, dramatically degrading disk io
  the server must be rebooted to restore the controller to a normal state

  [hardware configuration]
  server: dell poweredge r7415, purchased 2019-02
  cpu/chipset: amd epyc naples
  storage controller: "dell hba330 mini" with chipset "lsi sas3008"
  drives: 4x samsung 860 pro 2TB ssd

  [software configuration]
  ubuntu 18.04 server
  mdadm raid6
  all firmware is fully updated (bios 1.9.3) (hba330 16.17.00.03) (ssd rvm01b6q)

  [what happened]
  server was operating as a vm host for months without issue
  one day the syslog was flooded with messages like "mpt3sas_cm0: sending diag reset !!" and "Power-on or device reset occurred", along with unusably-slow disk io
  the server was removed from production and I looked for a way to reproduce the issue

  [how to reproduce the issue]
  there are probably many ways to produce this issue, the hackish way I found to reliably reproduce it was:
  have the four ssds in a mdadm raid6 with ext4 filesystem
  create three 500GB files containing random data
  open three terminals. one calculates md5sum of file1 in a loop, another does the same for file2, the third does a copy of file3 to file3-temp in a loop
  the number of files is arbitrary, the goal is just to generate a lot of disk io on files too large to be cached in memory
  then initiate an array check with "/usr/share/mdadm/checkarray -a" to cause even more drive thrashing
  within 1-15min the controller will enter the broken state. the longest I ever saw it take was 30min. I reproduced this several times
  rebooting the server restores the controller to a normal state
  if the server is not rebooted and the controller is left in this broken state eventually drives will fall out of the array, and sometimes array/filesystem corruption will occur

  [why this is being reported here]
  It's unlikely I am exceeding limits of the hardware since this server chassis can hold 24 drives and I am only using 4. The controller specs indicate I should not hit pcie bandwidth limits until at least 16 drives.
  My first thought was that the lsi controller firmware was at fault since they have been historically buggy, however I reproduced this with the newest firmware "16.17.00.03" and the previous version "15.17.09.06" (versions may be dell-specific).
  I then tried the most recent motherboard bios "1.9.3", and downgraded to "1.9.2", no change.
  I then wanted to eliminate the possibility of a bad drive. swapped out all 4 drives with different ones of the same model, no change.
  I then upgraded from the standard 18.04 kernel to the newer backported hwe kernel, which also came with a newer mpt3sas driver, no change.
  I then ran the same test on the same array but with rhel 8, to my surprise I could no longer reproduce the issue.

  - tl;dr version:
  ubuntu 18.04 (kernel 4.15.0) (mpt3sas driver 17.100.00.00)
    storage controller breaks in 1-10min
  ubuntu 18.04 hwe (kernel 5.0.0) (mpt3sas driver 27.101.00.00)
    storage controller breaks in 1-15min, max 30min
  rhel 8 (kernel 4.18.0) (mpt3sas driver 27.101.00.00)
    same stress test on same array for 19h, no errors

  [caveats]
  Server os misconfiguration is possible, however this is a rather basic vm host running kvm and no 3rd-party packages.
  I can't conclusively prove this isn't a hardware fault since I don't have a second unused identical server to test on right now, however the fact that the problem can be easily reproduced under ubuntu but not under rhel seems noteworthy.
  There is another bug (LP: #1810781) similar to this, I didn't post there because it's already marked as fixed.
  There is also a debian bug (Debian #926202) that encountered this on kernel 4.19.0, but I'm unable to tell if it's the same issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1841132/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://laun
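The reproduction steps in the report above can be approximated with a small script. This is only a sketch under stated assumptions: the function name `stress_io` and its parameters are made up for illustration, the three terminals are replaced by three background jobs that run in synchronized passes (the reporter ran independent endless loops), and the array check is left as a comment because it needs root and a real md array.

```shell
# stress_io DIR SIZE_MB PASSES   (hypothetical helper, not from the report)
# Creates three files of random data in DIR, then for each pass runs
# md5sum on file1 and file2 and copies file3 to file3-temp, all in parallel.
stress_io() {
    dir=$1
    size_mb=$2
    passes=$3

    for n in 1 2 3; do
        dd if=/dev/urandom of="$dir/file$n" bs=1M count="$size_mb" 2>/dev/null
    done

    i=0
    while [ "$i" -lt "$passes" ]; do
        md5sum "$dir/file1" >/dev/null &
        md5sum "$dir/file2" >/dev/null &
        cp "$dir/file3" "$dir/file3-temp" &
        wait    # all three jobs must finish before the next pass starts
        i=$((i + 1))
    done
}

# On the affected server the report additionally starts an array check
# while the load is running:
#   /usr/share/mdadm/checkarray -a
```

To match the report, the files would have to be about 500GB each (`stress_io /mnt/array 512000 1000`, where the mount point is a placeholder), so that they cannot be cached in memory.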
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
** Description changed:

  The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support.

- Patches will be sent soon for linux and zfs/spl-linux,
- covering X/B/D/E/Unstable.
+ Patches are required in:
+ 1) linux kernel packaging, to add infrastructure to
+    enable/build/strip/package debug symbols on DKMS.
+    (this is sufficient on Eoan's zfs-linux.)
+ 2) zfs-linux and spl-linux, for the stable releases,
+    which need a few patches to enable debug symbols.
+
+ Initially submitting the kernel patchset for Unstable,
+ for review/feedback. It backports nicely into B/D/E,
+ should it be accepted; for X (doesn't use DKMS builds)
+ a simpler patch for the moment (until it does) works.
+
+ The zfs/spl-linux patches are ready, to be submitted
+ once the approach used by the kernel package settles.

** Description changed:

  The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support.

  Patches are required in:
+
  1) linux kernel packaging, to add infrastructure to
- enable/build/strip/package debug symbols on DKMS.
- (this is sufficient on Eoan's zfs-linux.)
+    enable/build/strip/package debug symbols on DKMS.
+    (this is sufficient with zfs-linux now in Eoan.)
+
  2) zfs-linux and spl-linux, for the stable releases,
- which need a few patches to enable debug symbols.
+    which need a few patches to enable debug symbols.

  Initially submitting the kernel patchset for Unstable,
  for review/feedback. It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu: In Progress

Bug description:
  The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
     which need a few patches to enable debug symbols.

  Initially submitting the kernel patchset for Unstable,
  for review/feedback. It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
Test Build 1) Old behavior

goal: show limitations/issues.

- original packaging
- zfs not built with debug symbols
- zfs modules not present in debug package
- extra modules lack .gnu_debuglink section

Original packaging:

There are no ZFS modules in the debug package:

  $ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-orig
  $ ls ddeb-orig/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs
  ...: No such file or directory

Accordingly, the ZFS modules are the only modules without '.gnu_debuglink' section in the 'linux-modules' package:

  $ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules
  $ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'

By the way, this is also the case for *all* modules in the 'linux-modules-extra' package (only modules in the 'linux-modules' package have '.gnu_debuglink' sections):

  $ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extras
  $ find deb-modules-extras/ -name '*.ko' | wc -l
  4508
  $ find deb-modules-extras/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
  4508

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu: In Progress

Bug description:
  The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
     which need a few patches to enable debug symbols
     (add option './configure --enable-debuginfo' and
     '(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)

  Initially submitting the kernel patchset for Unstable,
  for review/feedback. It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.
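The per-module check used in Test Build 1 can be wrapped in a reusable function. This is a sketch, not part of the proposed patchset: the name `modules_without_debuglink` is made up here, and the only changes against the one-liner from the message are quoting of paths and sorted output. It relies on the same behavior the message relies on, namely that `objdump -h -j .gnu_debuglink` exits non-zero when the section is absent.

```shell
# modules_without_debuglink DIR   (hypothetical helper name)
# Lists every *.ko below DIR that has no .gnu_debuglink section.
# A file objdump cannot read at all is listed too, because the
# section lookup fails for it as well.
modules_without_debuglink() {
    find "$1" -name '*.ko' | sort |
    while IFS= read -r ko; do
        objdump -h -j .gnu_debuglink "$ko" >/dev/null 2>&1 ||
            echo "Module without debug link '$ko'"
    done
}
```

Usage mirrors the commands in the message, for example `modules_without_debuglink deb-modules/ | wc -l`.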
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
Test Build 4) All debug symbols disabled

goal: show no zfs debug symbol activity happens either (along w/ other debug symbol stuff)

- test packaging
- nothing built with debug symbols
- no debug package present
- no .gnu_debuglink section at all
- (no regressions)

Test packaging, debug symbols disabled entirely (skipdbg=true).

The dkms-build script doesn't do any debug symbol work at all.

  II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
  signing zavl.ko
  signing znvpair.ko
  signing zunicode.ko
  signing zcommon.ko
  signing zfs.ko
  signing icp.ko
  signing zlua.ko
  signing spl.ko
  II: dkms-build build zfs complete

No debug sections are present in ZFS modules (as expected):

  $ objdump -h deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko | grep debug
  $

And the check for modules without debug symbols is not exercised (as expected):

  $ grep WARNING build.log
  $

  $ find deb-modules/ -name '*.ko' | wc -l
  1000
  $ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
  1000

  $ find deb-modules-extra/ -name '*.ko' | wc -l
  4508
  $ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
  4508

** Description changed:

  The ZFS kernel modules aren't built with debug symbols, which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
-    which need a few patches to enable debug symbols.
+    which need a few patches to enable debug symbols
+    (add option './configure --enable-debuginfo' and
+    '(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)

  Initially submitting the kernel patchset for Unstable,
  for review/feedback. It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
Test Build 2) New behavior if ZFS modules are *not* built with debug symbols

goal: show failsafe/backwards compatible behavior if zfs-dkms doesn't support/build debug symbols and kernel build log reports missing debug symbols, and extra modules have .gnu_debuglink.

- test packaging
- zfs not built with debug symbols (disabled manually in dkms-build if-check)
- zfs modules not present in debug package
- extra modules have .gnu_debuglink section
- (no regressions)

Test packaging, with debug symbols *not enabled* in zfs-dkms:

The debug symbols are not found (as expected), and this case is handled without problems:

  II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-image-unsigned-5.3.0-8-generic-dbgsym/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs (debug symbols)
  ignoring zavl.ko (missing debug symbols)
  stripping zavl.ko
  ignoring znvpair.ko (missing debug symbols)
  stripping znvpair.ko
  ignoring zunicode.ko (missing debug symbols)
  stripping zunicode.ko
  ignoring zcommon.ko (missing debug symbols)
  stripping zcommon.ko
  ignoring zfs.ko (missing debug symbols)
  stripping zfs.ko
  ignoring icp.ko (missing debug symbols)
  stripping icp.ko
  ignoring zlua.ko (missing debug symbols)
  stripping zlua.ko
  ignoring spl.ko (missing debug symbols)
  stripping spl.ko
  II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
  signing zavl.ko
  signing znvpair.ko
  signing zunicode.ko
  signing zcommon.ko
  signing zfs.ko
  signing icp.ko
  signing zlua.ko
  signing spl.ko
  II: dkms-build build zfs complete

The debug package contains the ZFS directory, but it's empty:

  $ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-test-disabled
  $ ls ddeb-test-disabled/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs/
  $

The kernel build log documents which modules do not have debug symbols, now covering modules built with DKMS (zfs and vbox):

  $ grep WARNING build.log
  echo "WARNING: Missing debug symbols for module '$module'."; \
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.

The ZFS modules have no '.gnu_debuglink' section or any other debug section (as expected):

  $ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules
  $ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'
  Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'

  $ for ko in deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/*.ko; do objdump -h $ko | grep debug; done
  $

But all modules in 'linux-modules-extra' now have '.gnu_debuglink' sections (except virtualbox modules, which are DKMS-built without debug symbols too):

  $ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extra
  $ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h -j .
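The "copying" versus "ignoring ... (missing debug symbols)" lines in the dkms-build log of Test Build 2 imply a per-module check for debug information. The actual dkms-build logic is not shown in this thread, so the following is only a guess at how such a check could look: the function name `classify_module` is invented, and testing for a `.debug_info` section is an assumption about what "has debug symbols" means here.

```shell
# classify_module FILE   (hypothetical; not the real dkms-build code)
# Prints the same kind of message seen in the build log, depending on
# whether objdump lists a .debug_info section for FILE.
classify_module() {
    name=$(basename "$1")
    if objdump -h "$1" 2>/dev/null | grep -q '\.debug_info'; then
        echo "copying $name"
    else
        echo "ignoring $name (missing debug symbols)"
    fi
}
```

A module that objdump cannot parse is treated like one without debug symbols, which matches the failsafe behavior this test build is meant to demonstrate.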
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
Test Build 3) New behavior if ZFS modules are built with debug symbols

goal: show zfs debug symbols are correctly built and packaged into non-debug & debug packages.

- test packaging
- zfs built with debug symbols
- zfs modules present in debug package
- extra modules *have* .gnu_debuglink section

Test packaging, debug symbols *enabled* in zfs-dkms:

Modules are built with debug symbols, copied to debug package directory, and stripped before being copied into strip/non-debug package directory.

  II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-image-unsigned-5.3.0-8-generic-dbgsym/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs (debug symbols)
  copying zavl.ko
  stripping zavl.ko
  copying znvpair.ko
  stripping znvpair.ko
  copying zunicode.ko
  stripping zunicode.ko
  copying zcommon.ko
  stripping zcommon.ko
  copying zfs.ko
  stripping zfs.ko
  copying icp.ko
  stripping icp.ko
  copying zlua.ko
  stripping zlua.ko
  copying spl.ko
  stripping spl.ko
  II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
  signing zavl.ko
  signing znvpair.ko
  signing zunicode.ko
  signing zcommon.ko
  signing zfs.ko
  signing icp.ko
  signing zlua.ko
  signing spl.ko
  II: dkms-build build zfs complete

The ZFS modules are now present in the debug package:

  $ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-test-enabled
  $ ls -1 ddeb-test-enabled/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs/
  icp.ko
  spl.ko
  zavl.ko
  zcommon.ko
  zfs.ko
  zlua.ko
  znvpair.ko
  zunicode.ko

And now all modules in 'linux-modules' have the '.gnu_debuglink' section:

  $ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules
  $ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
  $

The build log no longer shows ZFS modules as missing debug symbols:

  $ grep WARNING build.log
  echo "WARNING: Missing debug symbols for module '$module'."; \
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.
  $

The only modules in 'linux-modules-extra' without that continue to be virtualbox modules:

  $ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extra
  $ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
  Module without debug link 'deb-modules-extra/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'
  Module without debug link 'deb-modules-extra/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'
  $

As reflected in the kernel build log.

  $ grep WARNING build.log
  echo "WARNING: Missing debug symbols for module '$module'."; \
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
  WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
[Unstable][PATCH 0/6] Add support for ZFS debug symbols
https://lists.ubuntu.com/archives/kernel-team/2019-August/103425.html
[Kernel-packages] [Bug 1841148] Re: Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver
Hi Martijn,

Thanks for testing bionic-proposed!
So it will be resolved for bionic kernels shortly, once it hits bionic-updates.

Disco/19.04 will get this patch via stable updates in the near future [1].
Eoan has it applied (LP: #1837725).

So this is all good.

Thanks again,
Mauricio

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: Confirmed

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Eoan)
       Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Disco)
       Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
       Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841148

Title:
  Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Disco: In Progress
Status in linux source package in Eoan: Fix Released

Bug description:
  Since linux-image-4.15.0-58-generic my ethernet connection fails to get a connection. The network connection constantly goes up and down.

  The issue has been reported by another user:
  https://bugzilla.kernel.org/show_bug.cgi?id=204591

  Snippet from kern.log showing that the connection constantly goes up and down:

  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134651] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134830] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134836] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134858] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134868] bridge-enp0s31f6: up
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134872] bridge-enp0s31f6: attached
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334794] userif-2: sent link down event.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334801] userif-2: sent link up event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.156471] bridge-enp0s31f6: disabling the bridge on dev down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158580] bridge-enp0s31f6: down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158599] bridge-enp0s31f6: detached
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356790] userif-2: sent link down event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356795] userif-2: sent link up event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295365] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295729] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295741] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295785] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295804] bridge-enp0s31f6: up
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295810] bridge-enp0s31f6: attached
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495615] userif-2: sent link down event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495620] userif-2: sent link up event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316505] bridge-enp0s31f6: disabling the bridge on dev down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316593] bridge-enp0s31f6: down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316607] bridge-enp0s31f6: detached
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516761] userif-2: sent link down event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516767] userif-2: sent link up event.
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.438729] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440433] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440439] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440466] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440475] bridge-enp0s31f6: up
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440479] bridge-enp0s31f6: attached
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.638884] userif-2: sent link down event.
  Aug 20 10:06:
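To quantify the flapping shown in the kern.log snippet above, the "NIC Link is Up" messages from the e1000e driver can be counted per interface. This is a small sketch; the function name `count_link_up` is made up, and the log path in the usage line is an assumption about where the reporter's kern.log lives.

```shell
# count_link_up LOGFILE IFACE   (hypothetical helper name)
# Prints how many "NIC Link is Up" lines e1000e logged for IFACE.
count_link_up() {
    # -F: match the text literally, -c: print only the number of matches
    grep -F -c -- "e1000e: $2 NIC Link is Up" "$1"
}
```

For example, `count_link_up /var/log/kern.log enp0s31f6`; a count that keeps growing every few seconds, as in the snippet, indicates the link is being renegotiated repeatedly.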
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
Attaching the debdiffs for zfs-linux/spl-linux on X/B/D/E, for documentation purposes; will send testing/notes later.

Independently of the kernel packaging approach determined to enable debug symbols on ZFS/SPL modules, this kind of patch for the userspace packages is required anyway, and it correctly performs that task when building with DKMS.

So I'll probably move forward with their SRU request soon, for the benefit of having this available sooner if required (i.e. so users/engineers in need of debug symbols may just rebuild with DKMS using this, and be able to investigate.)

** Attachment added: "lp1840704_debdiffs.tar"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+attachment/5285635/+files/lp1840704_debdiffs.tar
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
Marking status on B/X/D as Incomplete. (email below sent to kernel-team mailing list as replies to both patch series above). Please hold / don't apply this patch for now. The reporter hit an apparently unrelated Oops in 3 of 40 nodes, and it hasn't been possible yet to determine whether this patch is at all related or at fault, due to timing/deployment matters preventing a methodical approach to revert to a original kernel. Since the patch is recent even in the mainline kernel, holding it up for a bit seemed to be the most prudent action for LTSes and thus drop the patch which would be required on Disco too. We'll be following up on this as possible on the reporter's end. ** Changed in: linux (Ubuntu Xenial) Status: In Progress => Incomplete ** Changed in: linux (Ubuntu Bionic) Status: In Progress => Incomplete ** Changed in: linux (Ubuntu Disco) Status: In Progress => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1840789 Title: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: Incomplete Status in linux source package in Bionic: Incomplete Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Fix Released Bug description: [Impact] * The bnx2x driver may cause hardware faults (leading to panic/reboot) and other behaviors as transmit timeouts, after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is introduced. 
* This issue has been observed by a user shortly after starting docker & kubelet, with adapters: - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c] - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79] * If options to ignore hardware faults are used (erst_disable=1 hest_disable=1 ghes.disable=1) the system doesn't panic/reboot and continues on to time out on adapter stats, then hits transmit timeouts, spewing some adapter firmware dumps, but the network interface is non-functional. * The issue only happened when LLDP is enabled on the network switches, and the crashdump shows the bnx2x driver is stuck waiting for firmware to complete the stop traffic command in LLDP handling. The workaround used is to disable LLDP in the network switches/ports. * Analysis of the driver and firmware dumps didn't help significantly towards finding the root cause. * Upstream/mainline has just reverted the patch, due to similar problem reports, while looking for the root cause/proper fix. [Test Case] * No reproducible test case found outside the user's systems/cluster, where it is enough to start docker & kubelet & wait. * The user verified test kernels for Xenial and Bionic - the problem does not happen; build-tested on Disco. [Regression Potential] * Users who significantly use/apply the non-default traffic class (tc) / class of service (cos) might possibly see performance changes (if any at all) in such applications, however that's unclear now. * This is a recent revert upstream (v5.3-rc'ish), so there's a chance things might change in this area. * Nonetheless, the patch is authored by the driver vendor, and made its way into stable kernels (e.g., v5.2.8 which made Eoan/19.10 recently). 
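To tell whether a host carries one of the two adapters named in the report, the PCI IDs quoted above can be matched against `lspci -nn` output. The following is only a minimal sketch: the function name `has_affected_bnx2x` is hypothetical, and it checks the vendor:device pair only, not the Dell subsystem IDs.

```shell
#!/bin/sh
# Sketch: succeed if `lspci -nn` style output on stdin lists one of the
# adapters named in the bug report. The function name is hypothetical.
has_affected_bnx2x() {
    # 14e4:168a = BCM57800, 14e4:16a1 = BCM57840 (IDs quoted from the report).
    # -F matches the bracketed IDs as fixed strings, -i ignores hex case.
    grep -q -i -F -e '[14e4:168a]' -e '[14e4:16a1]'
}

# Typical use on a real host (needs pciutils):
#   lspci -nn | has_affected_bnx2x && echo "affected bnx2x adapter present"
```

A match only says the hardware is present; whether the host is exposed still depends on the kernel carrying commit 3968d38917eb and on LLDP being enabled on the switch ports.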
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Description changed: [Impact] - * It's not possible to access iBFT (iSCSI Boot Firmware Table) information -(settings for network interface, initiator, and target) in the installer -because the 'iscsi_ibft' module is not present in udeb packages. + * It's not possible to access iBFT (iSCSI Boot Firmware Table) information + (settings for network interface, initiator, and target) in the installer + because the 'iscsi_ibft' module is not present in udeb packages. - * Even if it was, the installer does not handle iBFT information at all, -thus any settings are ignored, and iSCSI-related configuration has to -be done manually or with workarounds. + * Even if it was, the installer does not handle iBFT information at all, + thus any settings are ignored, and iSCSI-related configuration has to + be done manually or with workarounds. - * This impacts user-experience and automatic installation on systems and -deployments which actually do provide the iBFT feature and information, -but cannot use it practically. + * This impacts user-experience and automatic installation on systems and + deployments which actually do provide the iBFT feature and information, + but cannot use it practically. - * With proper iBFT support in the installer (kernel module in udeb package -and automatic iSCSI-related configuration) users will be able to rely on -iBFT to install/deploy Ubuntu on their servers and datacenters. + * With proper iBFT support in the installer (kernel module in udeb package + and automatic iSCSI-related configuration) users will be able to rely on + iBFT to install/deploy Ubuntu on their servers and datacenters. - * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, -and configure network/iSCSI according to iBFT information in disk-detect. + * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, + and configure network/iSCSI according to iBFT information in disk-detect. 
-This is done in disk-detect so that the iSCSI LUNs are detected as disks -(useful in case of no other disks in the system so the installer doesn't -complain nor wait too long) and that any partman-related preseed options -are not required and may be still available for the user. + This is done in disk-detect so that the iSCSI LUNs are detected as disks + (useful in case of no other disks in the system so the installer doesn't + complain nor wait too long) and that any partman-related preseed options + are not required and may be still available for the user. [Test Case] - * linux package / kernel module in udeb: + * linux package / kernel module in udeb: -$ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko + $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko -Check the module loads in the installer environment. -See comment with example for disco. + Check the module loads in the installer environment. + See comment with example for disco. - * d-i/hw-detect package: -(to be done) + * d-i/hw-detect/partman-iscsi package: + (to be done) [Regression Potential] - * linux package: low, the kernel module is not loaded by default, -and only checks whether iBFT information is present in firmware, -then exposes that in sysfs in read-only mode. + * linux package: low, the kernel module is not loaded by default, + and only checks whether iBFT information is present in firmware, + then exposes that in sysfs in read-only mode. - * d-i/hw-detect: -(to be done) + * d-i/hw-detect/partman-iscsi: + (to be done) [Other Info] - - * This has been verified both by the developer with a simple iSCSI -iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) -and by an user with system/firmware that supports iBFT for iSCSI. + + * This has been verified both by the developer with a simple iSCSI + iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) + and by an user with system/firmware that supports iBFT for iSCSI. 
-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released Status in linux source package in Cosmic: Fix Released Status in linux source package in Disco: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with wor
[Kernel-packages] [Bug 1829563] Re: [4.15] bcache device is accessible even if a backing device is not (writeback mode)
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1829563 Title: [4.15] bcache device is accessible even if a backing device is not (writeback mode) Status in linux package in Ubuntu: Confirmed Bug description: This is a request for a backport of the following upstream patch from 4.18: "bcache: stop bcache device when backing device is offline" https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee Field engineering uses bcache quite extensively and it would be good to have this in the GA/bionic kernel. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829563/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "bionic_d-i.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265731/+files/bionic_d-i.debdiff -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in debian-installer package in Ubuntu: Confirmed Status in hw-detect package in Ubuntu: Confirmed Status in linux package in Ubuntu: Fix Released Status in partman-iscsi package in Ubuntu: Confirmed Status in debian-installer source package in Bionic: Confirmed Status in hw-detect source package in Bionic: Confirmed Status in linux source package in Bionic: Fix Released Status in partman-iscsi source package in Bionic: Confirmed Status in debian-installer source package in Cosmic: Confirmed Status in hw-detect source package in Cosmic: Confirmed Status in linux source package in Cosmic: Fix Released Status in partman-iscsi source package in Cosmic: Confirmed Status in debian-installer source package in Disco: Confirmed Status in hw-detect source package in Disco: Confirmed Status in linux source package in Disco: Fix Released Status in partman-iscsi source package in Disco: Confirmed Status in debian-installer source package in Eoan: Confirmed Status in hw-detect source package in Eoan: Confirmed Status in linux source package in Eoan: Fix Released Status in partman-iscsi source package in Eoan: Confirmed Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. 
* This impacts user-experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. * With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain nor wait too long) and that any partman-related preseed options are not required and may still be available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect/partman-iscsi package: See comments 11, 12, 13. [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect/partman-iscsi: - d-i: kernel version update to include iscsi_ibft module, based on kernel released to -updates plus one week monitoring bug reports -- it should be OK. Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64 on baremetal -- see comment 11. - hw-detect: low, the changes are enabled by a preseed option. See comment 12. - partman-iscsi: low, simple changes, plus one fix that has been tested in detail, and falls back to previous behavior if it fails. See comment 13. [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by a user with system/firmware that supports iBFT for iSCSI. 
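The udeb check in the test case can be wrapped in a small helper when several udebs need checking. This is a sketch only; the function name `udeb_listing_has_ibft` is hypothetical, and it assumes each `dpkg-deb -c` listing line ends with the file path.

```shell
#!/bin/sh
# Sketch: succeed if a `dpkg-deb -c` listing read from stdin contains the
# iscsi_ibft.ko module. The function name is hypothetical.
udeb_listing_has_ibft() {
    # Anchor on the path separator and end of line so that only the module
    # file itself matches.
    grep -q '/iscsi_ibft\.ko$'
}

# Typical use, one line of output per udeb in the current directory:
#   for u in scsi-modules_*.udeb; do
#       if dpkg-deb -c "$u" | udeb_listing_has_ibft; then
#           echo "$u: iscsi_ibft.ko present"
#       else
#           echo "$u: iscsi_ibft.ko MISSING"
#       fi
#   done
```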
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Test Procedure with KVM guests + iPXE
=====================================

- 2 guests: iSCSI target/server and iSCSI initiator/client.
- 1 bridge for iSCSI traffic (virbr-iscsi, new), static ip.
- 1 bridge for internet access (virbr0, exists), dhcp ip.

Host:

Configure the iSCSI bridge and QEMU access in the host:

$ sudo ip link add dev virbr-iscsi type bridge
$ sudo ip link set dev virbr-iscsi up

$ echo 'allow virbr-iscsi' | sudo tee -a /etc/qemu/bridge.conf
$ sudo chmod +s /usr/lib/qemu/qemu-bridge-helper

iSCSI target:

This guest serves an iSCSI target with one LUN in iSCSI NIC with IP 10.0.0.1 for IP 10.0.0.2.

Install/boot this guest:

$ qemu-img create -f qcow2 guest-iscsi-target.qcow2 16g

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:2 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:11 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:22 \
  -drive file=guest-iscsi-target.qcow2,if=virtio \
  -drive file=$RELEASE-server-amd64.iso,media=cdrom,read-only,if=scsi \
  -boot once=d

Configure iSCSI NIC:

$ cat < link/ether 52:54:00:00:00:22 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 brd 10.0.0.255 scope global ens4
    ...

Configure iSCSI target/lun:

# apt-get install -y tgt

# mkdir /var/lib/iscsi
# dd if=/dev/zero of=/var/lib/iscsi/disk bs=1 count=0 seek=8G

# tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2019-03.com.example:target1
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /var/lib/iscsi/disk
# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 10.0.0.2

# tgt-admin --dump >/etc/tgt/conf.d/target1.conf

iSCSI initiator:
---

This guest first boots iPXE to configure iBFT, and then boots/chainloads to debian-installer.

The netboot initrd does not contain all patched udebs, so download and install disk-detect and partman-iscsi from the PPA during the install.

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/dists/$RELEASE/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}

$ python3 -m http.server &
Serving HTTP on 0.0.0.0 port 8000 ...

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn

Connect to VNC for iPXE shell:

$ vncviewer :1

iPXE <...>
Press Ctrl-B for iPXE command line. ^B
iPXE>

Configure iSCSI NIC:

iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0

Configure iBFT:
(iSCSI portal 10.0.0.1, LUN 1 on target iqn.<...>:target1)

iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-03.com.example:target1
Registered SAN device 0x80

Boot the installer (add option 'disk-detect/ibft/enable=true' for installer to detect iBFT iSCSI disks and option 'partman-iscsi/iscsi_auto=true' to set the system to boot from iBFT):

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz disk-detect/ibft/enable=true partman-iscsi/iscsi_auto=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot

Back to serial console.

Proceed with the installer.
In 'Users and passwords' dialog, select 'Go back', and 'Execute a shell', and 'Continue'.

Check kernel version and iscsi_ibft.ko module is present.

~ # uname -rv
5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:58:11 UTC 2019

~ # depmod -a
~ # modinfo --filename iscsi_ibft
/lib/modules/5.0.0-8-generic/kernel/drivers/firmware/iscsi_ibft.ko

~ # wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/pool/main/h/hw-detect/disk-detect_1.117ubuntu7.${VERSION}_amd64.udeb
~ # wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/pool/main/p/partman-iscsi/partman-iscsi_40ubuntu4.${VERSION}_all.udeb
~ # udpkg --unpack *.udeb

~ # debconf-get disk-detect/ibft/enable
true

~ # debconf-get partman-iscsi/iscsi_auto
true

(Use this if you need it; e.g., forgot kernel cmdline options)
~ # debconf-set disk-detect/ibft/enable true

Start another installer menu with the new debconf templates/question:

~ # debconf -o d-i /usr/bin/main-menu

Proceed with the installer.
In the 'Partition disks' dialog, the iSCSI LUN should be present:

S
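
The argument given to 'sanhook' in the procedure above has the form iscsi:<server>:<protocol>:<port>:<LUN>:<target>, where empty fields fall back to defaults. The sketch below splits such a string into its parts; the function name `parse_iscsi_uri` is hypothetical, the defaults (port 3260, LUN 0) are the usual iSCSI ones rather than something stated in this bug, and bracketed IPv6 server addresses are not handled.

```shell
#!/bin/sh
# Sketch: split an iSCSI root path of the form
#   iscsi:<server>:<protocol>:<port>:<LUN>:<target>
# into its fields. The function name is hypothetical.
parse_iscsi_uri() {
    # With IFS=":" every colon separates a field, so empty fields are kept.
    # The last variable receives the rest of the line, which preserves the
    # colon inside the IQN target name.
    IFS=: read -r scheme server proto port lun target <<EOF
$1
EOF
    [ "$scheme" = "iscsi" ] || return 1
    # Empty port/LUN fall back to assumed defaults (3260 and 0).
    printf 'server=%s port=%s lun=%s target=%s\n' \
        "$server" "${port:-3260}" "${lun:-0}" "$target"
}

# Example with the string used in this test procedure:
#   parse_iscsi_uri 'iscsi:10.0.0.1:::1:iqn.2019-03.com.example:target1'
```

For the string used here, this yields server 10.0.0.1, the default port, LUN 1, and target iqn.2019-03.com.example:target1, matching the portal/LUN/target noted next to the 'sanhook' command.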
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Test procedure for the partman-iscsi changes

Based on the previous comment setup.

iSCSI initiator:
---

...

Note there's no 'iscsi_auto' in the kernel cmdline:

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/vmlinuz initrd=initrd.gz --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot

Back to serial console.

Proceed with the installer.
In 'Users and passwords' dialogs, select 'Go back', and 'Execute a shell', and 'Continue'.

Bring up the iSCSI devices with iBFT (manually or with patched disk-detect udeb):

~ # modprobe iscsi_ibft

~ # iscsistart -N
Setting up software interface ens4

~ # iscsistart -b
iscsistart: Logging into iqn.2019-03.com.example:target1 10.0.0.1:3260,1
iscsistart: version 2.0-874
iscsistart: Connection1:0 to [target: iqn.2019-03.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now

~ # dmesg | grep -e iBFT -e sd
[0.007308] iBFT found at 0x9e520.
[ 94.949058] iBFT detected.
[ 105.158333] sd 2:0:0:1: Attached scsi generic sg2 type 0
[ 105.158800] sd 2:0:0:1: Power-on or device reset occurred
[ 105.161642] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
[ 105.161646] sd 2:0:0:1: [sda] 4096-byte physical blocks
[ 105.161970] sd 2:0:0:1: [sda] Write Protect is off
[ 105.161974] sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
[ 105.162645] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 105.174899] sd 2:0:0:1: [sda] Attached SCSI disk

Note that interface 1 (ens3) is the default interface, in the 192.168.122.0/24 range, and interface 2 (ens4) is the iSCSI interface, in the 10.0.0.0/24 range.

~ # ip addr list
...
2: ens3: ...
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.162/24 brd 192.168.122.255 scope global ens3
    ...
3: ens4: ...
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
    ...

Return with 'exit' and proceed with the installer until the 'Partition disks' dialog.
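
While the installer shell from the steps above is open, the values that iscsi_ibft exposes read-only in sysfs can be printed as a cross-check against the 'ip addr list' output. This is a sketch under assumptions: the function name `show_ibft` is hypothetical, and the attribute paths (ethernet0/mac, target0/ip-addr, target0/port, target0/target-name under /sys/firmware/ibft) are what the module is assumed to provide for the first NIC and target; they are not quoted from this bug.

```shell
#!/bin/sh
# Sketch: print selected iBFT attributes from sysfs. The function name is
# hypothetical; the attribute paths below are assumptions about the layout
# created by the iscsi_ibft module.
show_ibft() {
    root=${1:-/sys/firmware/ibft}
    for attr in ethernet0/mac target0/ip-addr target0/port target0/target-name; do
        # Skip attributes that are absent instead of failing.
        if [ -r "$root/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$root/$attr")"
        fi
    done
}

# Typical use after `modprobe iscsi_ibft`:
#   show_ibft
```

In the setup described here, the ethernet0/mac value would be expected to match the ens4 address (trailing :02), which is the interface the partman-iscsi tests below check for.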

Test #0) Original partman-iscsi

In 'Partition disks' dialogs, select 'Go back', and 'Change debconf priority', and 'Continue', and 'low'.

Return to the 'Partition disks' dialog.
Select 'Guided partitioning',
and 'Guided - use entire disk',
and 'SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK',
and 'All files in one partition (recommended for new users)',
and 'Finish partitioning and write changes to disk',
and 'No' in 'No partitions for use as swap space',
and 'Yes' in 'Write the changes to disks?',
and 'Continue' and 'Continue' for swap file questions.

This should partition and format the disk, and return to the menu, as the debconf priority is 'low'.

Select 'Execute a shell', and 'Continue'.

See that the wrong network interface is used for the HWADDR field in iscsi.initramfs (trailing :01 instead of :02).

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:01"
ISCSI_TARGET_NAME="iqn.2019-03.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

Test #1) Patched partman-iscsi, changes for patch 1/2
(use the iSCSI interface for HWADDR and /etc/network/interfaces)

Install the patched udeb:

~ # wget http://ppa.launchpad.net/mfo/sf211547v2/ubuntu/pool/main/p/partman-iscsi/partman-iscsi_40ubuntu4.18.04.1_all.udeb
~ # udpkg --unpack partman-iscsi_40ubuntu4.18.04.1_all.udeb

Verify the new option is not yet enabled.

~ # debconf-get partman-iscsi/iscsi_auto
false

Unmount the swap so 'Partition disks' can work again.

~ # swapoff /target/swapfile

Return with 'exit' and repeat the 'Partition disks'/'Execute a shell' procedure from Test #0.
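
The HWADDR check in these tests can also be scripted rather than read by eye. A minimal sketch, assuming the file keeps the quoted KEY="value" form shown above; the function name `hwaddr_matches` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the HWADDR recorded in an iscsi.initramfs file equals
# the expected MAC address. The function name is hypothetical.
#   $1: path to iscsi.initramfs
#   $2: expected MAC address of the iSCSI network interface
hwaddr_matches() {
    # Extract the value between the double quotes of the HWADDR line.
    actual=$(sed -n 's/^HWADDR="\(.*\)"$/\1/p' "$1")
    [ "$actual" = "$2" ]
}

# Typical use in the installer shell, comparing against the iSCSI NIC (ens4):
#   hwaddr_matches /target/etc/iscsi/iscsi.initramfs \
#       "$(cat /sys/class/net/ens4/address)" && echo "HWADDR ok"
```

With the original partman-iscsi (Test #0) this comparison fails, since the file records the :01 address of ens3; with the patched udeb it is expected to succeed.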

See that the iSCSI network interface is now used in HWADDR:

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-03.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

Test #2) Patched partman-iscsi, changes for patch 2/2
(use ISCSI_AUTO=true in /etc/iscsi/iscsi.initramfs)

Now enable the 'partman-iscsi/iscsi_auto' option, and start a new debconf/menu to detect its value:

Install the patched udeb again, so the option re-appears:

~ # debconf-get partman-iscsi/iscsi_auto
~ #

~ # udpkg --unpack partman-iscsi_40ubuntu4.19.04.1_all.udeb

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # debconf-set partman-iscsi/iscsi_auto true

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # swapoff /target/swapfile

~ # debconf -o d-i /usr/bin/main-menu

Repeat the 'Partition disks'/'Execute a shell' procedure from Test #0.

See that 'ISCSI_AUTO=true' is now configured in 'iscsi.initramfs'.

~ # cat /target/etc/iscsi/iscsi.initramfs
ISCSI_AUTO=true

Return with 'exit', then 'Change debconf priority' to 'high' again, and proceed/finish the installation.

System reboots.

B
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "cosmic_d-i.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265734/+files/cosmic_d-i.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "bionic_hw-detect.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265732/+files/bionic_hw-detect.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Adding patches and test procedure for the debian-installer userspace part.
(disk-detect probes iSCSI iBFT disks; partman-iscsi sets ISCSI_AUTO=true for booting.)

The d-i patches should be uploaded/built _after_ hw-detect and partman-iscsi are successfully built and published, so that their new versions are picked up.

The d-i kernel version change has been tested on the amd64, i386, arm64, and ppc64el architectures, with both regular and LVM partitioning, using VMs/QEMU for all archs, plus baremetal for amd64.

The hw-detect/partman-iscsi changes have been tested with a simple, virtual iSCSI/iBFT setup using iPXE, which will be described shortly.

cheers,
Mauricio

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Confirmed
Status in hw-detect package in Ubuntu: Confirmed
Status in linux package in Ubuntu: Fix Released
Status in partman-iscsi package in Ubuntu: Confirmed
Status in debian-installer source package in Bionic: Confirmed
Status in hw-detect source package in Bionic: Confirmed
Status in linux source package in Bionic: Fix Released
Status in partman-iscsi source package in Bionic: Confirmed
Status in debian-installer source package in Cosmic: Confirmed
Status in hw-detect source package in Cosmic: Confirmed
Status in linux source package in Cosmic: Fix Released
Status in partman-iscsi source package in Cosmic: Confirmed
Status in debian-installer source package in Disco: Confirmed
Status in hw-detect source package in Disco: Confirmed
Status in linux source package in Disco: Fix Released
Status in partman-iscsi source package in Disco: Confirmed
Status in debian-installer source package in Eoan: Confirmed
Status in hw-detect source package in Eoan: Confirmed
Status in linux source package in Eoan: Fix Released
Status in partman-iscsi source package in Eoan: Confirmed

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it were, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds.

   * This impacts user experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module to the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful when there are no other disks in the system, so the installer does not complain or wait too long) and so that partman-related preseed options are not required and may still be available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:

     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:

     - d-i: kernel version update to include iscsi_ibft module, based on kernel released to -updates plus one week monitoring bug reports -- it should be OK. Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64 on baremetal -- see comment 11.

     - hw-detect: low, the changes are enabled by a preseed option. See comment 12.

     - partman-iscsi: low, simple changes, plus one fix that has been tested in detail, and falls back to previous behavior if it fails. See comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Un
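[Editor's sketch] The linux-package check from the [Test Case] section above can be wrapped in a small script when several udebs need checking. This is only a minimal sketch: the helper names `listing_has_ibft` and `check_udebs` are hypothetical and do not come from the bug report; the only commands assumed are `dpkg-deb -c` and `grep`, as in the original test case.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeeds if a
# "dpkg-deb -c" listing read from stdin mentions iscsi_ibft.ko.
listing_has_ibft() {
    grep -q 'iscsi_ibft\.ko'
}

# Hypothetical helper: report, for each udeb given as an argument,
# whether its contents listing includes the iscsi_ibft module.
check_udebs() {
    for udeb in "$@"; do
        if dpkg-deb -c "$udeb" | listing_has_ibft; then
            echo "OK: $udeb ships iscsi_ibft.ko"
        else
            echo "MISSING: $udeb lacks iscsi_ibft.ko"
        fi
    done
}
```

Usage would be along the lines of `check_udebs scsi-modules_*.udeb`. For the second half of the test case, loading the module in the installer environment, one plausible check is `modprobe iscsi_ibft` followed by listing `/sys/firmware/ibft/`, assuming the firmware actually provides an iBFT.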
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "cosmic_hw-detect.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265735/+files/cosmic_hw-detect.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "disco_d-i.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265737/+files/disco_d-i.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "cosmic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265736/+files/cosmic_partman-iscsi.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "disco_hw-detect.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265738/+files/disco_hw-detect.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "eoan_d-i.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265740/+files/eoan_d-i.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "eoan_hw-detect.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265741/+files/eoan_hw-detect.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "disco_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265739/+files/disco_partman-iscsi.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "eoan_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265742/+files/eoan_partman-iscsi.debdiff -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "bionic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265733/+files/bionic_partman-iscsi.debdiff -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
/i386/arm64/ppc64el on QEMU, plus amd64 - on baremetal -- see comment 11. -- hw-detect: low, the changes are enabled by a preseed option. - see comment 12. -- partman-iscsi: low, simple changes, plus one fix that has - been tested in detail, and falls back to - previous behavior if it fails. - see comment 13. + based on kernel released to -updates plus one week + monitoring bug reports -- it should be OK. + Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64 + on baremetal -- see comment 11. + - hw-detect: low, the changes are enabled by a preseed option. + see comment 12. + - partman-iscsi: low, simple changes, plus one fix that has + been tested in detail, and falls back to + previous behavior if it fails. + see comment 13. [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by an user with system/firmware that supports iBFT for iSCSI. ** Also affects: debian-installer (Ubuntu) Importance: Undecided Status: New ** Also affects: hw-detect (Ubuntu) Importance: Undecided Status: New ** Also affects: partman-iscsi (Ubuntu) Importance: Undecided Status: New ** Also affects: debian-installer (Ubuntu Eoan) Importance: Undecided Status: New ** Also affects: hw-detect (Ubuntu Eoan) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Eoan) Importance: Undecided Assignee: Mauricio Faria de Oliveira (mfo) Status: Fix Released ** Also affects: partman-iscsi (Ubuntu Eoan) Importance: Undecided Status: New ** Changed in: debian-installer (Ubuntu Bionic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: debian-installer (Ubuntu Cosmic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: debian-installer (Ubuntu Disco) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: debian-installer (Ubuntu Eoan) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: partman-iscsi (Ubuntu 
Bionic) Status: New => Confirmed ** Changed in: partman-iscsi (Ubuntu Bionic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: partman-iscsi (Ubuntu Cosmic) Status: New => Confirmed ** Changed in: partman-iscsi (Ubuntu Cosmic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: partman-iscsi (Ubuntu Disco) Status: New => Confirmed ** Changed in: partman-iscsi (Ubuntu Disco) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: partman-iscsi (Ubuntu Eoan) Status: New => Confirmed ** Changed in: partman-iscsi (Ubuntu Eoan) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: hw-detect (Ubuntu Bionic) Status: New => Confirmed ** Changed in: hw-detect (Ubuntu Bionic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: hw-detect (Ubuntu Cosmic) Status: New => Confirmed ** Changed in: hw-detect (Ubuntu Cosmic) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: hw-detect (Ubuntu Disco) Status: New => Confirmed ** Changed in: hw-detect (Ubuntu Disco) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: hw-detect (Ubuntu Eoan) Status: New => Confirmed ** Changed in: hw-detect (Ubuntu Eoan) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) ** Changed in: debian-installer (Ubuntu Bionic) Status: New => Confirmed ** Changed in: debian-installer (Ubuntu Cosmic) Status: New => Confirmed ** Changed in: debian-installer (Ubuntu Disco) Status: New => Confirmed ** Changed in: debian-installer (Ubuntu Eoan) Status: New => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in debian-installer package in Ubuntu: Confirmed Status in hw-detect package in Ubuntu: Confirmed Status in linux package in Ubuntu: Fix Released Status in partman-iscsi package in Ubuntu: Confirmed Status in debian-installer source package in Bionic: Confirmed Status in hw-detect source package in Bionic: Confirmed Status in linux source package in Bionic: Fix Released Status in partman-iscsi source package in Bionic: Confirmed Status in debian-installer source package in Cosmic: Confirmed Status in hw-detect source package in Cosmic: Confirmed Status in linux source package in Cosmic: Fix Release
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Installer (non-kernel) patches submitted to Debian for feedback. - disk-detect: https://bugs.debian.org/924675 - partman-iscsi: https://bugs.debian.org/924680 ** Bug watch added: Debian Bug tracker #924675 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924675 ** Bug watch added: Debian Bug tracker #924680 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924680 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. * This impacts user-experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. * With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. 
This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain nor wait too long) and that any partman-related preseed options are not required and may be still available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect package: (to be done) [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect: (to be done) [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by an user with system/firmware that supports iBFT for iSCSI. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
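The regression note above says that iscsi_ibft only exposes the firmware table in sysfs, read-only. A minimal sketch of reading that data once the module is loaded, assuming the usual /sys/firmware/ibft layout (directories such as initiator, ethernet0 and target0 holding one file per attribute); the function name read_ibft and its root parameter are illustrative and not part of the installer patches:

```python
import os


def read_ibft(root="/sys/firmware/ibft"):
    """Return {section: {attribute: value}} for the iBFT data found under root.

    An empty dict means the directory is absent, e.g. the module is not
    loaded or the firmware provides no iBFT.
    """
    table = {}
    if not os.path.isdir(root):
        return table
    for section in sorted(os.listdir(root)):
        section_dir = os.path.join(root, section)
        if not os.path.isdir(section_dir):
            continue
        attrs = {}
        for name in sorted(os.listdir(section_dir)):
            path = os.path.join(section_dir, name)
            if not os.path.isfile(path):
                continue
            try:
                with open(path) as f:
                    attrs[name] = f.read().strip()
            except OSError:
                # Some attributes may not be readable; skip them.
                continue
        table[section] = attrs
    return table
```

On a system without iBFT (as in the "No iBFT detected." runs in this thread) the function is expected to return an empty dict.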
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Hi Marius @lazamarius1, Per the kernel.ubuntu.com schedule, the version for Bionic/linux -> Xenial/linux-hwe should land soon. You can verify the version/timestamps for each package/release at the bottom of these pages (the linux-hwe version comes a bit after the corresponding linux version) https://launchpad.net/ubuntu/+source/linux https://launchpad.net/ubuntu/+source/linux-hwe As for testing, yes, this issue might take longer to reproduce, but the initial testing by another user, done in order to first submit the fix to Ubuntu, showed good results, so that is already a good sign for the fix individually. The integration of it with other fixes, i.e., testing with it in -proposed, will be done by that other user as well, so collectively with your testing that should increase the chances of finding out whether the issue still happens or not. There's also regression testing of the kernel builds, which can spot failures, so that contributes too. Hope this helps, Mauricio -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1802021 Title: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() Status in linux package in Ubuntu: Confirmed Status in linux-azure package in Ubuntu: Fix Released Status in linux source package in Xenial: New Status in linux-azure source package in Xenial: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux-azure source package in Bionic: Fix Released Status in linux source package in Cosmic: Fix Committed Status in linux-azure source package in Cosmic: Fix Released Bug description: We had a customer seeing traces like the following: Stack trace from kern.log: 2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu 2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000 2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn 2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace: 2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0 2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240 2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90 2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80 2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370 2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60 2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670 2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140 2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70 2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0 2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50 2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0 2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0 2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0 2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410 2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460 2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140 2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410 2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? 
kthread_destroy_worker+0x50/0x50 2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130 2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20 2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40 Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds. We are seeing more issues with fsnotify related callbacks. These are not a soft/hard lockup but seem to significantly degrade the responsiveness of systemd (and from there everything else). The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f srcu: Lock srcu_data structure in srcu_gp_start() The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates t
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
@lazamarius1, Actually linux-hwe for Bionic with this fix has just been uploaded. See https://launchpad.net/ubuntu/+source/linux-hwe Changelog linux-hwe (4.15.0-47.50~16.04.1) xenial; urgency=medium ... * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021) - srcu: Prohibit call_srcu() use under raw spinlocks - srcu: Lock srcu_data structure in srcu_gp_start() ... -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1802021 Title: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() Status in linux package in Ubuntu: Confirmed Status in linux-azure package in Ubuntu: Fix Released Status in linux source package in Xenial: New Status in linux-azure source package in Xenial: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux-azure source package in Bionic: Fix Released Status in linux source package in Cosmic: Fix Committed Status in linux-azure source package in Cosmic: Fix Released Bug description: We had a customer seeing traces like the following: Stack trace from kern.log: 2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds. 2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu 2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000 2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn 2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace: 2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0 2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240 2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ?
sched_clock_local+0x17/0x90 2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80 2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370 2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60 2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670 2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140 2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70 2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0 2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50 2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0 2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0 2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0 2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410 2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460 2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140 2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410 2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50 2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130 2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20 2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40 Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds. We are seeing more issues with fsnotify related callbacks. These are not a soft/hard lockup but seem to significantly degrade the responsiveness of systemd (and from there everything else).
The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f srcu: Lock srcu_data structure in srcu_gp_start() The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption. Please investigate this issue and evaluate the proposed fix. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/
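The locking rule described in the commit message above can be illustrated with a toy user-space analogy. This is only a sketch under stated assumptions, not the kernel code: ToySrcuStruct, ToySrcuData, toy_call_srcu, toy_gp_start and run_demo are hypothetical names, Python threading locks stand in for the kernel spinlocks, and two plain lists stand in for the segmented callback list. The point it shows is that holding the outer (srcu_struct-like) lock is not enough; the routine that moves callbacks must also take the inner (srcu_data-like) lock that protects the list.

```python
import threading


class ToySrcuData:
    """Stand-in for srcu_data: a callback list guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiting = []    # callbacks queued, grace period not started
        self.advanced = []   # callbacks handed to a started grace period


class ToySrcuStruct:
    """Stand-in for srcu_struct: has its own lock plus one ToySrcuData."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sda = ToySrcuData()


def toy_call_srcu(sp, callback):
    # Queuing a callback only needs the inner lock.
    with sp.sda.lock:
        sp.sda.waiting.append(callback)


def toy_gp_start(sp):
    # The caller already holds sp.lock. The list update additionally takes
    # the inner lock, because that is the lock that protects the lists.
    with sp.sda.lock:
        sp.sda.advanced.extend(sp.sda.waiting)
        sp.sda.waiting.clear()


def run_demo(n_callbacks=1000, n_grace_periods=200):
    sp = ToySrcuStruct()

    def producer():
        for i in range(n_callbacks):
            toy_call_srcu(sp, i)

    def gp_worker():
        for _ in range(n_grace_periods):
            with sp.lock:
                toy_gp_start(sp)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=gp_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with sp.lock:
        toy_gp_start(sp)  # final pass picks up anything still waiting
    return sp
```

Because both the producer and the grace-period worker touch the lists only under the inner lock, no queued callback is lost or duplicated in this toy model, which is the property the kernel fix restores for ->srcu_cblist.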
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
cosmic-proposed verification done; iscsi_ibft.ko is present in udeb and loads correctly. --- $ uname -rv 4.18.0-17-generic #18-Ubuntu SMP Wed Mar 13 14:34:40 UTC 2019 $ apt-get download scsi-modules-4.18.0-17-generic-di $ dpkg-deb -c scsi-modules-4.18.0-17-generic-di_4.18.0-17.18_amd64.udeb | grep ibft -rw-r--r-- root/root 17257 2019-03-13 11:52 ./lib/modules/4.18.0-17-generic/kernel/drivers/firmware/iscsi_ibft.ko $ dpkg-deb -x scsi-modules-4.18.0-17-generic-di_4.18.0-17.18_amd64.udeb udeb $ sudo insmod udeb/lib/modules/4.18.0-17-generic/kernel/drivers/scsi/iscsi_boot_sysfs.ko $ sudo insmod udeb/lib/modules/4.18.0-17-generic/kernel/drivers/firmware/iscsi_ibft.ko $ dmesg | grep -i ibft [ 117.143116] No iBFT detected. ** Tags removed: verification-needed-cosmic ** Tags added: verification-done-cosmic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. * This impacts user-experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. 
* With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain nor wait too long) and that any partman-related preseed options are not required and may be still available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect package: (to be done) [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect: (to be done) [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by an user with system/firmware that supports iBFT for iSCSI. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
bionic-proposed verification done; iscsi_ibft.ko is present in udeb and loads correctly. --- $ uname -rv 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 $ apt-get download scsi-modules-4.15.0-47-generic-di $ dpkg-deb -c scsi-modules-4.15.0-47-generic-di_4.15.0-47.50_amd64.udeb | grep ibft -rw-r--r-- root/root 17086 2019-03-13 04:37 ./lib/modules/4.15.0-47-generic/kernel/drivers/firmware/iscsi_ibft.ko $ dpkg-deb -x scsi-modules-4.15.0-47-generic-di_4.15.0-47.50_amd64.udeb udeb $ sudo insmod udeb/lib/modules/4.15.0-47-generic/kernel/drivers/scsi/iscsi_boot_sysfs.ko $ sudo insmod udeb/lib/modules/4.15.0-47-generic/kernel/drivers/firmware/iscsi_ibft.ko $ dmesg | grep -i ibft [ 297.999505] No iBFT detected. ** Tags removed: verification-needed-bionic ** Tags added: verification-done-bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. * This impacts user-experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. 
* With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain nor wait too long) and that any partman-related preseed options are not required and may be still available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect package: (to be done) [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect: (to be done) [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by an user with system/firmware that supports iBFT for iSCSI. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
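The first verification step above (checking that the module is listed in the udeb) can be scripted. A minimal sketch, assuming the listing text comes from `dpkg-deb -c` and that the file path is the last whitespace-separated field of each line, as in the output quoted in this thread; find_module is a hypothetical helper name:

```python
def find_module(listing, module="iscsi_ibft.ko"):
    """Return the paths in a `dpkg-deb -c` listing whose file name is module."""
    hits = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Regular-file entries end with the path; symlink entries
        # ("a -> b") are not handled specially in this sketch.
        path = fields[-1]
        if path.rsplit("/", 1)[-1] == module:
            hits.append(path)
    return hits
```

The listing could be obtained with, for example, `subprocess.run(["dpkg-deb", "-c", udeb_path], capture_output=True, text=True, check=True).stdout`. An empty result would indicate the module is missing from that udeb. Loading the module (the insmod step) still has to be checked separately.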
[Kernel-packages] [Bug 1821259] [NEW] Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Public bug reported:

[Impact]

 * This problem hard locks up 2 CPUs in a deadlock, which in turn soft locks up other CPUs; the system becomes unusable.

 * It is relatively rare and difficult to hit, because it is a corner case in scheduling/load balancing that needs specific timing with the CPU stopper code, and it needs an SMP plus _NUMA_ system (but it can be hit with the synthetic test case attached in LP).

 * Since SMP plus NUMA usually equals _servers_, it looks like a good idea to prevent this bug and the resulting hard lockups and reboots.

 * The fix resolves the potential deadlock by moving one of the calls required for the deadlock out from under the locked code.

[Test Case]

 * There's a synthetic test case to reproduce this problem (although without the stack traces, just a system hang) attached to this LP bug.

 * It uses kprobes/mdelay/cpu stopper calls to force the code to execute and force the timing/locking condition to occur.

 * $ sudo insmod kmod-stopper.ko

   Some dmesg logging occurs, and the system either hangs or not. See examples in comments.

[Regression Potential]

 * These are patches to the cpu stop_machine.c code, and they change a bit how it works; however, there are no further upstream fixes for these patches, and they are still at the top of the 'git log --oneline -- kernel/stop_machine.c' output.

 * These patches have been verified with the synthetic test case and 'stress-ng --class scheduler --sequential 0' (no regressions) on a guest with 2 CPUs and on one physical system with 24 CPUs.

[Other Info]

 * The patches are required on Xenial and later.
 * There are 4 patches for Xenial, and 2 patches pending for Bionic.
 * All patches are applied from Cosmic onwards.

[Original Description]

These 2 hard lockups happened all of a sudden in the logs, and many soft lockups occur after them as a fallout.
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.477086] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.483800] Modules linked in: <...>
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484066] CPU: 10 PID: 58 Comm: migration/10 Not tainted 4.4.0-116-generic #140~14.04.1-Ubuntu
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484068] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 02/17/2017
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484070] task: 883ff2a76200 ti: 883ff211 task.ti: 883ff211
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484071] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x160/0x170
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484079] RSP: :883ff2113c58 EFLAGS: 0002
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484080] RAX: 0101 RBX: 0086 RCX: 0001
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484081] RDX: 0101 RSI: 0001 RDI: 881fff991ba8
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484083] RBP: 883ff2113c58 R08: 0101 R09: 883ff082e200
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484084] R10: 2e04 R11: 2e04 R12: 881fff997c60
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484085] R13: 881fff991ba8 R14: R15: 881fff997300
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484087] FS: () GS:883fff00() knlGS:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484088] CS: 0010 DS: ES: CR0: 80050033
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484090] CR2: 7f7caaa23020 CR3: 001f4674 CR4: 00160670
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484091] Stack:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484092] 883ff2113c68 811870eb 883ff2113c80 81819907
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484094] 881fff991ba0 883ff2113cb0 8111c600 881fff997300
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484096] 881fff997c90 881ff03dd400 883ff2113cc0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484098] Call Trace:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484105] [] queued_spin_lock_slowpath+0xb/0xf
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484109] [] _raw_spin_lock_irqsave+0x37/0x40
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484113] [] cpu_stop_queue_work+0x30/0x80
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484116] [] stop_one_cpu_nowait+0x30/0x40
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484119] [] load_balance+0x71b/0x940
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484122] [] pick_next_task_fair+0x275/0x4b0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484126] [] __schedule+0x6c6/0x7f0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484132] [] ? sort_range+0x30/0x30
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484134] [] sched
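For triage, the CPUs named by the NMI watchdog can be pulled out of a saved log like the one above. The following is only a minimal sketch: the helper name `hard_lockup_cpus` is made up for this note, and the pattern assumes the exact "Watchdog detected hard LOCKUP on cpu N" wording shown in the log.

```python
import re

# Matches the NMI watchdog message as it appears in the log above.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")


def hard_lockup_cpus(log_text):
    """Return the CPU numbers named in hard-lockup messages,
    in order of first appearance and without duplicates."""
    seen = []
    for match in LOCKUP_RE.finditer(log_text):
        cpu = int(match.group(1))
        if cpu not in seen:
            seen.append(cpu)
    return seen


sample = ("Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.477086] "
          "NMI watchdog: Watchdog detected hard LOCKUP on cpu 10")
print(hard_lockup_cpus(sample))
```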
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Analysis

The 1st hard lockup is harder to get the interesting data out of, as apparently the registers with variables related to the cpu number have been clobbered by more recent calls in the spinlock path.

Looking at the 2nd hard lockup:

addr2line + code shows us that try_to_wake_up() in line 1997 is indeed looping with IRQs disabled in line 1939 (thus a hard lockup):

$ addr2line -pifae ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic 0x810aacb6
0x810aacb6: try_to_wake_up at /build/linux-lts-xenial-ozsla7/linux-lts-xenial-4.4.0/kernel/sched/core.c:1997

1926 static int
1927 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
1928 {
...
1939         raw_spin_lock_irqsave(&p->pi_lock, flags);
...
1993         /*
1994          * If the owning (remote) cpu is still in the middle of schedule() with
1995          * this task as prev, wait until its done referencing the task.
1996          */
1997         while (p->on_cpu)
1998                 cpu_relax();
...
2027         raw_spin_unlock_irqrestore(&p->pi_lock, flags);
2028
2029         return success;
2030 }

The objdump disassembly of try_to_wake_up() in vmlinux for the RIP instruction address (810aacb6) shows a while loop that just checks for non-zero 'p->on_cpu' and calls cpu_relax() (which translates to the 'pause' instruction):

810aacb1:  f3 90      pause
810aacb3:  8b 43 28   mov    0x28(%rbx),%eax
810aacb6:  85 c0      test   %eax,%eax
810aacb8:  75 f7      jne    810aacb1

So, it checks for the value in pointer in RBX + offset 0x28, which according to the 'pahole' tool, is indeed the 'on_cpu' field:

$ pahole --hex -C task_struct ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic | grep on_cpu
int on_cpu; /* 0x28 0x4 */

So, the task_struct pointer is in RBX, which is:

RBX: 883ff2a76200

And that matches the other hard locked up task on CPU 10 (see its 'task:' field).
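As a cross-check of the disassembly above, the backward jump that closes the loop can be recomputed by hand. This is an illustrative sketch only: the function name `rel8_jump_target` is hypothetical, and the addresses are the (shortened) ones printed in the listing. It relies on the x86 rule that a short jump displacement is a signed byte relative to the address of the next instruction.

```python
def rel8_jump_target(insn_addr, rel8, insn_len=2):
    """Target of an x86 short conditional jump such as '75 f7' (jne rel8).

    The displacement is a signed 8-bit value, relative to the address
    of the instruction that follows the jump (insn_addr + insn_len).
    """
    if rel8 >= 0x80:      # sign-extend the 8-bit displacement
        rel8 -= 0x100
    return insn_addr + insn_len + rel8


# Bytes '75 f7' at 810aacb8: displacement 0xf7 is -9, so the jump goes
# back to the 'pause' instruction at the top of the loop.
target = rel8_jump_target(0x810AACB8, 0xF7)
print(hex(target))

# 'mov 0x28(%rbx),%eax' loads 4 bytes from RBX + 0x28, which is where
# pahole reports 'int on_cpu' (offset 0x28, size 0x4).
ON_CPU_OFFSET, ON_CPU_SIZE = 0x28, 0x4
```

The loop therefore consists of pause, load of p->on_cpu, test, and a jump back while the value is non-zero, which is the 'while (p->on_cpu) cpu_relax();' statement at line 1997.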
Per the stack trace in CPU 10, the identical timestamp of the two hard lockup messages, and the fact that both stack traces are cpu_stopper related, it does look like CPU 10 is waiting on the spinlock of one of the 2 cpu stoppers held by CPU 6, which is exactly the scenario in the suggested patch.

The problem/fix has been verified with a synthetic test-case (attached).

commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f
Author: Peter Zijlstra
Date: Fri Apr 20 11:50:05 2018 +0200

    stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

    Matt reported the following deadlock:

    CPU0                              CPU1

    schedule(.prev=migrate/0)
      pick_next_task()                  ...
        idle_balance()                    migrate_swap()
          active_balance()                  stop_two_cpus()
                                              spin_lock(stopper0->lock)
                                              spin_lock(stopper1->lock)
                                              ttwu(migrate/0)
                                                smp_cond_load_acquire() -- waits for schedule()
            stop_one_cpu(1)
              spin_lock(stopper1->lock) -- waits for stopper lock

    Fix this deadlock by taking the wakeups out from under stopper->lock.
    This allows the active_balance() to queue the stop work and finish the
    context switch, which in turn allows the wakeup from migrate_swap() to
    observe the context and complete the wakeup.

    <...>

The stop_two_cpus() call can only happen in a NUMA system per its caller chain:

stop_two_cpus()
<- migrate_swap()
<- task_numa_migrate()
<- numa_migrate_preferred()
<- [task_numa_placement()]
<- task_numa_fault()

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821259

Title: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

Status in linux package in Ubuntu: Incomplete
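The idea stated in the commit message quoted in the analysis ("taking the wakeups out from under stopper->lock") can be illustrated with a small user-space analogy. This is a sketch under stated assumptions, not the kernel implementation: the `Stopper` class, both `queue_work_*` functions and the `woken_while_locked` flag are hypothetical names invented for this note, and the code only shows the ordering of "queue under the lock, wake after unlock"; it does not reproduce the deadlock itself.

```python
import threading
from collections import deque


class Stopper:
    """User-space stand-in for a per-CPU stopper: a lock, a work queue,
    and an event that plays the role of waking the stopper thread."""

    def __init__(self):
        self.lock = threading.Lock()
        self.works = deque()
        self.wakeup = threading.Event()
        self.woken_while_locked = False  # set if a wake-up ran under the lock

    def wake(self):
        if self.lock.locked():
            self.woken_while_locked = True
        self.wakeup.set()


def queue_work_wake_under_lock(stopper, work):
    """Problematic ordering: the wake-up is issued while the lock is held."""
    with stopper.lock:
        stopper.works.append(work)
        stopper.wake()


def queue_work_deferred_wake(stopper, work):
    """Ordering after the fix: remember whom to wake, drop the lock, then wake."""
    pending = []
    with stopper.lock:
        stopper.works.append(work)
        pending.append(stopper)
    for s in pending:  # the lock is no longer held here
        s.wake()


before, after = Stopper(), Stopper()
queue_work_wake_under_lock(before, "work")
queue_work_deferred_wake(after, "work")
print(before.woken_while_locked, after.woken_while_locked)
```

In the deferred variant a wake-up that has to wait (as ttwu() does on p->on_cpu) can no longer block another CPU that needs the same stopper lock, which is the property the commit message describes.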
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Test-case (kmod-stopper.c)
---

$ sudo apt-get -y install gcc make libelf-dev linux-headers-$(uname -r)

$ touch Makefile # fake it, and use this make line:
$ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kmod-stopper.o modules

$ echo 9 | sudo tee /proc/sys/kernel/printk

$ sudo insmod kmod-stopper.ko
$ sudo rmmod kmod-stopper

** Attachment added: "kmod-stopper.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821259/+attachment/5248313/+files/kmod-stopper.c
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Test-case on Xenial:

$ ls -1d /sys/devices/system/cpu/cpu[0-9]*
/sys/devices/system/cpu/cpu0
/sys/devices/system/cpu/cpu1

Original:
---

$ uname -rv
4.4.0-144-generic #170-Ubuntu SMP Thu Mar 14 11:56:20 UTC 2019

$ sudo insmod kmod-stopper/kmod-stopper.ko
[ 74.198379] mod_init() :: this cpu = 0x1, that cpu = 0x0
[ 74.199613] mod_init() :: that_cpu_stopper_task = 88003d80e600, comm = migration/0
[ 74.206194] kp2/stop_two_cpus() :: this cpu = 0x1, that cpu = 0x0
[ 74.206196] do_nothing() :: this cpu = 0x0, that cpu = 0x1
[ 74.206201] kp1/pick_next_task_fair() :: this cpu = 0x0, that cpu = 0x1
[ 74.206203] kp1/pick_next_task_fair() :: before sleep (1000 msecs)
[ 74.212759] kp2/stop_two_cpus() :: before sleep (500 msecs)
[ 74.710138] kp2/stop_two_cpus() :: after sleep (500 msecs)
[ 75.198324] kp1/pick_next_task_fair() :: after sleep (1000 msecs)
[ 75.199814] kp1/pick_next_task_fair() :: stopping other cpu...

The test case failed in only 2 out of 50+ runs.

Patched:
---

$ uname -rv
4.4.0-144-generic #170+test20190320b1 SMP Wed Mar 20 18:35:06 UTC 2019

$ sudo insmod kmod-stopper/kmod-stopper.ko
[ 85.958527] mod_init() :: this cpu = 0x1, that cpu = 0x0
[ 85.965876] mod_init() :: that_cpu_stopper_task = 88003d80e600, comm = migration/0
[ 85.993446] kp2/stop_two_cpus() :: this cpu = 0x1, that cpu = 0x0
[ 85.993471] do_nothing() :: this cpu = 0x0, that cpu = 0x1
[ 85.993477] kp1/pick_next_task_fair() :: this cpu = 0x0, that cpu = 0x1
[ 85.993480] kp1/pick_next_task_fair() :: before sleep (1000 msecs)
[ 86.019469] kp2/stop_two_cpus() :: before sleep (500 msecs)
[ 86.521688] kp2/stop_two_cpus() :: after sleep (500 msecs)
[ 86.987662] kp1/pick_next_task_fair() :: after sleep (1000 msecs)
[ 86.989427] kp1/pick_next_task_fair() :: stopping other cpu...
[ 86.991109] do_nothing() :: this cpu = 0x1, that cpu = 0x0
[ 86.992615] do_nothing() :: this cpu = 0x1, that cpu = 0x0

It passes every time (50+ tests).
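When repeating the synthetic test many times over a serial console or a saved dmesg capture, a small classifier can tell the runs apart. The sketch below rests on an assumption drawn from the two Xenial excerpts in the test-case message above, not on the test module's documentation: a run is treated as completed when a do_nothing() line is logged after the "stopping other cpu..." message, and as hung when the log ends at that message. The function name `classify_run` is made up for this note.

```python
def classify_run(dmesg_lines):
    """Classify one kmod-stopper run from its dmesg lines.

    Returns 'completed' if a do_nothing() line follows the
    'stopping other cpu...' message, 'hung' if nothing of the sort
    follows it, and 'incomplete' if the message never appears.
    (Assumed pass/hang criterion; see the lead-in.)
    """
    marker = "stopping other cpu..."
    for i, line in enumerate(dmesg_lines):
        if marker in line:
            later = dmesg_lines[i + 1:]
            if any("do_nothing()" in entry for entry in later):
                return "completed"
            return "hung"
    return "incomplete"


original_tail = [
    "[ 74.206196] do_nothing() :: this cpu = 0x0, that cpu = 0x1",
    "[ 75.198324] kp1/pick_next_task_fair() :: after sleep (1000 msecs)",
    "[ 75.199814] kp1/pick_next_task_fair() :: stopping other cpu...",
]
patched_tail = [
    "[ 86.989427] kp1/pick_next_task_fair() :: stopping other cpu...",
    "[ 86.991109] do_nothing() :: this cpu = 0x1, that cpu = 0x0",
]
print(classify_run(original_tail), classify_run(patched_tail))
```

Only lines after the marker are considered, because a do_nothing() line is also logged early in both excerpts.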
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Both xenial and bionic original/patched kernels were tested with stress-ng scheduler class, and no regressions were observed.

$ stress-ng --version
stress-ng, version 0.09.56 (gcc 8.3, x86_64 Linux 4.15.0-47-generic) 💻🔥

$ sudo stress-ng --class scheduler --sequential 0

$ uname -rv
4.4.0-144-generic #170-Ubuntu SMP Thu Mar 14 11:56:20 UTC 2019

$ uname -rv
4.4.0-144-generic #170+test20190320b1 SMP Wed Mar 20 18:35:06 UTC 2019

$ uname -rv
4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019

$ uname -rv
4.15.0-47-generic #50+test20190320b1 SMP Wed Mar 20 20:08:03 UTC 2019
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
Since Bionic already has the fix commit applied, the original kernel version doesn't hit the problem.
[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers
[X][PATCH 0/4] LP#1821259 Fix for deadlock in cpu_stopper https://lists.ubuntu.com/archives/kernel-team/2019-March/099427.html [B][PATCH 0/2] Fix for LP#1821259 (pending patches for) Fix for deadlock in cpu_stopper https://lists.ubuntu.com/archives/kernel-team/2019-March/099432.html ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Xenial) Importance: Undecided Status: New ** No longer affects: linux (Ubuntu) ** Changed in: linux (Ubuntu Bionic) Status: New => Confirmed ** Changed in: linux (Ubuntu Xenial) Status: New => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1821259 Title: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers Status in linux source package in Xenial: Confirmed Status in linux source package in Bionic: Confirmed Bug description: [Impact] * This problem hard locks up 2 CPUs in a deadlock, and this soft locks up other CPUs as an effect; the system becomes unusable. * This is relatively rare / difficult to hit because it's a corner case in scheduling/load balancing that needs timing with CPU stopper code. And it needs SMP plus _NUMA_ system. (but it can be hit with synthetic test case attached in LP.) * Since SMP plus NUMA usually equals _servers_ it looks like a good idea to prevent this bug / hard lockups / rebooting. * The fix resolves the potential deadlock by removing one of the calls required to deadlock from under the locked code. [Test Case] * There's a synthetic test case to reproduce this problem (although without the stack traces - just a system hang) attached to this LP bug. * It uses kprobes/mdelay/cpu stopper calls to force the code to execute and force the timing/locking condition to occur. * $ sudo insmod kmod-stopper.ko Some dmesg logging occurs, and systems either hangs or not. See examples in comments. 
[Regression Potential] * These are patches to the cpu stop_machine.c code, and they change a bit how it works; however, there are no upstream fixes for these patches anymore and they are still the top of the 'git log --oneline -- kernel/stop_machine.c' output. * These patches have been verified with the synthetic test case and 'stress-ng --class scheduler --sequential 0' (no regressions) on guest with 2 CPUs and one physical system with 24 CPUs. [Other Info] * The patches are required on Xenial and later. * There are 4 patches for Xenial, and 2 patches pending for Bionic. * All patches are applied from Cosmic onwards. [Original Description] These 2 hard lockups happened all of a sudden in the logs, and many soft lockups occur after them as a fallout. Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.477086] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.483800] Modules linked in: <...> Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484066] CPU: 10 PID: 58 Comm: migration/10 Not tainted 4.4.0-116-generic #140~14.04.1-Ubuntu Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484068] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 02/17/2017 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484070] task: 883ff2a76200 ti: 883ff211 task.ti: 883ff211 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484071] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x160/0x170 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484079] RSP: :883ff2113c58 EFLAGS: 0002 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484080] RAX: 0101 RBX: 0086 RCX: 0001 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484081] RDX: 0101 RSI: 0001 RDI: 881fff991ba8 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484083] RBP: 883ff2113c58 R08: 0101 R09: 883ff082e200 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484084] R10: 2e04 R11: 2e04 R12: 881fff997c60 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484085] R13: 881fff991ba8 R14: R15: 881fff997300 Nov 23 15:48:33 
SYSTEM_NAME kernel: [4603802.484087] FS: () GS:883fff00() knlGS: Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484088] CS: 0010 DS: ES: CR0: 80050033 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484090] CR2: 7f7caaa23020 CR3: 001f4674 CR4: 00160670 Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484091] Stack: Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484092] 883ff2113c68 811870eb 883ff2113c80 81819907 Nov 23 15:48:33 SYSTEM_NAME kernel
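The fix described in [Impact] for LP#1821259 (moving one of the calls needed for the deadlock out from under the lock) follows a common pattern: record the work while holding the lock, then make the call after releasing it. Below is a minimal Python sketch of that pattern only, not the kernel code; the `Stopper` class, `queue_work` and the `wake` callback are hypothetical names used purely for illustration.

```python
import threading


class Stopper:
    """Toy stand-in for a per-CPU stopper; all names here are hypothetical."""

    def __init__(self):
        self.lock = threading.Lock()
        self.works = []

    def queue_work(self, work, wake):
        # Under the lock: only touch the queue and remember what to call.
        deferred = []
        with self.lock:
            self.works.append(work)
            deferred.append(wake)
        # Lock released: a callback that takes other locks can no longer
        # form a lock-order cycle with self.lock.
        for callback in deferred:
            callback()
```

Because the callback runs only after `self.lock` is dropped, a second thread that holds the callback's lock and wants `self.lock` is not blocked indefinitely by this caller.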
[Kernel-packages] [Bug 1821395] [NEW] fscache: jobs might hang when fscache disk is full
Public bug reported: < NOTE: patches will be sent to kernel-team mailing list. > [Impact] * fscache issue where jobs get hung when fscache disk is full. * trivial upstream fix; already applied in X/D, required in B/C: commit c5a94f434c82 ("fscache: fix race between enablement and dropping of object"). [Test Case] * Test kernel verified / regression-tested by reporter. * Apparently there's no simple test case, but these are the conditions to hit the problem: 1) The active dataset size is equal to the cache disk size. The application reads the data over and over again. 2) Disk is near full (90%+) 3) cachefilesd in userspace is trying to cull the old objects while new objects are being looked up. 4) new cachefiles are created and some fail with no disk space. 5) race in dropping object state machine and deferred lookup state machine causes the hang. 6) HUNG in fscache_wait_for_deferred_lookup for clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. [Regression Potential] * Low; contained in fscache; no further fixes applied upstream. * This patch is applied in a stable tree (linux-4.4.y). [Original Description] A user reported an fscache issue where jobs get hung when the fscache disk is full. After investigation, it's been found to be an issue already reported/fixed upstream, by commit c5a94f434c82 ("fscache: fix race between enablement and dropping of object"). This patch is required in Bionic and Cosmic, and it's applied in Xenial (via stable) and Disco. Apparently there's no simple test case, but these are the conditions to hit the problem: 1) The active dataset size is equal to the cache disk size. The application reads the data over and over again. 2) Disk is near full (90%+) 3) cachefilesd in userspace is trying to cull the old objects while new objects are being looked up. 4) new cachefiles are created and some fail with no disk space. 5) race in dropping object state machine and deferred lookup state machine causes the hang. 
6) HUNG in fscache_wait_for_deferred_lookup for clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. ** Affects: linux (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1821395 Title: fscache: jobs might hang when fscache disk is full Status in linux package in Ubuntu: New To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821395/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.
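Condition 2 in the LP#1821395 report (cache disk 90%+ full) can be checked from userspace before trying to reproduce the hang. This is a minimal sketch, not part of the bug report; the function names are hypothetical, and the cache path is whatever the local cachefilesd configuration uses (the report does not state one, so it is passed in by the caller).

```python
import shutil


def used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def near_full(path, threshold=0.90):
    """True when the filesystem holding path is at or above the threshold."""
    return used_fraction(path) >= threshold
```

For example, `near_full("/path/to/cache")` with the default threshold corresponds to the "90%+" condition; the other conditions (culling racing with lookups) are timing-dependent and are not covered by this check.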
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
** Also affects: linux (Ubuntu Cosmic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Changed in: linux (Ubuntu) Status: Incomplete => Invalid ** Changed in: linux (Ubuntu Bionic) Status: New => Confirmed ** Changed in: linux (Ubuntu Cosmic) Status: New => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1821395 Title: fscache: jobs might hang when fscache disk is full Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: Confirmed Status in linux source package in Cosmic: Confirmed To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821395/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full
[B/C][PATCH 0/1] Fix for LP#1821395 (fscache: jobs might hang when fscache disk is full) https://lists.ubuntu.com/archives/kernel-team/2019-March/099448.html ** Description changed: - < NOTE: patches will be sent to kernel-team mailing list. > - [Impact] - * fscache issue where jobs get hung when fscache disk is full. + * fscache issue where jobs get hung when fscache disk is full. - * trivial upstream fix; already applied in X/D, required in B/C: -commit c5a94f434c82 ("fscache: fix race between enablement and -dropping of object"). + * trivial upstream fix; already applied in X/D, required in B/C: + commit c5a94f434c82 ("fscache: fix race between enablement and + dropping of object"). [Test Case] - * Test kernel verified / regression-tested by reporter. + * Test kernel verified / regression-tested by reporter. - * Apparently there's no simple test case, -but these are the conditions to hit the problem: + * Apparently there's no simple test case, + but these are the conditions to hit the problem: -1) The active dataset size is equal to the cache disk size. - The application reads the data over and over again. -2) Disk is near full (90%+) -3) cachefilesd in userspace is trying to cull the old objects - while new objects are being looked up. -4) new cachefiles are created and some fail with no disk space. -5) race in dropping object state machine and - deferred lookup state machine causes the hang. -6) HUNG in fscache_wait_for_deferred_lookup for - clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. + 1) The active dataset size is equal to the cache disk size. + The application reads the data over and over again. + 2) Disk is near full (90%+) + 3) cachefilesd in userspace is trying to cull the old objects + while new objects are being looked up. + 4) new cachefiles are created and some fail with no disk space. + 5) race in dropping object state machine and + deferred lookup state machine causes the hang. 
+ 6) HUNG in fscache_wait_for_deferred_lookup for + clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. [Regression Potential] - * Low; contained in fscache; no further fixes applied upstream. + * Low; contained in fscache; no further fixes applied upstream. - * This patch is applied in a stable tree (linux-4.4.y). + * This patch is applied in a stable tree (linux-4.4.y). [Original Description] An user reported an fscache issue where jobs get hung when the fscache disk is full. After investigation, it's been found to be an issue already reported/fixed upstream, by commit c5a94f434c82 ("fscache: fix race between enablement and dropping of object"). This patch is required in Bionic and Cosmic, and it's applied in Xenial (via stable) and Disco. Apparently there's no simple test case, but these are the conditions to hit the problem: - 1) The active dataset size is equal to the cache disk size. -The application reads the data over and over again. + 1) The active dataset size is equal to the cache disk size. + The application reads the data over and over again. 2) Disk is near full (90%+) 3) cachefilesd in userspace is trying to cull the old objects -while new objects are being looked up. + while new objects are being looked up. 4) new cachefiles are created and some fail with no disk space. - 5) race in dropping object state machine and -deferred lookup state machine causes the hang. + 5) race in dropping object state machine and + deferred lookup state machine causes the hang. 6) HUNG in fscache_wait_for_deferred_lookup for -clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. + clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1821395 Title: fscache: jobs might hang when fscache disk is full Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: Confirmed Status in linux source package in Cosmic: Confirmed
[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()
Updating bug tags to verification done. As mentioned by users in this LP bug, the verification period of 5 days is _usually_ not enough to reproduce this problem; however, we have some datapoints indicating that the fix is good. 1) The fix was first delivered in linux-azure, 3 weeks ago, and has reportedly resolved the issue for @alanjcastonguay: the issue was experienced within 4 days at the most, and hasn't happened for 2 weeks in 8 nodes (which is statistically very positive; and it helps that the fix is not specific to -azure). 2) One of the users who reported this in linux (-generic) has verified a test kernel with this fix for weeks, based upon which the fix was submitted after linux-azure had it. The same user has verified -proposed for about a week now, and it's looking good. 3) Users in this LP bug have been running the -proposed kernel in multiple nodes for about a week now too, and haven't hit the issue yet. On top of 1), with 2) and 3) combined, and the schedule for -proposed verification, this seems to be a reasonable compromise between results and test time. cheers, Mauricio ** Tags removed: verification-needed-bionic verification-needed-cosmic ** Tags added: verification-done-bionic verification-done-cosmic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. 
https://bugs.launchpad.net/bugs/1802021 Title: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() Status in linux package in Ubuntu: Confirmed Status in linux-azure package in Ubuntu: Fix Released Status in linux source package in Xenial: New Status in linux-azure source package in Xenial: Fix Released Status in linux source package in Bionic: Fix Committed Status in linux-azure source package in Bionic: Fix Released Status in linux source package in Cosmic: Fix Committed Status in linux-azure source package in Cosmic: Fix Released Bug description: We had a customer seeing traces like the following:
Stack trace from kern.log:
2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40
Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
We are seeing more issues with fsnotify-related callbacks. These are not soft/hard lockups but seem to significantly degrade the responsiveness of systemd (and from there everything else). The following upstream commit may fix this issue, but it is in Paul's RCU tree and not in linux-next or upstream yet: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee2
[Kernel-packages] [Bug 1817628] Re: Regular D-state processes impacting LXD containers
Marking X/B as verification done. The user reports the issue occurs much less often now. Apparently that environment hits some corner case or this may still be expected sometimes under memory pressure (i.e., one big shrinking operation acquired the lock and must finish). Nonetheless, the fix reduced how often the problem occurs, so it addresses most of it and is beneficial on its own. As discussed with @klebers on IRC, in this scenario we agreed to ship the fix, and address the pending behavior (if an actual issue) with additional fixes later. cheers, Mauricio ** Tags removed: verification-needed-bionic verification-needed-xenial ** Tags added: verification-done-bionic verification-done-xenial -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817628 Title: Regular D-state processes impacting LXD containers Status in linux package in Ubuntu: Invalid Status in linux source package in Xenial: Fix Committed Status in linux source package in Bionic: Fix Committed Bug description: [Impact] * Systems running under memory pressure may hit stalls on the order of seconds to minutes in systemd-logind and lxd mount operations (e.g., ZFS backend), which get stuck in D state. * The processes stuck in D state have a common stack trace (cat /proc/PID/stack), all blocked in register_shrinker(). * The fix checks in shrink_slab() (shrinkers are called under memory pressure) for contention/usage of the semaphore used by register_shrinker() and returns early in that case. This allows the register_shrinker() callers to unblock, and not stall until the shrink operation releases that lock. [Test Case] * In a system under memory pressure, specifically having the memory shrinkers being called often and taking time to run, perform mount operations (or other operations that acquire the shrinker_rwsem semaphore). 
* The user who reported the problem has verified the fix in systems that exhibited the problem often (sometimes daily), and reports that it resolves the problem. [Regression Potential] * Low. The fix just returns early from the slab memory shrinker if there's usage/contention for 'shrinker_rwsem'. * In some scenarios, this may cause the slab memory shrinker to require more invocations to actually finish and potentially release memory, but this seems minor since other shrinkers can release memory as well, and compared to the fact that this fix allows other applications to make progress / continue to run, which would otherwise be stalled. [Other Info] * This patch is already applied in Cosmic and later (v4.16+). It is needed only in Xenial and Bionic at this time. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817628/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
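The behavior described in [Impact] for LP#1817628 (the shrink pass returns early when someone is waiting on the lock used by register_shrinker()) can be modeled in userspace. This is a simplified Python sketch under stated assumptions, not the kernel implementation: a plain mutex plus a waiter counter stands in for 'shrinker_rwsem', and the `ShrinkerRegistry` class and its method names are hypothetical.

```python
import threading


class ShrinkerRegistry:
    """Toy model: a mutex plus a waiter count stands in for shrinker_rwsem."""

    def __init__(self):
        self._lock = threading.Lock()
        self._guard = threading.Lock()  # protects the waiter counter
        self._waiters = 0
        self.shrinkers = []

    def is_contended(self):
        with self._guard:
            return self._waiters > 0

    def register(self, shrinker):
        # Announce that we are waiting, then block until the lock is free.
        with self._guard:
            self._waiters += 1
        self._lock.acquire()
        try:
            with self._guard:
                self._waiters -= 1
            self.shrinkers.append(shrinker)
        finally:
            self._lock.release()

    def shrink_all(self):
        # If the lock is busy, skip this pass instead of blocking.
        if not self._lock.acquire(blocking=False):
            return 0
        freed = 0
        try:
            for shrinker in list(self.shrinkers):
                freed += shrinker()
                # Early return: let a waiting register() caller proceed.
                if self.is_contended():
                    break
        finally:
            self._lock.release()
        return freed
```

The trade-off matches the [Regression Potential] note: a pass that stops early frees less per invocation, so more invocations may be needed, but callers of `register()` wait only for the shrinker currently running rather than for the whole pass.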
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "cosmic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267256/+files/cosmic_partman-iscsi.debdiff -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in debian-installer package in Ubuntu: Fix Committed Status in hw-detect package in Ubuntu: Fix Released Status in linux package in Ubuntu: Fix Released Status in partman-iscsi package in Ubuntu: Fix Released Status in debian-installer source package in Bionic: Confirmed Status in hw-detect source package in Bionic: Confirmed Status in linux source package in Bionic: Fix Released Status in partman-iscsi source package in Bionic: Confirmed Status in debian-installer source package in Cosmic: Confirmed Status in hw-detect source package in Cosmic: Confirmed Status in linux source package in Cosmic: Fix Released Status in partman-iscsi source package in Cosmic: Confirmed Status in debian-installer source package in Disco: Confirmed Status in hw-detect source package in Disco: Confirmed Status in linux source package in Disco: Fix Released Status in partman-iscsi source package in Disco: Confirmed Status in debian-installer source package in Eoan: Fix Committed Status in hw-detect source package in Eoan: Fix Released Status in linux source package in Eoan: Fix Released Status in partman-iscsi source package in Eoan: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. 
* This impacts user experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. * With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain or wait too long) and so that any partman-related preseed options are not required and may still be available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect/partman-iscsi package: See comments 11, 12, 13. [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect/partman-iscsi: - d-i: kernel version update to include iscsi_ibft module, based on kernel released to -updates plus one week monitoring bug reports -- it should be OK. Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64 on baremetal -- see comment 11. - hw-detect: low, the changes are enabled by a preseed option. see comment 12. - partman-iscsi: low, simple changes, plus one fix that has been tested in detail, and falls back to previous behavior if it fails. see comment 13. [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by a user with system/firmware that supports iBFT for iSCSI. 
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
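The LP#1817321 description says the 'iscsi_ibft' module exposes the firmware's iBFT information read-only in sysfs. A minimal Python sketch of reading such a tree follows; it assumes the module publishes its entries under /sys/firmware/ibft (that default path is an assumption not stated in this bug), and the helper names are hypothetical. It reads whatever attribute files exist rather than relying on specific attribute names.

```python
import os


def read_sysfs_tree(root):
    """Return {relative_path: text} for every readable file under root."""
    values = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                with open(full, "r", errors="replace") as handle:
                    values[os.path.relpath(full, root)] = handle.read().strip()
            except OSError:
                # Some sysfs attributes are not readable; skip them.
                continue
    return values


def ibft_settings(root="/sys/firmware/ibft"):
    """Return iBFT attributes, or {} when the directory is absent."""
    if not os.path.isdir(root):
        return {}
    return read_sysfs_tree(root)
```

An empty result from `ibft_settings()` means either the module is not loaded or the firmware provides no iBFT, which is the situation the [Test Case] check ("Check the module loads in the installer environment") is meant to distinguish.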
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch removed: "bionic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265733/+files/bionic_partman-iscsi.debdiff ** Patch removed: "cosmic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265736/+files/cosmic_partman-iscsi.debdiff ** Patch removed: "disco_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265739/+files/disco_partman-iscsi.debdiff ** Patch removed: "eoan_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265742/+files/eoan_partman-iscsi.debdiff ** Patch added: "bionic_partman-iscsi.debdiff" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267255/+files/bionic_partman-iscsi.debdiff -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in debian-installer package in Ubuntu: Fix Committed Status in hw-detect package in Ubuntu: Fix Released Status in linux package in Ubuntu: Fix Released Status in partman-iscsi package in Ubuntu: Fix Released Status in debian-installer source package in Bionic: Confirmed Status in hw-detect source package in Bionic: Confirmed Status in linux source package in Bionic: Fix Released Status in partman-iscsi source package in Bionic: Confirmed Status in debian-installer source package in Cosmic: Confirmed Status in hw-detect source package in Cosmic: Confirmed Status in linux source package in Cosmic: Fix Released Status in partman-iscsi source package in Cosmic: Confirmed Status in debian-installer source package in Disco: Confirmed Status in hw-detect source package in Disco: Confirmed Status in linux source package in Disco: Fix Released Status in partman-iscsi source package in Disco: Confirmed Status in debian-installer source package in Eoan: Fix 
Committed Status in hw-detect source package in Eoan: Fix Released Status in linux source package in Eoan: Fix Released Status in partman-iscsi source package in Eoan: Fix Released
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "eoan_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267258/+files/eoan_partman-iscsi.debdiff

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Fix Committed
Status in hw-detect package in Ubuntu: Fix Released
Status in linux package in Ubuntu: Fix Released
Status in partman-iscsi package in Ubuntu: Fix Released
Status in debian-installer source package in Bionic: Confirmed
Status in hw-detect source package in Bionic: Confirmed
Status in linux source package in Bionic: Fix Released
Status in partman-iscsi source package in Bionic: Confirmed
Status in debian-installer source package in Cosmic: Confirmed
Status in hw-detect source package in Cosmic: Confirmed
Status in linux source package in Cosmic: Fix Released
Status in partman-iscsi source package in Cosmic: Confirmed
Status in debian-installer source package in Disco: Confirmed
Status in hw-detect source package in Disco: Confirmed
Status in linux source package in Disco: Fix Released
Status in partman-iscsi source package in Disco: Confirmed
Status in debian-installer source package in Eoan: Fix Committed
Status in hw-detect source package in Eoan: Fix Released
Status in linux source package in Eoan: Fix Released
Status in partman-iscsi source package in Eoan: Fix Released
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
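The [Test Case] check for the kernel module in the udeb can be scripted. The following is only a minimal sketch, not part of the patches attached to this bug: the helper names `listing_has_module` and `check_udebs` are hypothetical, and it assumes the scsi-modules udebs sit in the current directory. It loops over the glob on the assumption that `dpkg-deb -c` expects a single archive per invocation, so a pattern that matches several udebs is handled one file at a time.

```shell
#!/bin/sh
# Succeeds if a `dpkg-deb -c` listing read from stdin contains the named
# module file. listing_has_module is a hypothetical helper name.
listing_has_module() {
    # $1: module file name, for example iscsi_ibft.ko
    grep -q -F -- "/$1"
}

# Report, for every scsi-modules udeb in the current directory, whether
# it ships iscsi_ibft.ko. check_udebs is also a hypothetical name.
check_udebs() {
    for udeb in scsi-modules_*.udeb; do
        [ -e "$udeb" ] || continue   # the glob matched nothing
        if dpkg-deb -c "$udeb" | listing_has_module iscsi_ibft.ko; then
            echo "$udeb: iscsi_ibft.ko present"
        else
            echo "$udeb: iscsi_ibft.ko MISSING"
        fi
    done
}
```

Running `check_udebs` in a directory of downloaded udebs prints one line per package. This only covers presence in the archive; whether the module loads in the installer environment still has to be checked there.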
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Hi Eric,

Nice catch. Sorry about that!
Sure, just attached the partman-iscsi debdiffs with the LP bug mentioned in changelog.

Thanks,
Mauricio
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
** Patch added: "disco_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267257/+files/disco_partman-iscsi.debdiff
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Dan, thanks for the review/fixes/upload!

-- 
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Fix Released
Status in hw-detect package in Ubuntu: Fix Released
Status in linux package in Ubuntu: Fix Released
Status in partman-iscsi package in Ubuntu: Fix Released
Status in debian-installer source package in Bionic: In Progress
Status in hw-detect source package in Bionic: In Progress
Status in linux source package in Bionic: Fix Released
Status in partman-iscsi source package in Bionic: In Progress
Status in debian-installer source package in Cosmic: In Progress
Status in hw-detect source package in Cosmic: In Progress
Status in linux source package in Cosmic: Fix Released
Status in partman-iscsi source package in Cosmic: In Progress
Status in debian-installer source package in Disco: In Progress
Status in hw-detect source package in Disco: In Progress
Status in linux source package in Disco: Fix Released
Status in partman-iscsi source package in Disco: In Progress
Status in debian-installer source package in Eoan: Fix Released
Status in hw-detect source package in Eoan: Fix Released
Status in linux source package in Eoan: Fix Released
Status in partman-iscsi source package in Eoan: Fix Released
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for disco/hw-detect.
disk-detect found the iscsi target/lun configured in ibft.

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/disco/main/installer-amd64/20101020ubuntu570/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}
$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn

workstation $ vncviewer buneary.segmaas.1ss:1

iPXE>
iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0
iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80
iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot

...
│ Select disk to partition:                    │
│                                              │
│SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK│
...

~ # grep 'retrieving disk-detect' /var/log/syslog
Jun 5 23:36:13 anna[1582]: DEBUG: retrieving disk-detect 1.117ubuntu6.19.04.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun 5 23:38:55 disk-detect: insmod /lib/modules/5.0.0-13-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun 5 23:38:55 disk-detect: # BEGIN RECORD 2.0-874
Jun 5 23:38:55 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun 5 23:38:55 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun 5 23:38:55 disk-detect: iface.bootproto = STATIC
Jun 5 23:38:55 disk-detect: iface.ipaddress = 10.0.0.2
Jun 5 23:38:55 disk-detect: iface.subnet_mask = 255.255.255.0
Jun 5 23:38:55 disk-detect: iface.primary_dns = 192.168.122.1
Jun 5 23:38:55 disk-detect: iface.vlan_id = 0
Jun 5 23:38:55 disk-detect: iface.net_ifacename = ens4
Jun 5 23:38:55 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun 5 23:38:55 disk-detect: node.conn[0].address = 10.0.0.1
Jun 5 23:38:55 disk-detect: node.conn[0].port = 3260
Jun 5 23:38:55 disk-detect: node.boot_lun = 0100
Jun 5 23:38:55 disk-detect: # END RECORD
Jun 5 23:38:55 kernel: [ 207.271009] iBFT detected.
Jun 5 23:38:55 disk-detect: Setting up software interface ens4
Jun 5 23:38:55 disk-detect: iscsistart: can not connect to iSCSI daemon (111)!
Jun 5 23:38:55 kernel: [ 207.286051] Loading iSCSI transport class v2.0-870.
Jun 5 23:38:55 disk-detect: iscsistart: version 2.0-874
Jun 5 23:38:55 disk-detect:
Jun 5 23:38:56 kernel: [ 208.291508] iscsi: registered transport (tcp)
Jun 5 23:38:56 kernel: [ 208.294107] scsi host2: iSCSI Initiator over TCP/IP
Jun 5 23:38:56 disk-detect: iscsistart: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun 5 23:38:56 kernel: [ 208.302594] scsi 2:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jun 5 23:38:56 kernel: [ 208.306723] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun 5 23:38:56 kernel: [ 208.310611] scsi 2:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun 5 23:38:56 kernel: [ 208.314013] sd 2:0:0:1: Power-on or device reset occurred
Jun 5 23:38:56 kernel: [ 208.315260] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun 5 23:38:56 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun 5 23:38:56 kernel: [ 208.317323] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun 5 23:38:56 kernel: [ 208.317327] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun 5 23:38:56 kernel: [ 208.317962] sd 2:0:0:1: [sda] Write Protect is off
Jun 5 23:38:56 kernel: [ 208.317965] sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
Jun 5 23:38:56 kernel: [ 208.319113] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jun 5 23:38:56 kernel: [ 208.334152] sd 2:0:0:1: [sda] Attached SCSI disk
Jun 5 23:38:56 disk-detect: iBFT disk detection finished.

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Fix Released
Status in hw-detect package in Ubuntu: Fix Released
Status in linux pack
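When comparing verification runs across releases, it can help to pull individual iBFT fields out of the syslog excerpt rather than reading the whole record. The sketch below is an illustration only: `ibft_field` is a hypothetical helper name, and it assumes the record is logged by disk-detect between "# BEGIN RECORD" and "# END RECORD" with "key = value" lines, as in the log above.

```shell
#!/bin/sh
# Print the value of one "key = value" entry from the iBFT record that
# disk-detect writes to syslog. Syslog text is read from stdin.
# ibft_field is a hypothetical helper name, not an installer tool.
ibft_field() {
    # $1: key, for example node.name or iface.hwaddress
    awk -v key="$1" '
        /# BEGIN RECORD/ { in_rec = 1; next }
        /# END RECORD/   { in_rec = 0 }
        in_rec {
            line = $0
            # Drop the timestamp and the "disk-detect: " tag.
            sub(/^.*disk-detect: /, "", line)
            n = index(line, " = ")
            if (n > 0 && substr(line, 1, n - 1) == key) {
                print substr(line, n + 3)
                exit
            }
        }
    '
}
```

In the installer shell this could be used as `ibft_field node.name < /var/log/syslog`, provided awk is available in that environment, which is not confirmed here; otherwise the syslog can be copied out and inspected on another host.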
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for cosmic/hw-detect.
disk-detect found the iscsi target/lun configured in ibft.

cosmic currently needed the workaround to download/install the iscsi_ibft.ko module, because the d-i changes are not yet in (update to kernel version that includes it in scsi-modules.udeb).

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/cosmic-updates/main/installer-amd64/20101020ubuntu557.1/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}
$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn

workstation $ vncviewer buneary.segmaas.1ss:1

iPXE>
iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0
iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80
iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot

...

~ # ls /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko
ls: /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko: No such file or directory

~ # cd /tmp
/tmp # wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-4.18.0-10-generic_4.18.0-10.11_amd64.deb
/tmp # ar x linux-modules-4.18.0-10-generic_4.18.0-10.11_amd64.deb
/tmp # xzcat data.tar.xz | tar x
/tmp # mkdir /lib/modules/4.18.0-10-generic/kernel/drivers/firmware
/tmp # cp lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/
/tmp # exit

...
│ Select disk to partition:                    │
│                                              │
│SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK│
...

~ # grep 'retrieving disk-detect' /var/log/syslog
Jun 6 00:05:52 anna[1521]: DEBUG: retrieving disk-detect 1.117ubuntu6.18.10.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun 6 00:10:59 disk-detect: insmod /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun 6 00:10:59 kernel: [ 364.281283] iBFT detected.
Jun 6 00:10:59 disk-detect: # BEGIN RECORD 2.0-874
Jun 6 00:10:59 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun 6 00:10:59 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun 6 00:10:59 disk-detect: iface.bootproto = STATIC
Jun 6 00:10:59 disk-detect: iface.ipaddress = 10.0.0.2
Jun 6 00:10:59 disk-detect: iface.subnet_mask = 255.255.255.0
Jun 6 00:10:59 disk-detect: iface.primary_dns = 192.168.122.1
Jun 6 00:10:59 disk-detect: iface.vlan_id = 0
Jun 6 00:10:59 disk-detect: iface.net_ifacename = ens4
Jun 6 00:10:59 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun 6 00:10:59 disk-detect: node.conn[0].address = 10.0.0.1
Jun 6 00:10:59 disk-detect: node.conn[0].port = 3260
Jun 6 00:10:59 disk-detect: node.boot_lun = 0100
Jun 6 00:10:59 disk-detect: # END RECORD
Jun 6 00:10:59 disk-detect: Setting up software interface ens4
Jun 6 00:10:59 disk-detect: iscsistart: version 2.0-874
Jun 6 00:10:59 kernel: [ 364.296265] Loading iSCSI transport class v2.0-870.
Jun 6 00:10:59 kernel: [ 364.304322] iscsi: registered transport (tcp)
Jun 6 00:10:59 kernel: [ 364.305731] scsi host2: iSCSI Initiator over TCP/IP
Jun 6 00:10:59 disk-detect: iscsistart: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun 6 00:10:59 kernel: [ 364.310276] scsi 2:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jun 6 00:10:59 kernel: [ 364.311404] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun 6 00:10:59 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun 6 00:10:59 kernel: [ 364.312491] scsi 2:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun 6 00:10:59 kernel: [ 364.313947] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun 6 00:10:59 kernel: [ 364.314374] sd 2:0:0:1: Power-on or device reset occurred
Jun 6 00:10:59 kernel: [ 364.316862] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun 6 00:10:59 kernel: [ 364.316864] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun 6 00:10:59 kernel: [ 364.317102] sd 2:0:0:1: [sda] Write Protect is off
Jun 6 00:10:59 kernel: [ 364.317104] sd 2:0:0:
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for bionic/hw-detect.
disk-detect found the iscsi target/lun configured in ibft.

bionic currently needed the workaround to download/install the iscsi_ibft.ko module, because the d-i changes are not yet in (update to kernel version that includes it in scsi-modules.udeb).

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/bionic-updates/main/installer-amd64/20101020ubuntu543.7/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}
$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn

workstation $ vncviewer buneary.segmaas.1ss:1

iPXE>
iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0
iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80
iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot

...

~ # ls /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko
ls: /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko: No such file or directory

~ # cd /tmp
/tmp # wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-4.15.0-45-generic_4.15.0-45.48_amd64.deb
/tmp # ar x linux-modules-4.15.0-45-generic_4.15.0-45.48_amd64.deb
/tmp # xzcat data.tar.xz | tar x
/tmp # mkdir /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/
/tmp # cp lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/
/tmp # exit

...
│ Select disk to partition:                    │
│                                              │
│SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK│
...

~ # grep 'retrieving disk-detect' /var/log/syslog
Jun 6 01:01:18 anna[1610]: DEBUG: retrieving disk-detect 1.117ubuntu6.18.04.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun 6 01:05:17 disk-detect: insmod /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun 6 01:05:17 kernel: [ 293.676205] iBFT detected.
Jun 6 01:05:17 disk-detect: # BEGIN RECORD 2.0-874
Jun 6 01:05:17 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun 6 01:05:17 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun 6 01:05:17 disk-detect: iface.bootproto = STATIC
Jun 6 01:05:17 disk-detect: iface.ipaddress = 10.0.0.2
Jun 6 01:05:17 disk-detect: iface.subnet_mask = 255.255.255.0
Jun 6 01:05:17 disk-detect: iface.primary_dns = 192.168.122.1
Jun 6 01:05:17 disk-detect: iface.vlan_id = 0
Jun 6 01:05:17 disk-detect: iface.net_ifacename = ens4
Jun 6 01:05:17 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun 6 01:05:17 disk-detect: node.conn[0].address = 10.0.0.1
Jun 6 01:05:17 disk-detect: node.conn[0].port = 3260
Jun 6 01:05:17 disk-detect: node.boot_lun = 0100
Jun 6 01:05:17 disk-detect: # END RECORD
Jun 6 01:05:17 disk-detect: Setting up software interface ens4
Jun 6 01:05:17 disk-detect: iscsistart: version 2.0-874
Jun 6 01:05:17 kernel: [ 293.694700] Loading iSCSI transport class v2.0-870.
Jun 6 01:05:17 kernel: [ 293.702842] iscsi: registered transport (tcp)
Jun 6 01:05:17 disk-detect: iscsistart:
Jun 6 01:05:17 disk-detect: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun 6 01:05:17 disk-detect:
Jun 6 01:05:17 kernel: [ 293.704410] scsi host2: iSCSI Initiator over TCP/IP
Jun 6 01:05:17 kernel: [ 293.709519] scsi 2:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jun 6 01:05:17 kernel: [ 293.711428] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun 6 01:05:17 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun 6 01:05:17 kernel: [ 293.713414] scsi 2:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun 6 01:05:17 kernel: [ 293.714350] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun 6 01:05:17 kernel: [ 293.714518] sd 2:0:0:1: Power-on or device reset occurred
Jun 6 01:05:17 kernel: [ 293.716821] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun 6 01:05:17 kernel: [ 293.716823] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun 6 01:05:17 kernel: [ 293.717069] sd 2:0:0:1: [sda] Wri
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for disco/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog
Jun 6 14:04:03 anna[1582]: DEBUG: retrieving partman-iscsi 40ubuntu3.19.04.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
    ...
3: ens4: ...
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
    ...

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu: Fix Released
Status in hw-detect package in Ubuntu: Fix Released
Status in linux package in Ubuntu: Fix Released
Status in partman-iscsi package in Ubuntu: Fix Released
Status in debian-installer source package in Bionic: In Progress
Status in hw-detect source package in Bionic: Fix Committed
Status in linux source package in Bionic: Fix Released
Status in partman-iscsi source package in Bionic: Fix Committed
Status in debian-installer source package in Cosmic: In Progress
Status in hw-detect source package in Cosmic: Fix Committed
Status in linux source package in Cosmic: Fix Released
Status in partman-iscsi source package in Cosmic: Fix Committed
Status in debian-installer source package in Disco: In Progress
Status in hw-detect source package in Disco: Fix Committed
Status in linux source package in Disco: Fix Released
Status in partman-iscsi source package in Disco: Fix Committed
Status in debian-installer source package in Eoan: Fix Released
Status in hw-detect source package in Eoan: Fix Released
Status in linux source package in Eoan: Fix Released
Status in partman-iscsi source package in Eoan: Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to be
     done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:

     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default, and only
     checks whether iBFT information is present in firmware, then exposes that
     in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:

     - d-i: kernel version update to include iscsi_ibft module, based on
       kernel released to -updates plus one week monitoring bug reports -- it
       should be OK. Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.

     - hw-detect: low, the changes are enabled by a preseed option.
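
The partman-iscsi verification above compares the HWADDR written to iscsi.initramfs against the MAC addresses shown by 'ip addr list' by eye. A minimal sketch of the same comparison in shell follows. The function names initramfs_hwaddr and iface_for_mac are illustrative assumptions, not part of partman-iscsi or the installer; only the file format and the 'ip addr list' layout are taken from the output above.

```shell
# Sketch only: helper names are hypothetical, not installer code.

# Print the HWADDR value (surrounding quotes stripped) from an
# iscsi.initramfs-style file given as $1. Prints nothing if the
# file has no HWADDR line (e.g. when it only holds ISCSI_AUTO=true).
initramfs_hwaddr() {
    sed -n 's/^HWADDR="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1"
}

# Read 'ip addr list' output on stdin and print the name of each
# interface whose link/ether address equals the MAC given as $1.
iface_for_mac() {
    awk -v mac="$1" '
        /^[0-9]+: / { name = $2; sub(/:$/, "", name) }
        $1 == "link/ether" && tolower($2) == tolower(mac) { print name }
    '
}
```

In the installer shell this could be used as: mac=$(initramfs_hwaddr /target/etc/iscsi/iscsi.initramfs); ip addr list | iface_for_mac "$mac". An empty result would mean no local interface carries the recorded MAC address.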
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for cosmic/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs
with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog
Jun 6 14:20:51 anna[1521]: DEBUG: retrieving partman-iscsi 40ubuntu3.18.10.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
verification done for bionic/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs
with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog
Jun 6 15:01:20 anna[1605]: DEBUG: retrieving partman-iscsi 40ubuntu3.18.04.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
    link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
    link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...

** Tags removed: verification-needed verification-needed-bionic
** Tags added: verification-done verification-done-bionic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT
Verified debian-installer in {disco,cosmic,bionic}-proposed.

The verification has been done with the netboot image files for
disco/cosmic/bionic (and hwe-netboot for bionic), using regular and LVM
partitioning, on VMs for the architectures amd64/i386/arm64/ppc64el and
on bare metal for amd64.

All tests installed and booted successfully, and were checked for the right
release name, partitioning method, installed kernel version, installer's
kernel version, and kernel messages (no errors, warnings, or unexpected
messages), i.e.:

$ lsb_release -cs
$ mount | grep -w /
$ uname -rvm
$ sudo grep 'Linux version' /var/log/installer/syslog
$ sudo grep kernel: /var/log/installer/syslog

** Tags removed: verification-needed verification-needed-bionic verification-needed-cosmic verification-needed-disco
** Tags added: verification-done verification-done-bionic verification-done-cosmic verification-done-disco

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321 Title: installer does not support iSCSI iBFT Status in debian-installer package in Ubuntu: Fix Released Status in hw-detect package in Ubuntu: Fix Released Status in linux package in Ubuntu: Fix Released Status in partman-iscsi package in Ubuntu: Fix Released Status in debian-installer source package in Bionic: Fix Committed Status in hw-detect source package in Bionic: Fix Released Status in linux source package in Bionic: Fix Released Status in partman-iscsi source package in Bionic: Fix Released Status in debian-installer source package in Cosmic: Fix Committed Status in hw-detect source package in Cosmic: Fix Released Status in linux source package in Cosmic: Fix Released Status in partman-iscsi source package in Cosmic: Fix Released Status in debian-installer source package in Disco: Fix Committed Status in hw-detect source package in Disco: Fix Released Status in linux source package in Disco: Fix Released Status in partman-iscsi source package in Disco: Fix Released Status in debian-installer source package in Eoan: Fix Released Status in hw-detect source package in Eoan: Fix Released Status in linux source package in Eoan: Fix Released Status in partman-iscsi source package in Eoan: Fix Released Bug description: [Impact] * It's not possible to access iBFT (iSCSI Boot Firmware Table) information (settings for network interface, initiator, and target) in the installer because the 'iscsi_ibft' module is not present in udeb packages. * Even if it was, the installer does not handle iBFT information at all, thus any settings are ignored, and iSCSI-related configuration has to be done manually or with workarounds. * This impacts user-experience and automatic installation on systems and deployments which actually do provide the iBFT feature and information, but cannot use it practically. 
* With proper iBFT support in the installer (kernel module in udeb package and automatic iSCSI-related configuration) users will be able to rely on iBFT to install/deploy Ubuntu on their servers and datacenters. * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb, and configure network/iSCSI according to iBFT information in disk-detect. This is done in disk-detect so that the iSCSI LUNs are detected as disks (useful in case of no other disks in the system so the installer doesn't complain nor wait too long) and that any partman-related preseed options are not required and may be still available for the user. [Test Case] * linux package / kernel module in udeb: $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko Check the module loads in the installer environment. See comment with example for disco. * d-i/hw-detect/partman-iscsi package: See comments 11, 12, 13. [Regression Potential] * linux package: low, the kernel module is not loaded by default, and only checks whether iBFT information is present in firmware, then exposes that in sysfs in read-only mode. * d-i/hw-detect/partman-iscsi: - d-i: kernel version update to include iscsi_ibft module, based on kernel released to -updates plus one week monitoring bug reports -- it should be OK. Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64 on baremetal -- see comment 11. - hw-detect: low, the changes are enabled by a preseed option. see comment 12. - partman-iscsi: low, simple changes, plus one fix that has been tested in detail, and falls back to previous behavior if it fails. see comment 13. [Other Info] * This has been verified both by the developer with a simple iSCSI iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE) and by an u
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
** Attachment added: "kprobe-test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5259304/+files/kprobe-test.c

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

Title:
  tasks doing write()/fsync() hit deadlock in write_cache_pages()

Status in linux package in Ubuntu: Invalid
Status in linux source package in Cosmic: Fix Committed
Status in linux source package in Disco: Invalid

Bug description:
  [Impact]

   * Tasks of a multi-threaded workload doing write() and fsync() might
     deadlock in write_cache_pages(), preventing progress.

   * The fix addresses a corner case in write_cache_pages() on the
     range_cyclic implementation which allows the deadlock.

   * Patch:
     - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7 ("mm/page-writeback.c:
       fix range_cyclic writeback vs writepages deadlock"), present in
       v4.20-rc1~92^2~19.

  [Test Case]

   * This issue was originally hit by the 'perforce' (p4d) tool on an XFS
     filesystem, but it occurs rarely and is difficult to trigger.

   * We've written a userspace program + kernel module (kprobes-based) to
     reproduce this problem and verify the test kernel/patch.

   * The kprobes are strictly tied to particular kernel versions because of
     the assembly instruction offsets. We'll provide updated versions for
     -updates and -proposed for verification.

   * Steps (see output examples in comments):

     - Userspace part:
       $ gcc -o test test.c -pthread

     - Kernel part:
       $ touch Makefile
       $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
       $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

     - Shorter hung task timeout and higher console logging level to notice
       the deadlocked tasks sooner, and watch progress:
       $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
       $ echo 9 | sudo tee /proc/sys/kernel/printk

     - Load module / Run userspace part (logging to kernel log) in XFS:
       $ sudo insmod kprobe-test.ko
       $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
       $ sudo rmmod kprobe-test

       You may need to ctrl-z with the original kernel as 'test' doesn't
       finish.

     - Check kernel log or watch the system console:
       $ dmesg

       Check threads in D state.
       $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker

  [Regression Potential]

   * The patch is small but changes core writeback infrastructure, so there's
     a chance this may _affect_ some or other behavior that has not been
     validated with our regression testing; not exactly _break_ it. Please
     note our regression testing.

   * This has been verified with 'xfstests' (not only for XFS fs, despite its
     original name), used by major Linux filesystems for regression testing
     during development. It's been tested on systems with 24 and 4 CPUs (to
     exercise differences in scalability, parallelism, and workload) and XFS
     and ext4 (reporter's environment + Ubuntu's default). No regressions
     were observed (the set of failed tests is the same in each system and
     tests failed in the same way).

   * This has also been verified with 'iozone' for write intensive tests, to
     exercise the writeback mechanism and no errors were observed.

   * The reporter has been running the test kernel with the patch for weeks
     and has not observed any other issues/regressions.

  [Other Info]

   * This is only required in Cosmic (for the Bionic HWE kernel), and is
     already applied in Disco.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
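
The last test-case step pipes ps through 'grep D', which matches any line containing a capital D, not only threads whose state column is D. A stricter filter could look like the sketch below. The function name d_state_threads is an illustrative assumption and not part of the test case; the column order is the one requested by 'ps -eLo pid,tid,state,comm'.

```shell
# Sketch only: d_state_threads is a hypothetical helper name.
# Reads 'ps -eLo pid,tid,state,comm' output on stdin and prints only
# threads whose state column is exactly D (uninterruptible sleep) and
# whose command is 'test' or starts with 'kworker'.
d_state_threads() {
    awk '$3 == "D" && ($4 == "test" || $4 ~ /^kworker/)'
}
```

Usage would be: ps -eLo pid,tid,state,comm | d_state_threads. Empty output means no 'test' or kworker thread is currently in D state.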
[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()
Verification successful on Cosmic.

Updated the test-case kernel part (attached), and repeated it 20+ times,
without any process hanging.

In all cases, the new function call into write_cache_pages() is observed
in thread 0, between page index 2 and page index 1.

[ 150.914872] mod_init():161 :: hello
[ 150.917828] mod_init():207 :: kernel version: prop/-19/cosmic
[ 150.950322] Program running, TID = 1429
[ 150.951566] kp1_pre_handler():073 :: state 0 :: pid = 1429, mapping = 0x8abcba385570, comm = 'test'
[ 150.954205] kp1_pre_handler():082 :: state 0 -> 1 :: pid = 1429, mapping = 0x8abcba385570
[ 150.956518] kp2_pre_handler():122 :: state 1 :: pid = 1429, page index = 1
[ 150.958410] kp3_pre_handler():147 :: state 1 :: pid = 1429, page index = 1, calling writepage()
[ 150.961047] kp2_pre_handler():122 :: state 1 :: pid = 1429, page index = 2
[ 150.964788] kp3_pre_handler():147 :: state 1 :: pid = 1429, page index = 2, calling writepage()
[ 151.973660] Thread 0 running, TID = 1430!
[ 151.977071] kp1_pre_handler():073 :: state 1 :: pid = 7, mapping = 0x8abcba385570, comm = 'kworker/u8:0'
[ 151.984836] kp1_pre_handler():104 :: state 1 -> 2 :: pid = 7, mapping = 0x8abcba385570, comm ('kworker/u8:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 152.017726] kp2_pre_handler():122 :: state 2 :: pid = 7, page index = 2
[ 152.027193] kp3_pre_handler():147 :: state 2 :: pid = 7, page index = 2, calling writepage()
[ 152.038466] kp1_pre_handler():073 :: state 2 :: pid = 7, mapping = 0x8abcba385570, comm = 'kworker/u8:0'
[ 152.048736] kp2_pre_handler():122 :: state 2 :: pid = 7, page index = 1
[ 152.056642] kp2_pre_handler():126 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...
[ 152.973731] Thread 1 running, TID = 1431!
[ 152.974943] kp1_pre_handler():073 :: state 3 :: pid = 1431, mapping = 0x8abcba385570, comm = 'test'
[ 152.977489] kp2_pre_handler():122 :: state 3 :: pid = 1431, page index = 1
[ 152.979140] kp3_pre_handler():147 :: state 3 :: pid = 1431, page index = 1, calling writepage()
[ 152.981928] kp2_pre_handler():122 :: state 3 :: pid = 1431, page index = 2
[ 153.973895] Thread 2 running, TID = 1432!
[ 153.975160] kp1_pre_handler():073 :: state 3 :: pid = 1432, mapping = 0x8abcba385570, comm = 'test'
[ 153.978573] kp2_pre_handler():122 :: state 3 :: pid = 1432, page index = 1
[ 157.033588] kp2_pre_handler():130 :: state 3 -> 4 :: pid = 7, page index = 1, spun 5 seconds before lock_page().
[ 157.036151] kp3_pre_handler():147 :: state 4 :: pid = 1431, page index = 2, calling writepage()
[ 157.038804] kp3_pre_handler():147 :: state 4 :: pid = 1432, page index = 1, calling writepage()
[ 157.041212] kp2_pre_handler():122 :: state 4 :: pid = 1432, page index = 2
[ 157.058880] mod_exit():230 :: bye

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

** Attachment removed: "kprobe-test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255994/+files/kprobe-test.c

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

Title:
  tasks doing write()/fsync() hit deadlock in write_cache_pages()

Status in linux package in Ubuntu: Invalid
Status in linux source package in Cosmic: Fix Committed
Status in linux source package in Disco: Invalid

Bug description:
  [Impact]

   * Tasks of a multi-threaded workload doing write() and fsync() might
     deadlock in write_cache_pages(), preventing progress.

   * The fix addresses a corner case in write_cache_pages() on the
     range_cyclic implementation which allows the deadlock.
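
The kprobe-test log in the Cosmic verification message above walks through numbered states, with each change logged as 'state N -> M'. When repeating the test many times, a small filter can pull just those transitions out of the kernel log. This is a sketch only: kprobe_transitions is a hypothetical helper name, and it assumes the log format shown in that message.

```shell
# Sketch only: kprobe_transitions is a hypothetical helper name.
# Reads kernel log text on stdin and prints each "N -> M" state
# transition logged by the kprobe handlers, one per line, in order.
kprobe_transitions() {
    grep -o 'state [0-9][0-9]* -> [0-9][0-9]*' | sed 's/^state //'
}
```

Usage would be: dmesg | kprobe_transitions. For the verification log above, the transitions appear in the order 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 4 before the module logs 'bye'.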
[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device
Tested with disco-proposed + patch (problem does not happen)
---

# uname -rv
5.0.0-22-generic #23+test20190725b1 SMP Mon Jul 29 14:43:55 -03 2019

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[ 69.567775] bcache: register_bdev() registered backing device loop0
[ 69.577141] bcache: run_cache_set() invalidating existing data
[ 69.591172] bcache: register_cache() registered cache device loop1
[ 69.591517] bcache: register_bcache() error /dev/loop0: device already registered (emitting change event)
[ 73.570620] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 0ed05289-ed85-40da-bcf4-3991f2e18e03
#

(no warning message)

# reboot

# # comment last line in script.

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
#
[ 40.045968] bcache: register_bdev() registered backing device loop0
[ 40.050914] bcache: run_cache_set() invalidating existing data
[ 40.060793] bcache: register_cache() registered cache device loop1
[ 40.068735] bcache: register_bcache() error /dev/loop1: device already registered

(wait a few seconds)
(ok no oops anymore)

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu: Invalid
Status in linux source package in Bionic: Confirmed
Status in linux source package in Disco: Confirmed
Status in linux source package in Eoan: Invalid

Bug description:
  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Date: Wed Jul 24 12:28:06 2019
  InstallationDate: Installed on 2013-10-04 (2119 days ago)
  InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 (20130925.1)
  MachineType: Supermicro X9DAi
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-54-generic N/A
   linux-backports-modules-4.15.0-54-generic N/A
   linux-firmware 1.173.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago)
  dmi.bios.date: 05/09/2015
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 3.2
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: X9DAi
  dmi.board.vendor: Supermicro
  dmi.board.version: 0123456789
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.2:bd05/09/2015:svnSupermicro:pnX9DAi:pvr0123456789:rvnSupermicro:rnX9DAi:rvr0123456789:cvnSupermicro:ct3:cvr0123456789:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: X9DAi
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837788/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
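
The test run at the top of this message shows loop0 registered as the backing device, loop1 as the cache device, and loop0 then cached as bcache0. The script itself is not reproduced in this thread, so the following is only a dry-run sketch of the sequence its name suggests (writeback_percent written before attach). It prints commands and runs nothing. The function name, argument order, device names, and the pause are illustrative assumptions; make-bcache -B/-C and the bcache sysfs files writeback_percent and attach are the standard bcache interfaces.

```shell
# Sketch only: bcache_repro_steps is a hypothetical helper, NOT the
# attached setup-bcache-wb_percent-before-attach.sh script.
# Prints (does not execute) the command sequence. Arguments:
#   $1 backing device       $2 cache device
#   $3 bcache device name   $4 cache set UUID
#   $5 writeback_percent value
bcache_repro_steps() {
    printf 'make-bcache -B %s\n' "$1"
    printf 'make-bcache -C %s\n' "$2"
    # Setting writeback_percent before attach is the step under test.
    printf 'echo %s > /sys/block/%s/bcache/writeback_percent\n' "$5" "$3"
    printf 'sleep 1\n'
    # Commenting out this last step corresponds to the second run above.
    printf 'echo %s > /sys/block/%s/bcache/attach\n' "$4" "$3"
}
```

The printed commands would need root and scratch devices to run; reviewing them first keeps a typo in a device name from formatting the wrong disk.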
[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device
Attaching test-case script.

** Attachment added: "setup-bcache-wb_percent-before-attach.sh"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837788/+attachment/5279850/+files/setup-bcache-wb_percent-before-attach.sh

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu: Invalid
Status in linux source package in Bionic: Confirmed
Status in linux source package in Disco: Confirmed
Status in linux source package in Eoan: Invalid

Bug description:
  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic.
[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device
Patches sent to the kernel-team mailing list:

[B][PATCH] bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
https://lists.ubuntu.com/archives/kernel-team/2019-July/102653.html

[D][PATCH] bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
https://lists.ubuntu.com/archives/kernel-team/2019-July/102654.html

** Description changed:

+ [Impact]
+
+  * Users can get a Warning or even Oops the kernel if
+    bcache/writeback_percent is set before attaching a
+    caching device to the bcache device.
+
+  * The fix is trivial, upstream, and consists of just
+    checking whether the caching device is attached in
+    order to set flags and schedule thread (which oops).
+
+ [Test Case]
+
+  * See attachment 'setup-bcache-wb_percent-before-attach.sh'
+    used in comment #5 and #6 to reproduce the problem(s).
+
+  * for 'Warning':
+
+    # make-bcache -B
+    # make-bcache -C
+    # echo 11 > /sys/block//bcache/writeback_percent
+    # sleep 1
+    # echo > /sys/block//bcache/attach
+
+  * for 'Oops':
+    (steps above, but don't run last command / 'attach').
+
+ [Regression Potential]
+
+  * Low. The fix is trivial, contained, and exclusive to bcache sysfs
+ handler.
+
+  * The modified path has been exercised with synthetic testing (script).
+
+ [Original Bug Description]
+
  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Date: Wed Jul 24 12:28:06 2019
  InstallationDate: Installed on 2013-10-04 (2119 days ago)
  InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 (20130925.1)
  MachineType: Supermicro X9DAi
  ProcEnviron:
-  TERM=xterm-256color
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=xterm-256color
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb
  RelatedPackageVersions:
-  linux-restricted-modules-4.15.0-54-generic N/A
-  linux-backports-modules-4.15.0-54-generic  N/A
-  linux-firmware                             1.173.9
+  linux-restricted-modules-4.15.0-54-generic N/A
+  linux-backports-modules-4.15.0-54-generic  N/A
+  linux-firmware                             1.173.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago)
  dmi.bios.date: 05/09/2015
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 3.2
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: X9DAi
  dmi.board.vendor: Supermicro
  dmi.board.version: 0123456789
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.2:bd05/09/2015:svnSupermicro:pnX9DAi:pvr0123456789:rvnSupermicro:rnX9DAi:rvr0123456789:cvnSupermicro:ct3:cvr0123456789:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: X9DAi
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Disco)
       Status: Confirmed => In Progress

--
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in li
[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)
Regression testing done on an older/previously supported adapter, SFC
7000 series.

The netperf suite of TCP/UDP STREAM and RR, and TCP_CRR, ran for ~2
days, with results in the same ballpark across the original, test, and
proposed kernels.

Now waiting for test results with the new/requested adapter before
marking verification done/successful.

Summary: test name, MTU sizes, original/test/proposed kernel results.

TCP_CRR
  1500/1500   ORIG 4550-4560   TEST 4550-4580     PROP 5260-5316
  9000/9000   ORIG 4557        TEST 4570          PROP 5260-5300

TCP_RR
  1500/1500   ORIG 32531       TEST ~31k,32k      PROP 32180-34277
  9000/9000   ORIG 31620       TEST 27k-30k-36k   PROP 27k-33k-34k

TCP_STREAM
  1500/1500   ORIG 9406        TEST 9403          PROP 9405
  9000/9000   ORIG 9883        TEST 9887          PROP 9887

UDP_RR
  1500/1500   ORIG ~36k/~37k   TEST ~36k/~37k     PROP ~36k
  9000/9000   ORIG ~35k/~37k   TEST 33k-37k       PROP ~35.8k/~36.6k

UDP_STREAM
  1500/1500   ORIG 8.6k/8.9k   TEST 8.9k          PROP 8.6k/8.7k
  9000/9000   ORIG 8.7k        TEST 8.7k/8.8k     PROP 8.7k

--
https://bugs.launchpad.net/bugs/1836635

Title:
  Bionic: support for Solarflare X2542 network adapter (sfc driver)

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Invalid
Status in linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Support for Solarflare X2542 network adapter
     (Medford2 / SFC9250) in the Bionic sfc driver.

   * This network adapter is present on recent hardware,
     at least HP 2019 and Dell PowerEdge R740xd systems.

   * On recent-hardware deployments that would rather use
     the Bionic LTS / GA supported kernel and cannot move
     to HWE kernels, this adapter is not functional at all.

  [Test Case]

   * The X2542 adapter has been exercised with iperf3 and
     nc across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
     in both directions, for 1 week.

     Its performance is on par with the Cosmic 4.18 kernel
     (which contains all these patches) and the out-of-tree
     driver from the vendor.

   * The 7000 series adapter (for regression testing an old
     model, supported previously) has been exercised with
     iperf and netperf (TCP_STREAM, UDP_STREAM, TCP_RR,
     UDP_RR, and TCP_CRR) in one host (client/server in
     different adapter ports isolated with network namespaces,
     so traffic goes through the network switch), on 10G link
     speed on MTUs 1500/9000, for 1 weekend.

     No regressions observed between the original and test
     kernels.

  [Regression Potential]

   * The patchset touches a lot of the sfc driver, so the
     potential for regression definitely exists. Thus, a lot
     of consideration and testing happened:

   * It has been tested on another adapter which uses the old
     code, and no regressions were found so far (see 7000
     series above).

   * The patchset is exclusively cherry-picks, no single
     backport.

   * The patchset essentially moves the Bionic driver up in
     the upstream 'git log --oneline -- drivers/net/ethernet/sfc/':

     - since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
     - until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
     - except for 2 commits (not needed / unrelated)
       - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
       - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
     - plus 2 more recent commits (fixes)
       - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
       - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+subscriptions
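The "same ballpark" reading of the netperf summary in the Bug 1836635 message above can also be checked mechanically. The following is only a sketch in Python: the ORIG and PROP figures are copied from that summary, while the helper names (parse_range, regressed) and the 5% tolerance are assumptions made for illustration and are not part of the original verification.

```python
import re

# ORIG and PROP figures copied from the netperf summary above.
RESULTS = {
    ("TCP_CRR", "1500/1500"): ("4550-4560", "5260-5316"),
    ("TCP_CRR", "9000/9000"): ("4557", "5260-5300"),
    ("TCP_RR", "1500/1500"): ("32531", "32180-34277"),
    ("TCP_RR", "9000/9000"): ("31620", "27k-33k-34k"),
    ("TCP_STREAM", "1500/1500"): ("9406", "9405"),
    ("TCP_STREAM", "9000/9000"): ("9883", "9887"),
    ("UDP_RR", "1500/1500"): ("~36k/~37k", "~36k"),
    ("UDP_RR", "9000/9000"): ("~35k/~37k", "~35.8k/~36.6k"),
    ("UDP_STREAM", "1500/1500"): ("8.6k/8.9k", "8.6k/8.7k"),
    ("UDP_STREAM", "9000/9000"): ("8.7k", "8.7k"),
}


def parse_range(text):
    """Return (lowest, highest) value found in strings such as
    '4550-4560', '~36k/~37k' or '27k-30k-36k'; a 'k' suffix means x1000."""
    values = [
        float(number) * (1000 if suffix else 1)
        for number, suffix in re.findall(r"(\d+(?:\.\d+)?)(k?)", text)
    ]
    return min(values), max(values)


def regressed(orig, new, tolerance=0.05):
    """Hypothetical criterion: flag a regression when the best 'new' value
    is more than `tolerance` below the worst 'orig' value."""
    orig_low, _ = parse_range(orig)
    _, new_high = parse_range(new)
    return new_high < orig_low * (1 - tolerance)


# Collect every (test, MTU) pair where the proposed kernel looks worse.
flagged = [key for key, (orig, prop) in RESULTS.items() if regressed(orig, prop)]
print("possible regressions:", flagged)
```

With the figures above, no row is flagged under this criterion. A stricter comparison (for example on the low end of each range) would need the raw netperf output, which is not included in the message.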
[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices
Verification with bionic-proposed of the I/O Error path.
All good, working as expected (see comments #11 to #16).

# uname -rv
4.15.0-56-generic #62-Ubuntu SMP Wed Jul 24 20:18:55 UTC 2019

test 1
------

# ./setup.sh >/dev/null 2>&1
[  369.375820] bcache: register_bdev() registered backing device dm-0
[  369.395195] bcache: run_cache_set() invalidating existing data
[  369.410278] bcache: register_cache() registered cache device dm-1
[  371.393391] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set c1126837-e029-4d08-bad3-38ff8bc08054

# lsblk -e 252
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0    1G  0 loop
└─fake-loop0  253:0    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
loop1           7:1    0    1G  0 loop
└─fake-loop1  253:1    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk

On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all

#
[  425.656209] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
[  425.684837] bcache: error on c1126837-e029-4d08-bad3-38ff8bc08054:
[  425.684840] journal io error
[  425.686537] , disabling caching
[  425.688849] Buffer I/O error on dev bcache0, logical block 2807, lost async page write
[  425.691541] Buffer I/O error on dev bcache0, logical block 2808, lost async page write
[  425.694131] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive.
[  425.698343] Buffer I/O error on dev bcache0, logical block 2810, lost async page write
[  425.702522] Buffer I/O error on dev bcache0, logical block 2812, lost async page write
[  425.705326] Buffer I/O error on dev bcache0, logical block 2813, lost async page write
[  425.707896] Buffer I/O error on dev bcache0, logical block 2814, lost async page write
[  425.710692] Buffer I/O error on dev bcache0, logical block 2816, lost async page write
[  425.713524] Buffer I/O error on dev bcache0, logical block 2817, lost async page write
[  425.716512] Buffer I/O error on dev bcache0, logical block 2818, lost async page write
[  425.719156] Buffer I/O error on dev bcache0, logical block 2819, lost async page write
[  425.742817] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[  425.746933] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
[  425.750502] bcache: cache_set_free() Cache set c1126837-e029-4d08-bad3-38ff8bc08054 unregistered

fio finished:

Run status group 0 (all jobs):
  WRITE: bw=212MiB/s (222MB/s), 212MiB/s-212MiB/s (222MB/s-222MB/s), io=1024MiB (1074MB), run=4830-4830msec

bcache not on top of caching device:

# lsblk -e 252
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0    1G  0 loop
└─fake-loop0  253:0    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
loop1           7:1    0    1G  0 loop
fake-loop1    253:1    0    1G  0 dm

test 2
------

# ./setup.sh >/dev/null 2>&1
[   23.946411] bcache: register_bdev() registered backing device dm-0
[   23.952262] bcache: run_cache_set() invalidating existing data
[   23.966564] bcache: register_cache() registered cache device dm-1
[   25.949934] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set d7a3c644-e21e-49bb-bcee-e14709a65745

# lsblk -e 252
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0    1G  0 loop
└─fake-loop0  253:0    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
loop1           7:1    0    1G  0 loop
└─fake-loop1  253:1    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk

# echo writeback > /sys/block/bcache0/bcache/cache_mode

# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.2152 s, 255 MB/s

# ./dm_fake_dev.s
[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices
Verification done for Disco (one patch change only).

Only one of the two bcache devices stops working upon failures in its
backing device (see comment #21 for details).

# uname -rv
5.0.0-22-generic #23-Ubuntu SMP Tue Jul 23 17:23:54 UTC 2019

# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   25.748828] bcache: register_bdev() registered backing device dm-1
[   25.759145] bcache: register_bdev() registered backing device dm-0
[   25.767247] bcache: run_cache_set() invalidating existing data
[   25.778928] bcache: register_cache() registered cache device dm-2
[   26.768350] bcache: bch_cached_dev_attach() Caching dm-0 as bcache1 on set 2bf1e70a-6f20-4680-bc63-f803142f294d
[   26.795147] bcache: bch_cached_dev_attach() Caching dm-1 as bcache0 on set 2bf1e70a-6f20-4680-bc63-f803142f294d

# lsblk -e 252
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0    1G  0 loop
└─fake-loop0  253:0    0 1024M  0 dm
  └─bcache1   251:128  0 1024M  0 disk
loop1           7:1    0    1G  0 loop
└─fake-loop1  253:1    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
loop2           7:2    0    1G  0 loop
└─fake-loop2  253:2    0 1024M  0 dm
  ├─bcache0   251:0    0 1024M  0 disk
  └─bcache1   251:128  0 1024M  0 disk

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always

# ./dm_fake_dev.sh /dev/loop0 bad
[   42.723192] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   42.730031] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   42.736198] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[   42.738697] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   42.742277] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
#
[   42.746748] Buffer I/O error on dev bcache1, logical block 262112, async page read
[   42.752642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   42.755650] Buffer I/O error on dev bcache1, logical block 262112, async page read
[   42.758209] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   42.760642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   42.762860] Buffer I/O error on dev bcache1, logical block 1, async page read

# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1557
[2] 1558
#
[   58.982340] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.984076] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.985718] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.987382] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.989011] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.990645] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.992293] Buffer I/O error on dev bcache1, logical block 0, lost async page write
[   58.993733] Buffer I/O error on dev bcache1, logical block 1, lost async page write
[   58.995201] Buffer I/O error on dev bcache1, logical block 2, lost async page write
[   58.996651] Buffer I/O error on dev bcache1, logical block 3, lost async page write
...
[   59.096950] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   59.098669] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   59.100621] bcache: bch_cached_dev_error() stop bcache1: too many IO errors on backing device dm-0
[   59.100621] dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
[   60.111733] bcache: bcache_device_free() bcache1 stopped
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.10457 s, 510 MB/s
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.67245 s, 230 MB/s

# lsblk -e 252
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0           7:0    0    1G  0 loop
loop1           7:1    0    1G  0 loop
└─fake-loop1  253:1    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
loop2           7:2    0    1G  0 loop
└─fake-loop2  253:2    0 1024M  0 dm
  └─bcache0   251:0    0 1024M  0 disk
fake-loop0    253:0    0    1G  0 dm

Only bcache1 was stopped; bcache0 remains working.

# reboot

# ./setup-two-bcache-one-cache.reboot.sh >/dev/null 2>&1
[   17.606164] bcache: register_bdev() registered backing device dm-0
[   17.672177] bcache: register_bdev() registered backing device dm-1
[   17.752456] bcache: bch_journal_replay() journal replay done, 4936 keys in 6 entries, seq
[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices
Verification with bionic-proposed of xfstests results.
No regressions introduced by this bcache patchset.

A direct comparison between -updates and -proposed is not possible,
because -proposed introduced failures via other components in the I/O
path (e.g., block, ext4).

This is described below. Just to make sure, the -proposed kernel has
been rebuilt with the bcache patchset reverted, and the test results
are the same (same failures with/without the patchset; no regression).

It has also been confirmed (below) that tests with a raw block device
(sda) instead of a bcache device (thus eliminating the bcache code)
show the new/introduced failures (ext4/035, generic/553, generic/554).

(the output below does look better on a very wide screen. :-)

proposed kernel: 4.15.0-55.62 (with patchset)
---

xfstests.test.none.log:         Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writearound.log:  Failures: ext4/032    ext4/035    ---         generic/451 generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writeback.log:    Failures:             ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writethrough.log: Failures: ext4/032    ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
sda-only (no bcache)            Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         ---         generic/538 ---         generic/553 generic/554

proposed kernel (4.15.0-56 without patchset)
---

$ uname -rv
4.15.0-56-generic #62+test20190730b1 SMP Tue Jul 30 18:25:01 -03 2019

xfstests.test.none.log:         Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writearound.log:  Failures: ext4/032    ext4/035    ---         generic/451 generic/484 generic/491 ---         generic/537 ---         generic/547 generic/553 generic/554
xfstests.test.writeback.log:    Failures:             ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writethrough.log: Failures: ext4/032    ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554

test kernel (4.15.0-55 with patchset)
---

xfstests.test.none.log:         Failures: ext4/032                ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writearound.log:  Failures: ext4/032                ---         generic/451 generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writeback.log:    Failures:                         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writethrough.log: Failures: ext4/032                ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---

orig kernel (4.15.0-55)
---

xfstests.test.none.log:         Failures: ext4/032                ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writearound.log:  Failures: ext4/032                generic/371 generic/451 generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writeback.log:    Failures:                         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writethrough.log: Failures: ext4/032                ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---

--
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     especially in writeback mode, can lead to data loss and/or
     the application is not notified about failed I/O requests.

   * The bcache de