[Kernel-packages] [Bug 1836635] [NEW] Bionic: support for Solarflare X2542 network adapter (sfc driver)

2019-07-15 Thread Mauricio Faria de Oliveira
Public bug reported:

[Impact]

 * Support for Solarflare X2542 network adapter
   (Medford2 / SFC9250) in the sfc driver.

 * This network adapter is present on recent hardware,
   at least HP 2019 and Dell PowerEdge R740xd systems.

 * On recent-hardware deployments that would rather use
   the supported Bionic LTS / GA kernel and cannot move
   to HWE kernels, this adapter does not work at all.

[Test Case]

 * The X2542 adapter has been exercised with iperf3 and nc
   across 2 hosts at 25G link speed with MTUs 1400/1500/9000
   in both directions, for 1 week.

   Its performance is on par with the Cosmic 4.18 kernel
   (which contains all these patches) and the out-of-tree
   driver from the vendor.

 * The 7000 series adapter (an older, previously supported model,
   used for regression testing) has been exercised with iperf and
   netperf (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) on one
   host (client/server on different adapter ports isolated with
   network namespaces, so traffic goes through the network switch),
   at 10G link speed with MTUs 1500/9000, for 1 weekend.

   No regressions observed between the original and test kernels.
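
A minimal sketch of how the single-host setup described above could be
put together. It only prints the commands rather than running them.
The helper name netns_plan, the namespace names (srv, cli), the
192.0.2.0/24 addresses and the 60-second run length are assumptions for
illustration; none of them come from this report.

```shell
# Hypothetical helper: print (do not run) the commands that move two ports
# of the same adapter into separate network namespaces and run netperf
# between them, so that traffic has to cross the network switch.
# Usage: netns_plan SERVER_PORT CLIENT_PORT MTU
netns_plan() {
    srv_if=$1; cli_if=$2; mtu=$3
    srv_addr=192.0.2.1; cli_addr=192.0.2.2   # placeholder addresses

    for entry in "srv:$srv_if:$srv_addr" "cli:$cli_if:$cli_addr"; do
        ns=${entry%%:*}; rest=${entry#*:}
        dev=${rest%%:*}; addr=${rest#*:}
        echo "ip netns add $ns"
        echo "ip link set dev $dev netns $ns"
        echo "ip netns exec $ns ip addr add $addr/24 dev $dev"
        echo "ip netns exec $ns ip link set dev $dev mtu $mtu up"
    done

    # netserver is the netperf server side; it runs in the 'srv' namespace.
    echo "ip netns exec srv netserver"
    for t in TCP_STREAM UDP_STREAM TCP_RR UDP_RR TCP_CRR; do
        echo "ip netns exec cli netperf -H $srv_addr -t $t -l 60"
    done
}
```

For example, 'netns_plan eth0 eth1 9000' prints the plan for MTU 9000.
After reviewing it, the output can be piped to 'sudo sh -x' to apply it.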

[Regression Potential]

 * The patchset touches a lot of the sfc driver, so the potential
   for regression definitely exists.  It has been tested on another
   adapter that uses the old code, and no regressions were found.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Invalid

** Affects: linux (Ubuntu Bionic)
 Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
 Status: In Progress

** Affects: linux (Ubuntu Cosmic)
 Importance: Undecided
 Status: Invalid

** Affects: linux (Ubuntu Disco)
 Importance: Undecided
 Status: Invalid

** Affects: linux (Ubuntu Eoan)
 Importance: Undecided
 Status: Invalid

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Cosmic)
   Status: New => Invalid

** Changed in: linux (Ubuntu Disco)
   Status: New => Invalid

** Changed in: linux (Ubuntu Eoan)
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1836635

Title:
  Bionic: support for Solarflare X2542 network adapter (sfc driver)

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Invalid
Status in linux source package in Eoan:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)

2019-07-15 Thread Mauricio Faria de Oliveira
** Description changed:

  [Impact]
  
-  * Support for Solarflare X2542 network adapter
-(Medford2 / SFC9250) in the sfc driver.
+  * Support for Solarflare X2542 network adapter
+    (Medford2 / SFC9250) in the sfc driver.
  
-  * This network adapter is present on recent hardware,
-at least HP 2019 and Dell PowerEdge R740xd systems.
+  * This network adapter is present on recent hardware,
+    at least HP 2019 and Dell PowerEdge R740xd systems.
  
-  * On recent-hardware deployments that would rather use
-the Bionic LTS / GA supported kernel and cannot move
-to HWE kernels this adapter is non functional at all.
+  * On recent-hardware deployments that would rather use
+    the Bionic LTS / GA supported kernel and cannot move
+    to HWE kernels this adapter is non functional at all.
  
  [Test Case]
  
-  * The X2542 adapter has been exercised with iperf3 and nc
-across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
-on both directions, for 1 week.
+  * The X2542 adapter has been exercised with iperf3 and nc
+    across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
+    on both directions, for 1 week.
  
-Its performance is on par with the Cosmic 4.18 kernel
-(which contains all these patches) and the out-of-tree
-driver from the vendor.
+    Its performance is on par with the Cosmic 4.18 kernel
+    (which contains all these patches) and the out-of-tree
+    driver from the vendor.
  
-  * The 7000 series adapter (for regression testing an old model,
-supported previously) has been exercised with iperf and netperf
-(TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
-host (client/server in different adapter ports isolated with
-network namespaces, so traffic goes through the network switch),
-on 10G link speed on MTUs 1500/9000, for 1 weekend. 
+  * The 7000 series adapter (for regression testing an old model,
+    supported previously) has been exercised with iperf and netperf
+    (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
+    host (client/server in different adapter ports isolated with
+    network namespaces, so traffic goes through the network switch),
+    on 10G link speed on MTUs 1500/9000, for 1 weekend.
  
-No regressions observed between the original and test kernels.
+    No regressions observed between the original and test kernels.
  
  [Regression Potential]
  
   * The patchset touches a lot of the sfc driver, so the potential
-for regression definitely exists.  It has been tested on other
-adapter which uses the old code, and no regressions were found.
+for regression definitely exists. Thus, a lot of consideration
+and testing happened:
+ 
+  * It has been tested on other adapter which uses the old code,
+and no regressions were found so far (see 7000 series above).
+ 
+  * The patchset essentially moves the driver in Bionic up in the
+upstream 'git log':
+- since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
+- until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
+- except for 2 commits (not needed / unrelated)
+  - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
+  - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
+- plus 2 more recent commits (fixes)
+  - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
+  - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")
+ 
+  * The patchset is exclusively cherry-picks, no single backport.

** Description changed:

  [Impact]
  
   * Support for Solarflare X2542 network adapter
     (Medford2 / SFC9250) in the sfc driver.
  
   * This network adapter is present on recent hardware,
     at least HP 2019 and Dell PowerEdge R740xd systems.
  
   * On recent-hardware deployments that would rather use
     the Bionic LTS / GA supported kernel and cannot move
     to HWE kernels this adapter is non functional at all.
  
  [Test Case]
  
   * The X2542 adapter has been exercised with iperf3 and nc
     across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
     on both directions, for 1 week.
  
     Its performance is on par with the Cosmic 4.18 kernel
     (which contains all these patches) and the out-of-tree
     driver from the vendor.
  
   * The 7000 series adapter (for regression testing an old model,
     supported previously) has been exercised with iperf and netperf
     (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
     host (client/server in different adapter ports isolated with
     network namespaces, so traffic goes through the network switch),
     on 10G link speed on MTUs 1500/9000, for 1 weekend.
  
     No regressions observed between the original and test kernels.
  
  [Regression Potential]
  
-  * The patchset touches a lot of the sfc driver, so the potential
-for regression definitely exists. Thus, a l

[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)

2019-07-15 Thread Mauricio Faria de Oliveira
[Bionic][PULL] sfc: patches for LP#1836635
https://lists.ubuntu.com/archives/kernel-team/2019-July/102196.html

Bug description:
  [Impact]

   * Support for Solarflare X2542 network adapter
     (Medford2 / SFC9250) in the Bionic sfc driver.

   * This network adapter is present on recent hardware,
     at least HP 2019 and Dell PowerEdge R740xd systems.

   * On recent-hardware deployments that would rather use
     the supported Bionic LTS / GA kernel and cannot move
     to HWE kernels, this adapter does not work at all.

  [Test Case]

   * The X2542 adapter has been exercised with iperf3 and nc
     across 2 hosts at 25G link speed with MTUs 1400/1500/9000
     in both directions, for 1 week.

     Its performance is on par with the Cosmic 4.18 kernel
     (which contains all these patches) and the out-of-tree
     driver from the vendor.

   * The 7000 series adapter (an older, previously supported model,
     used for regression testing) has been exercised with iperf and
     netperf (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) on one
     host (client/server on different adapter ports isolated with
     network namespaces, so traffic goes through the network switch),
     at 10G link speed with MTUs 1500/9000, for 1 weekend.

     No regressions observed between the original and test kernels.

  [Regression Potential]

   * The patchset touches a lot of the sfc driver, so the potential
     for regression definitely exists. Thus, a lot of consideration
     and testing happened:

   * It has been tested on another adapter that uses the old code,
     and no regressions were found so far (see 7000 series above).

   * The patchset consists exclusively of cherry-picks; not a single
     backport was needed.

   * The patchset essentially moves the Bionic driver up in the
     upstream 'git log --oneline -- drivers/net/ethernet/sfc/':

     - since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
     - until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
     - except for 2 commits (not needed / unrelated)
       - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
       - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
     - plus 2 more recent commits (fixes)
       - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
       - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")
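
The commit selection listed above could be reproduced roughly as
follows. This is only a sketch: drop_commits is a hypothetical helper,
not something used for the actual pull request. The report also does
not say whether the 'since' commit itself is part of the set, and a
plain A..B range in git excludes A.

```shell
# Hypothetical helper: read a 'git log --oneline' listing on stdin and drop
# the lines whose abbreviated hash matches one of the arguments.
drop_commits() {
    # With nothing to exclude, pass the listing through unchanged.
    [ $# -gt 0 ] || { cat; return; }
    # Build an alternation such as '^hash1|^hash2' from the arguments.
    pattern=$(printf '^%s|' "$@")
    grep -Ev "${pattern%|}"
}
```

It could be fed from the upstream tree with something like
'git log --oneline --abbrev=12 --reverse d4a7a8893d4c..7f61e6c6279b -- drivers/net/ethernet/sfc/ | drop_commits 42356d9a137b 9baeb5eb1f83'.
The --abbrev=12 option keeps the printed hashes at least as long as the
ones given as arguments, so the prefix match works.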



[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)

2019-07-15 Thread Mauricio Faria de Oliveira
Regression test results/log/script,
for documentation purposes.

** Attachment added: "lp1836635-test-regression.tar.xz"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+attachment/5277232/+files/lp1836635-test-regression.tar.xz



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-07-19 Thread Mauricio Faria de Oliveira
The documentation about the feature/parameters has been merged [1]
into Server Guide -> Installation -> Advanced Installation -> iSCSI [2].

It has not been published yet, so the HTML page may only be updated later.

cheers,
Mauricio

[1] https://code.launchpad.net/~mfo/serverguide/ibft/+merge/370264
[2] https://help.ubuntu.com/lts/serverguide/advanced-installation.html#iscsi

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Released
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Fix Released
Status in hw-detect source package in Bionic:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Fix Released
Status in debian-installer source package in Cosmic:
  Fix Released
Status in hw-detect source package in Cosmic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Fix Released
Status in debian-installer source package in Disco:
  Fix Released
Status in hw-detect source package in Disco:
  Fix Released
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Fix Released
Status in debian-installer source package in Eoan:
  Fix Released
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it were, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.
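
     As a rough sketch of the "module loads" check: once 'iscsi_ibft' is
     loaded on a system whose firmware provides an iBFT, the table is
     exposed read-only under /sys/firmware/ibft. The helper below is
     hypothetical and only prints a few attributes. The attribute names
     (initiator-name, target-name, ip-addr) are given as I understand
     the driver's sysfs layout, so treat them as an assumption.

```shell
# Hypothetical helper: summarize iBFT attributes found under a sysfs root.
# The root defaults to /sys; another directory can be passed for testing.
ibft_summary() {
    root=${1:-/sys}/firmware/ibft
    if [ ! -d "$root" ]; then
        echo "no iBFT exposed under $root"
        return 1
    fi
    for f in "$root"/initiator/initiator-name \
             "$root"/target*/target-name \
             "$root"/target*/ip-addr; do
        # An unmatched glob stays literal and is skipped by the -r test.
        [ -r "$f" ] && printf '%s: %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
    return 0
}
```

     In the installer environment this would be run after
     'modprobe iscsi_ibft', as 'ibft_summary' with no arguments.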

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
       see comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
       been tested in detail, and falls back to
       previous behavior if it fails.
       see comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with a system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions



[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-04-11 Thread Mauricio Faria de Oliveira
The verification for bionic/cosmic -proposed is expected to finish by
tomorrow (Apr 12).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821395

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed

Bug description:
  [Impact]

   * fscache issue where jobs get hung when fscache disk is full.

   * trivial upstream fix; already applied in X/D, required in B/C:
     commit c5a94f434c82 ("fscache: fix race between enablement and
     dropping of object").

  [Test Case]

   * Test kernel verified / regression-tested by reporter.

   * Apparently there's no simple test case,
     but these are the conditions to hit the problem:

     1) The active dataset size is equal to the cache disk size.
        The application reads the data over and over again.
     2) Disk is near full (90%+)
     3) cachefilesd in userspace is trying to cull the old objects
        while new objects are being looked up.
     4) new cachefiles are created and some fail with no disk space.
     5) race in dropping object state machine and
        deferred lookup state machine causes the hang.
     6) Tasks hang in fscache_wait_for_deferred_lookup, waiting for
        the FSCACHE_COOKIE_LOOKING_UP bit in cookie->flags to be cleared.

  [Regression Potential]

   * Low; contained in fscache; no further fixes applied upstream.

   * This patch is applied in a stable tree (linux-4.4.y).

  [Original Description]

  A user reported an fscache issue where jobs get hung when the fscache
  disk is full.

  After investigation, it's been found to be an issue already reported/fixed
  upstream, by commit c5a94f434c82 ("fscache: fix race between enablement and
  dropping of object").

  This patch is required in Bionic and Cosmic, and it's applied in
  Xenial (via stable) and Disco.

  Apparently there's no simple test case, but these are the conditions
  to hit the problem:

  1) The active dataset size is equal to the cache disk size.
     The application reads the data over and over again.
  2) Disk is near full (90%+)
  3) cachefilesd in userspace is trying to cull the old objects
     while new objects are being looked up.
  4) new cachefiles are created and some fail with no disk space.
  5) race in dropping object state machine and
     deferred lookup state machine causes the hang.
  6) HUNG in fscache_wait_for_deferred_lookup for
     clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821395/+subscriptions



[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-04-12 Thread Mauricio Faria de Oliveira
Verification successful with xfstests on nfs+fscache.
No regression in cosmic-proposed from cosmic-updates.

cosmic-updates / 4.18.0-17:

Failures: generic/035 generic/258 generic/294 generic/448 generic/467 
generic/477 generic/484 generic/490 generic/495
Failed 9 of 437 tests

cosmic-proposed / 4.18.0-18:

Failures: generic/035 generic/258 generic/294 generic/448 generic/467 
generic/477 generic/484 generic/490 generic/495
Failed 9 of 437 tests
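
The comparison above can be scripted along these lines. This is a
sketch only: failed_tests and same_failures are hypothetical helpers,
and the log file names in the usage note are placeholders. They assume
the log contains the 'Failures:' summary exactly as printed above,
possibly wrapped over several lines, followed by the 'Failed N of M
tests' line.

```shell
# Print the failed test names from an xfstests log, one per line, sorted.
failed_tests() {
    awk '/^Failures:/ { on = 1; sub(/^Failures:/, "") }
         /^Failed /   { on = 0 }
         on { for (i = 1; i <= NF; i++) print $i }' "$1" | sort -u
}

# Succeed only if both logs report exactly the same set of failures.
same_failures() {
    [ "$(failed_tests "$1")" = "$(failed_tests "$2")" ]
}
```

For example, 'same_failures updates.log proposed.log && echo "no regression"'
reports whether the -proposed kernel fails the same tests as the
-updates kernel.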



[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-04-12 Thread Mauricio Faria de Oliveira
Verification successful with xfstests on nfs+fscache.
No regression in bionic-proposed from bionic-updates.

bionic-updates / 4.15.0-47:

Failures: generic/035 generic/075 generic/091 generic/112 generic/263 
generic/294 generic/306 generic/307 generic/430 generic/431 generic/434 
generic/469 generic/484 generic/495
Failed 14 of 437 tests

bionic-proposed / 4.15.0-48:
 
Failures: generic/035 generic/075 generic/091 generic/112 generic/263 
generic/294 generic/306 generic/307 generic/430 generic/431 generic/434 
generic/469 generic/484 generic/495
Failed 14 of 437 tests




** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic



[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-04-12 Thread Mauricio Faria de Oliveira
Regression testing setup/steps
===


fscache
---

sudo apt-get -y install cachefilesd
echo 'RUN=yes' | sudo tee -a /etc/default/cachefilesd
sudo modprobe fscache
sudo systemctl start cachefilesd


nfs
---

sudo apt-get -y install nfs-kernel-server
sudo systemctl start nfs-kernel-server

sudo mkdir -p /{srv,mnt}/nfs-{test,scratch}

# different fsid if in the same local filesystem
echo '/srv/nfs-test 127.0.0.1(rw,no_subtree_check,no_root_squash,fsid=0)' | sudo tee -a /etc/exports
echo '/srv/nfs-scratch 127.0.0.1(rw,no_subtree_check,no_root_squash,fsid=1)' | sudo tee -a /etc/exports
sudo exportfs -ra


xfs-tests
---

sudo apt-get -y install automake gcc make git xfsprogs xfslibs-dev \
  uuid-dev uuid-runtime libtool-bin e2fsprogs libuuid1 attr libattr1-dev \
  libacl1-dev libaio-dev libgdbm-dev quota gawk fio dbench python sqlite3

git clone https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
cd xfstests-dev

git log --oneline -1 HEAD
  f3c1bca generic: Test that SEEK_HOLE can find a punched hole

make -j$(nproc); echo $?   # must be 0

sudo useradd fsgqa
sudo groupadd fsgqa
sudo useradd 123456-fsgqa

export TEST_DEV=127.0.0.1:/srv/nfs-test
export TEST_DIR=/mnt/nfs-test

export SCRATCH_DEV=127.0.0.1:/srv/nfs-scratch
export SCRATCH_MNT=/mnt/nfs-scratch

export TEST_FS_MOUNT_OPTS="-o fsc"  # for fscache / test dev
export NFS_MOUNT_OPTIONS="-o fsc"   # for fscache / scratch dev

cd ~/xfstests-dev 
sudo -E ./check -nfs -g quick 2>&1 | tee ~/xfs-tests.nfs.log.$(uname -r)
<...>
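
Since the run above is logged per kernel version, the failed-test sets of two runs can be compared mechanically. A minimal sketch, assuming the log contains the summary line that xfstests' check prints starting with "Failures:" (the helper names failed_tests and same_failures are hypothetical):

```shell
# Hypothetical helpers for comparing two xfstests logs.

# failed_tests LOG: print the failed tests listed on "Failures:" lines,
# one per line, sorted and without duplicates.
failed_tests() {
    sed -n 's/^Failures: //p' "$1" | tr ' ' '\n' | grep -v '^$' | sort -u
}

# same_failures LOG_A LOG_B: succeed only if both logs list the same
# set of failed tests.
same_failures() {
    a=$(failed_tests "$1")
    b=$(failed_tests "$2")
    [ "$a" = "$b" ]
}
```

For example, `same_failures ~/xfs-tests.nfs.log.ORIG ~/xfs-tests.nfs.log.TEST && echo same`, where ORIG and TEST stand for the two `uname -r` strings. This only compares which tests failed, not how they failed.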

---

In another terminal, check that the NFS mounts indeed have the 'fsc'
(fscache) option:

$ mount | grep nfs | grep fsc
127.0.0.1:/srv/nfs-test on /mnt/nfs-test type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1)
127.0.0.1:/srv/nfs-scratch on /mnt/nfs-scratch type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1)

And compare fscache stats before/after run:

$ cat /proc/fs/fscache/stats 
FS-Cache statistics
Cookies: idx=0 dat=0 spc=0
Objects: alc=0 nal=0 avl=0 ded=0
ChkAux : non=0 ok=0 upd=0 obs=0
Pages  : mrk=0 unc=0
Acquire: n=0 nul=0 noc=0 ok=0 nbf=0 oom=0
Lookups: n=0 neg=0 pos=0 crt=0 tmo=0
Invals : n=0 run=0
Updates: n=0 nul=0 run=0
Relinqs: n=0 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=0 ok=0 wt=0 nod=0 nbf=0 int=0 oom=0
Retrvls: ops=0 owt=0 abt=0
Stores : n=0 ok=0 agn=0 nbf=0 oom=0
Stores : ops=0 run=0 pgs=0 rxd=0 olm=0
VmScan : nos=0 gon=0 bsy=0 can=0 wt=0
Ops: pend=0 run=0 enq=0 can=0 rej=0
Ops: ini=0 dfr=0 rel=0 gc=0
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0

...

$ cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=412 dat=2441632 spc=0
Objects: alc=8929 nal=0 avl=8741 ded=8928
ChkAux : non=0 ok=86 upd=0 obs=1123
Pages  : mrk=371441 unc=371441
Acquire: n=2442044 nul=0 noc=0 ok=2442044 nbf=0 oom=0
Lookups: n=8929 neg=8817 pos=112 crt=8817 tmo=0
Invals : n=152 run=152
Updates: n=0 nul=0 run=152
Relinqs: n=2442044 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=1498 ok=0 wt=195 nod=1498 nbf=0 int=0 oom=0
Retrvls: ops=1498 owt=575 abt=0
Stores : n=371145 ok=371145 agn=0 nbf=0 oom=0
Stores : ops=1117 run=372234 pgs=371118 rxd=371118 olm=0
VmScan : nos=49 gon=0 bsy=0 can=0 wt=0
Ops: pend=575 run=2767 enq=372387 can=0 rej=0
Ops: ini=372795 dfr=37 rel=372795 gc=37
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=1123 stl=0 rtr=0 cul=0
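
Comparing the two dumps by eye is error-prone, so one option is to save each snapshot to a file and print only the counters that changed. A minimal sketch, assuming both snapshots were saved with `cat /proc/fs/fscache/stats > FILE` (the helper name fscache_stats_diff and the output format are hypothetical):

```shell
# Hypothetical helper: print the fscache counters that differ between
# two saved snapshots of /proc/fs/fscache/stats.
#
# Usage: fscache_stats_diff BEFORE_FILE AFTER_FILE
fscache_stats_diff() {
    awk '
    {
        # The label is the text before the first colon, spaces removed.
        pos = index($0, ":")
        if (pos == 0) next          # e.g. the "FS-Cache statistics" title
        label = substr($0, 1, pos - 1)
        gsub(/[ \t]/, "", label)

        # The rest of the line is a list of key=value counters.
        n = split(substr($0, pos + 1), fields, " ")
        for (i = 1; i <= n; i++) {
            eq = index(fields[i], "=")
            if (eq == 0) continue
            key = label "." substr(fields[i], 1, eq - 1)
            val = substr(fields[i], eq + 1)
            if (FNR == NR) {
                before[key] = val   # first file: remember the value
            } else if (!(key in before)) {
                printf "%s: (new) -> %s\n", key, val
            } else if (before[key] != val) {
                printf "%s: %s -> %s (%+d)\n", key, before[key], val, val - before[key]
            }
        }
    }' "$1" "$2"
}
```

For example: save `before.txt`, run the tests, save `after.txt`, then run `fscache_stats_diff before.txt after.txt`. Labels that appear on more than one line (such as "Allocs" or "Stores") use distinct counter names in the dumps above, so the combined "Label.counter" key stays unique.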

---

Note: on 4.15.0 kernels, some tests apparently run forever
(generic/430, 431 and 434; same behavior on nfs+fscache, ext4 and xfs);
they were killed with 'sudo kill -TERM $(pidof xfs_io)'.

# ref: https://wiki.linux-nfs.org/wiki/index.php/Xfstests

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821395

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed


[Kernel-packages] [Bug 1824827] [NEW] tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
Public bug reported:

[Impact]

 * Tasks of a multi-threaded workload doing write() and fsync()
   might deadlock in write_cache_pages(), preventing progress.

 * The fix addresses a corner case in the range_cyclic implementation
   of write_cache_pages() that allows the deadlock.

 * Patch:
   - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7
     ("mm/page-writeback.c: fix range_cyclic writeback vs
     writepages deadlock"), present in v4.20-rc1~92^2~19.

[Test Case]

 * This issue was originally hit by the 'perforce' (p4d) tool
   on an XFS filesystem, but it is difficult to hit and occurs rarely.

 * We've written a userspace program + kernel module (kprobes-based)
   to reproduce this problem and verify the test kernel/patch.

 * The kprobes are strictly tied to particular kernel versions
   because of the assembly instruction offsets.  We'll provide
   updated versions for -updates and -proposed for verification.

 * Steps
   (see output examples in comments):

   - Userspace part:
   $ gcc -o test test.c -pthread

   - Kernel part:
   $ touch Makefile
   $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
   $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

   - Shorter hung task timeout and higher console logging level
     to notice the deadlocked tasks sooner, and watch progress:
   $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
   $ echo 9 | sudo tee /proc/sys/kernel/printk

   - Load module / Run userspace part (logging to kernel log) in XFS:
   $ sudo insmod kprobe-test.ko
   $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
   $ sudo rmmod kprobe-test

   You may need to ctrl-z with the original kernel as 'test' doesn't
   finish.

   - Check kernel log or watch the system console:
   $ dmesg

   Check threads in D state.
   $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker
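
   Note that 'grep D' matches the letter anywhere on the line, so the
   pipeline above could in principle list a thread that is not in D state
   but has a 'D' in its command name. A stricter variant, sketched here
   with awk, compares the state column exactly (the helper name
   d_state_threads is hypothetical; it reads ps output on stdin, so it
   also works on saved output):

```shell
# Hypothetical helper: keep only threads whose state column is exactly "D"
# (uninterruptible sleep) and whose line mentions "test" or "kworker".
# Expects the output of: ps -eLo pid,tid,state,comm
d_state_threads() {
    awk '$3 == "D" && /test|kworker/'
}

# Usage on a live system:
#   ps -eLo pid,tid,state,comm | d_state_threads
```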


[Regression Potential] 

 * The patch is small but changes core writeback infrastructure,
   so there's a chance it may _affect_ (not exactly _break_) some
   behavior that has not been validated by our regression testing,
   which is described next.

 * This has been verified with 'xfstests' (not only for XFS fs,
   despite its original name), used by major Linux filesystems
   for regression testing during development. It's been tested
   on systems with 24 and 4 CPUs (to exercise differences in
   scalability, parallelism, and workload) and XFS and ext4
   (reporter's environment + Ubuntu's default).
   No regressions were observed (the set of failed tests is
   the same in each system and tests failed in the same way).
   
 * This has also been verified with 'iozone' for write intensive
   tests, to exercise the writeback mechanism and no errors were
   observed.

 * The reporter has been running the test kernel with the patch
   for weeks and has not observed any other issues/regressions.

[Other Info]
 
 * This is only required in Cosmic (for the Bionic HWE kernel),
   and is already applied in Disco.

** Affects: linux (Ubuntu)
 Importance: Undecided
     Status: Invalid

** Affects: linux (Ubuntu Cosmic)
 Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
 Status: Confirmed

** Affects: linux (Ubuntu Disco)
 Importance: Undecided
 Status: Invalid

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Disco)
   Status: New => Invalid

** Changed in: linux (Ubuntu Cosmic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

Title:
  tasks doing write()/fsync() hit deadlock in write_cache_pages()

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Invalid


[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
testcase, kernel part.

** Attachment added: "kprobe-test.c"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255994/+files/kprobe-test.c



[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
testcase, userspace part.

** Attachment added: "test.c"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255995/+files/test.c



[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
testcase


original kernel :: latest cosmic version:


$ uname -rv
4.18.0-18-generic #19-Ubuntu SMP Tue Apr 2 18:13:16 UTC 2019

[  654.491029] kprobe_test: loading out-of-tree module taints kernel.
[  654.493322] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel

[  654.497033] mod_init():158 :: hello
[  654.497976] mod_init():183 :: kernel version: orig/-18/cosmic

[  694.254271] Program running, TID = 3292
[  694.256600] kp1_pre_handler():070 :: state 0  :: pid =   3292, mapping = 0x962333263730, comm = 'test'
[  694.260870] kp1_pre_handler():079 :: state 0 -> 1 :: pid =   3292, mapping = 0x962333263730
[  694.262710] kp2_pre_handler():119 :: state 1  :: pid =   3292, page index = 1
[  694.264264] kp3_pre_handler():144 :: state 1  :: pid =   3292, page index = 1, calling writepage()
[  694.266641] kp2_pre_handler():119 :: state 1  :: pid =   3292, page index = 2
[  694.268456] kp3_pre_handler():144 :: state 1  :: pid =   3292, page index = 2, calling writepage()

[  695.276320] Thread 0 running, TID = 3293!
[  695.281210] kp1_pre_handler():070 :: state 1  :: pid =   1165, mapping = 0x962333263730, comm = 'kworker/u4:2'
[  695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid =   1165, mapping = 0x962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[  695.314808] kp2_pre_handler():119 :: state 2  :: pid =   1165, page index = 2
[  695.322822] kp3_pre_handler():144 :: state 2  :: pid =   1165, page index = 2, calling writepage()

[  695.330308] kp2_pre_handler():119 :: state 2  :: pid =   1165, page index = 1
[  695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid =   1165, page index = 1, spin 5 seconds before lock_page()...

[  696.283747] Thread 1 running, TID = 3295!
[  696.284623] kp1_pre_handler():070 :: state 3  :: pid =   3295, mapping = 0x962333263730, comm = 'test'
[  696.286726] kp2_pre_handler():119 :: state 3  :: pid =   3295, page index = 1
[  696.288392] kp3_pre_handler():144 :: state 3  :: pid =   3295, page index = 1, calling writepage()
[  696.290018] kp2_pre_handler():119 :: state 3  :: pid =   3295, page index = 2

[  697.283941] Thread 2 running, TID = 3296!
[  697.284859] kp1_pre_handler():070 :: state 3  :: pid =   3296, mapping = 0x962333263730, comm = 'test'
[  697.287246] kp2_pre_handler():119 :: state 3  :: pid =   3296, page index = 1
[  700.302756] kp2_pre_handler():127 :: state 3 -> 4 :: pid =   1165, page index = 1, spun 5 seconds before lock_page().

[  715.716717] INFO: task kworker/u4:2:1165 blocked for more than 10 seconds.
[  715.725486]   Tainted: G   OE 4.18.0-18-generic #19-Ubuntu
[  715.732832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  715.740500] kworker/u4:2    D    0  1165      2 0x8000
[  715.745615] Workqueue: writeback wb_workfn (flush-7:1)
[  715.750736] Call Trace:
[  715.753270]  __schedule+0x29e/0x840
[  715.756493]  schedule+0x2c/0x80
[  715.759369]  io_schedule+0x16/0x40
[  715.762044]  __lock_page+0x101/0x150
[  715.764729]  ? page_cache_tree_insert+0xe0/0xe0
[  715.773625]  write_cache_pages+0x283/0x4e0
[  715.782547]  ? xfs_vm_readpage+0x80/0x80 [xfs]
[  715.792525]  ? xfs_vm_readpage+0x80/0x80 [xfs]
[  715.798175]  ? write_cache_pages+0x5/0x4e0
[  715.803180]  xfs_vm_writepages+0x6b/0xa0 [xfs]
[  715.807087]  do_writepages+0x41/0xd0
[  715.810416]  __writeback_single_inode+0x40/0x360
[  715.813588]  ? fprop_fraction_percpu+0x26/0x80
[  715.816686]  writeback_sb_inodes+0x211/0x520
[  715.819584]  __writeback_inodes_wb+0x67/0xb0
[  715.822661]  wb_writeback+0x25f/0x2f0
[  715.824963]  ? get_nr_dirty_inodes+0x46/0x70
[  715.827180]  wb_workfn+0x175/0x3f0
[  715.829225]  process_one_work+0x20f/0x410
[  715.830964]  worker_thread+0x34/0x400
[  715.832646]  kthread+0x120/0x140
[  715.834551]  ? pwq_unbound_release_workfn+0xd0/0xd0
[  715.836902]  ? kthread_bind+0x40/0x40
[  715.838772]  ret_from_fork+0x35/0x40
[  715.840579] INFO: task test:3293 blocked for more than 10 seconds.
[  715.842927]   Tainted: G   OE 4.18.0-18-generic #19-Ubuntu
[  715.845279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  715.847771] test            D    0  3293   3291 0x
[  715.849826] Call Trace:
[  715.851266]  __schedule+0x29e/0x840
[  715.853289]  schedule+0x2c/0x80
[  715.855069]  wb_wait_for_completion+0x64/0x90
[  715.857179]  ? wait_woken+0x80/0x80
[  715.858973]  sync_inodes_sb+0xc7/0x290
[  715.860754]  sync_inodes_one_sb+0x15/0x20
[  715.862383]  iterate_supers+0xaa/0x100
[  715.863938]  ? default_file_splice_write+0x30/0x30
[  715.865666]  ksys_sync+0x42/0xb0
[  715.867082]  __ia32_sys_sync+0xe/0x20
[  715.869122]  do_syscall_64+0x5a/0x110
[  715.871041]  entry_SYSCALL_64_after_hw

[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
testcase


test kernel :: latest cosmic version + patch:


$ uname -rv
4.18.0-18-generic #19+test20190415b1 SMP Mon Apr 15 15:43:20 UTC 2019

[  169.145212] kprobe_test: loading out-of-tree module taints kernel.
[  169.149144] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel

[  169.153539] mod_init():158 :: hello
[  169.154744] mod_init():190 :: kernel version: test/-18/cosmic

[  169.177027] Program running, TID = 2497
[  169.177978] kp1_pre_handler():070 :: state 0  :: pid =   2497, mapping = 0x993df9136fb0, comm = 'test'
[  169.181080] kp1_pre_handler():079 :: state 0 -> 1 :: pid =   2497, mapping = 0x993df9136fb0
[  169.183355] kp2_pre_handler():119 :: state 1  :: pid =   2497, page index = 1
[  169.185616] kp3_pre_handler():144 :: state 1  :: pid =   2497, page index = 1, calling writepage()
[  169.187779] kp2_pre_handler():119 :: state 1  :: pid =   2497, page index = 2
[  169.189186] kp3_pre_handler():144 :: state 1  :: pid =   2497, page index = 2, calling writepage()

[  170.194880] Thread 0 running, TID = 2498!
[  170.200011] kp1_pre_handler():070 :: state 1  :: pid =  7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[  170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid =  7, mapping = 0x993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[  170.238633] kp2_pre_handler():119 :: state 2  :: pid =  7, page index = 2
[  170.248024] kp3_pre_handler():144 :: state 2  :: pid =  7, page index = 2, calling writepage()
[  170.261141] kp1_pre_handler():070 :: state 2  :: pid =  7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[  170.272150] kp2_pre_handler():119 :: state 2  :: pid =  7, page index = 1
[  170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid =  7, page index = 1, spin 5 seconds before lock_page()...

[  171.195090] Thread 1 running, TID = 2499!
[  171.196182] kp1_pre_handler():070 :: state 3  :: pid =   2499, mapping = 0x993df9136fb0, comm = 'test'
[  171.198609] kp2_pre_handler():119 :: state 3  :: pid =   2499, page index = 1
[  171.200358] kp3_pre_handler():144 :: state 3  :: pid =   2499, page index = 1, calling writepage()
[  171.203717] kp2_pre_handler():119 :: state 3  :: pid =   2499, page index = 2

[  172.195297] Thread 2 running, TID = 2500!
[  172.196387] kp1_pre_handler():070 :: state 3  :: pid =   2500, mapping = 0x993df9136fb0, comm = 'test'
[  172.198673] kp2_pre_handler():119 :: state 3  :: pid =   2500, page index = 1
[  175.252161] kp2_pre_handler():127 :: state 3 -> 4 :: pid =  7, page index = 1, spun 5 seconds before lock_page().

[  175.254922] kp3_pre_handler():144 :: state 4  :: pid =   2499, page index = 2, calling writepage()
[  175.256849] kp3_pre_handler():144 :: state 4  :: pid =   2500, page index = 1, calling writepage()
[  175.259166] kp2_pre_handler():119 :: state 4  :: pid =   2500, page index = 2
[  175.273178] mod_exit():213 :: bye


[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
Patch posted for SRU:

[C][PATCH 0/1] Fix write()/fsync() deadlock in write_cache_pages()
https://lists.ubuntu.com/archives/kernel-team/2019-April/100084.html



[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-15 Thread Mauricio Faria de Oliveira
The change introduced by the patch is evident in the kernel message log
for Thread 0: between page indexes 2 and 1 there's now another function
call to write_cache_pages() instead of just another iteration of the
for-loop inside one call.

Original kernel:

[ 695.276320] Thread 0 running, TID = 3293!
[ 695.281210] kp1_pre_handler():070 :: state 1  :: pid = 1165, mapping = 0x962333263730, comm = 'kworker/u4:2'
[ 695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 1165, mapping = 0x962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 695.314808] kp2_pre_handler():119 :: state 2  :: pid = 1165, page index = 2
[ 695.322822] kp3_pre_handler():144 :: state 2  :: pid = 1165, page index = 2, calling writepage()
  << ... HERE ... >>
[ 695.330308] kp2_pre_handler():119 :: state 2  :: pid = 1165, page index = 1
[ 695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 1165, page index = 1, spin 5 seconds before lock_page()...

Test kernel:

[ 170.194880] Thread 0 running, TID = 2498!
[ 170.200011] kp1_pre_handler():070 :: state 1  :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 7, mapping = 0x993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 170.238633] kp2_pre_handler():119 :: state 2  :: pid = 7, page index = 2
[ 170.248024] kp3_pre_handler():144 :: state 2  :: pid = 7, page index = 2, calling writepage()
[ 170.261141] kp1_pre_handler():070 :: state 2  :: pid = 7, mapping = 0x993df9136fb0, comm = 'kworker/u4:0'
[ 170.272150] kp2_pre_handler():119 :: state 2  :: pid = 7, page index = 1
[ 170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...

** Changed in: linux (Ubuntu Cosmic)
   Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

Title:
  tasks doing write()/fsync() hit deadlock in write_cache_pages()

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Cosmic:
  In Progress
Status in linux source package in Disco:
  Invalid

Bug description:
  [Impact]

   * Tasks of a multi-threaded workload doing write() and fsync()
 might deadlock in write_cache_pages(), preventing progress.

   * The fix addresses a corner case in write_cache_pages() on
 the range_cyclic implementation which allows the deadlock.

   * Patch:
 - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7
   ("mm/page-writeback.c: fix range_cyclic writeback vs
   writepages deadlock"), present in v4.20-rc1~92^2~19.

  [Test Case]

   * This issue was originally hit by the 'perforce' (p4d)
 tool on an XFS filesystem, but it occurs rarely and is
 difficult to reproduce.

   * We've written a userspace program + kernel module (kprobes-based)
 to reproduce this problem and verify the test kernel/patch.

   * The kprobes are strictly tied to particular kernel versions
 because of the assembly instruction offsets.  We'll provide
 updated versions for -updates and -proposed for verification.

   * Steps 
 (see output examples in comments):

 - Userspace part:
 $ gcc -o test test.c -pthread

 - Kernel part:
 $ touch Makefile 
 $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
 $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

 - Shorter hung task timeout and higher console logging level
   to notice the deadlocked tasks sooner, and watch progress:
 $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
 $ echo 9 | sudo tee /proc/sys/kernel/printk 

 - Load module / Run userspace part (logging to kernel log) in XFS:
 $ sudo insmod kprobe-test.ko
 $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
 $ sudo rmmod kprobe-test

 You may need to press Ctrl-Z with the original kernel, as 'test'
  doesn't finish.

 - Check kernel log or watch the system console:
 $ dmesg

 Check threads in D state.
 $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker
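
 The hung task timeout and console log level set in the steps above
 stay in effect until changed back or until reboot. A minimal sketch for
 remembering the previous values so they can be restored after the
 test; the helper name set_and_remember is an assumption for
 illustration and is not part of the original test case:

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write a new value to a
# /proc/sys-style file and print the previous contents for a later restore.
#   $1 = file to change, $2 = new value
set_and_remember() {
    file=$1
    new=$2
    old=$(cat "$file") || return 1
    printf '%s\n' "$new" > "$file" || return 1
    printf '%s\n' "$old"
}

# Usage (as root), with the two settings from the steps above:
#   old_timeout=$(set_and_remember /proc/sys/kernel/hung_task_timeout_secs 10)
#   old_printk=$(set_and_remember /proc/sys/kernel/printk 9)
#   ... run the test ...
#   echo "$old_timeout" > /proc/sys/kernel/hung_task_timeout_secs
#   echo "$old_printk"  > /proc/sys/kernel/printk
```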

  
  [Regression Potential] 

   * The patch is small but changes core writeback infrastructure,
 so there's a chance it may _affect_ (not exactly _break_) some
 behavior that has not been validated by our regression testing,
 which is described in the next items.

   * This has been verified with 'xfstests' (not only for XFS fs,
 despite its original name), used by major Linux filesystems
 for regression testing during development. It's been tested
 on systems with 24 and 4 CPUs (to exercise differences in
 scalability, parallelism, and workload) and XFS and 

[Kernel-packages] [Bug 1839521] Re: Xenial: ZFS deadlock in shrinker path with xattrs

2019-08-14 Thread Mauricio Faria de Oliveira
Hi @mathew-hodson,

This fix is needed in the Linux kernel package for Xenial as well (it
duplicates the zfs-linux/dkms source).

Shouldn't the 'linux' source package still be tracked?

Thanks,
Mauricio

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1839521

Title:
  Xenial: ZFS deadlock in shrinker path with xattrs

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Xenial:
  Fix Committed

Bug description:
  [Impact]

   * Xenial's ZFS can deadlock in the memory shrinker path
     after removing files with extended attributes (xattr).

   * Extended attributes are enabled by default, but are
     _not_ used by default, which reduces the likelihood.

   * It's very difficult/rare to reproduce this problem,
     due to the file/xattr/remove/shrinker/LRU order/timing
     circumstances required (weeks for a reporting user),
     but a synthetic test case has been found for testing.

  [Test Case]

   * A synthetic reproducer is available for this LP,
     with a few steps to touch/setfattr/rm/drop_caches
     plus a kernel module to massage the disposal list.
 (comment #8)

   * In the original ZFS module:
     the xattr dir inode is not purged immediately on
     file removal, but possibly purged _two_ shrinker
     invocations later.  This allows another thread,
     started before the file removal, to call zfs_zget()
     on the xattr child inode and iput() it, so it makes
     it onto the same disposal list as the xattr dir inode.
 (comment #3)

   * In the modified ZFS module:
     the xattr dir inode is purged immediately on file
     removal, not possibly later on a shrinker invocation,
     so the problem window above doesn't exist anymore.
 (comment #12)

  [Regression Potential]

   * Low. The patches are confined to extended attributes
     in ZFS, specifically node removal/purge, plus a change
     to how an xattr child inode tracks its xattr dir
     (parent) inode, so that it can be purged immediately
     on removal.

   * The ZFS test-suite has been run on original/modified
     zfs-dkms package/kernel modules, with no regressions.
 (comment #11)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1839521/+subscriptions



[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device

2019-08-15 Thread Mauricio Faria de Oliveira
Verification successful with disco-proposed.
No warning or oops anymore.

# rmadison -s disco-proposed linux
 linux | 5.0.0-26.27 | disco-proposed | source

# uname -rv
5.0.0-26-generic #27-Ubuntu SMP Tue Aug 13 17:47:39 UTC 2019

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[  171.879953] bcache: register_bdev() registered backing device loop0
[  171.920116] bcache: run_cache_set() invalidating existing data
[  171.931843] bcache: register_cache() registered cache device loop1
[  175.906911] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 18fa9221-da9c-4e69-8b23-6eb093030c30

# reboot
# # comment last line in script.

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[   91.990987] bcache: register_bdev() registered backing device loop0
[   94.001825] bcache: run_cache_set() invalidating existing data
[   94.018920] bcache: register_cache() registered cache device loop1
# sleep 10
#

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Users can trigger a kernel Warning or even an Oops if
 bcache/writeback_percent is set before attaching a
 caching device to the bcache device.

   * The fix is trivial, upstream, and consists of just
 checking whether the caching device is attached before
 setting flags and scheduling the thread (which oopses).

  [Test Case]

   * See attachment 'setup-bcache-wb_percent-before-attach.sh'
 used in comment #5 and #6 to reproduce the problem(s).

   * for 'Warning':

 # make-bcache -B 
 # make-bcache -C 
 # echo 11 > /sys/block//bcache/writeback_percent
 # sleep 1
 # echo  > /sys/block//bcache/attach

   * for 'Oops':
 (steps above, but don't run last command / 'attach').
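
 For readers without the attachment, the 'Warning' sequence above can be
 sketched as a dry run that only prints the commands. This is not the
 attached script: the function name, the device arguments, and the
 bcache0 default are placeholders chosen for illustration, and the
 cache set UUID has to come from the actual make-bcache -C run.

```shell
#!/bin/sh
# Hypothetical dry run: prints the commands for the 'Warning' case without
# executing them. All arguments are placeholders supplied by the caller.
#   $1 = backing device, $2 = cache device, $3 = cache set UUID,
#   $4 = bcache device name (assumed default: bcache0)
print_bcache_warning_steps() {
    backing=$1
    cache=$2
    cset_uuid=$3
    bdev=${4:-bcache0}
    echo "make-bcache -B $backing"
    echo "make-bcache -C $cache"
    # Setting writeback_percent before the attach is the problematic order.
    echo "echo 11 > /sys/block/$bdev/bcache/writeback_percent"
    echo "sleep 1"
    echo "echo $cset_uuid > /sys/block/$bdev/bcache/attach"
}

# Usage:
#   print_bcache_warning_steps /dev/loop0 /dev/loop1 "$CSET_UUID"
```

 For the 'Oops' case, the last printed command would simply be left out.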

  [Regression Potential]

   * Low. The fix is trivial, contained, and exclusive to bcache sysfs
  handler.

   * The modified path has been exercised with synthetic testing
  (script).

  [Original Bug Description]

  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Date: Wed Jul 24 12:28:06 2019
  InstallationDate: Installed on 2013-10-04 (2119 days ago)
  InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 (20130925.1)
  MachineType: Supermicro X9DAi
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-54-generic N/A
   linux-backports-modules-4.15.0-54-generic  N/A
   linux-firmware 1.173.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago)
  dmi.bios.date: 05/09/2015
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 3.2
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: X9DAi
  dmi.board.vendor: Supermicro
  dmi.board.version: 0123456789
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: 
dmi:bvnAmeric

[Kernel-packages] [Bug 1839521] Re: Xenial: ZFS deadlock in shrinker path with xattrs

2019-08-19 Thread Mauricio Faria de Oliveira
Verification done for linux on xenial-proposed.

The inodes for file, xattr dir, and xattr child are all evicted at file
removal time, not making it to any disposal list after file removal.

So the window/scenario for the problem to occur is not present anymore.

Log
---

$ uname -rv
4.4.0-160-generic #188-Ubuntu SMP Wed Aug 14 04:21:43 UTC 2019

$ modinfo zfs | head
filename:       /lib/modules/4.4.0-160-generic/kernel/zfs/zfs/zfs.ko
version:        0.6.5.6-0ubuntu28
...
srcversion:     99F1D0FED2F291CA7AED0C6

$ sudo apt-get install zfsutils-linux attr

$ sudo ./zfs-mount.sh

$ echo 2 | sudo tee /proc/sys/vm/drop_caches
2

$ sudo ./zfs-kprobes.sh

$ sudo cat /sys/kernel/debug/tracing/trace_pipe &

$ touch /zfs/file
   touch-10656 [001] d...   359.615887: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x0 dzp=0x8800b9875940
   touch-10656 [001] d...   359.616184: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xa
   touch-10656 [001] d...   359.616339: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880036f48440

$ setfattr -n user.debug -v 1 /zfs/file
setfattr-10657 [000] d...   361.507063: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x2 dzp=0x880036f48440
setfattr-10657 [000] d...   361.507265: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xb
setfattr-10657 [000] d...   361.507402: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880139d09980
setfattr-10657 [000] d...   361.507665: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa
setfattr-10657 [000] d...   361.507792: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget)
setfattr-10657 [000] d...   361.507981: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa
setfattr-10657 [000] d...   361.508104: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget)
setfattr-10657 [000] d...   361.508692: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa
setfattr-10657 [000] d...   361.508821: r_zfs_zget_0: (zfs_zaccess+0x12b/0x220 [zfs] <- zfs_zget)
setfattr-10657 [000] d...   361.509022: p_zfs_mknode_0: (zfs_mknode+0x0/0xe00 [zfs]) flag=0x0 dzp=0x880139d09980
setfattr-10657 [000] d...   361.509170: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x520 [zfs]) obj=0xc
setfattr-10657 [000] d...   361.509302: r_zfs_znode_alloc_0: (zfs_mknode+0x8a3/0xe00 [zfs] <- zfs_znode_alloc) zpp=0x880139d09100

$ rm /zfs/file
  rm-10658 [001] d...   363.216716: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xa
  rm-10658 [001] d...   363.216882: r_zfs_zget_0: (zfs_dirent_lock+0x56c/0x6c0 [zfs] <- zfs_zget)
  rm-10658 [001] d...   363.217130: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xb
  rm-10658 [001] d...   363.217271: r_zfs_zget_0: (zfs_remove+0x22b/0x4c0 [zfs] <- zfs_zget)
  rm-10658 [001] d...   363.217567: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0x880036f48650
  rm-10658 [001] d...   363.217715: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0x880036f48650
  rm-10658 [001] d...   363.217835: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0x880036f48440 obj=0xa
  rm-10658 [001] d...   363.217963: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x360 [zfs]) znode=0x880036f48440
  rm-10658 [001] d...   363.218102: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xb
  rm-10658 [001] d...   363.218232: r_zfs_zget_0: (zfs_rmnode+0x25b/0x360 [zfs] <- zfs_zget)
  rm-10658 [001] d...   363.218464: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0x880139d09b90 obj=0x0
   <...>-10308 [003] d...   363.218496: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0x880139d09b90
  z_iput-10308 [003] d...   363.218503: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0x880139d09b90
  z_iput-10308 [003] d...   363.218505: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0x880139d09980 obj=0xb
  z_iput-10308 [003] d...   363.218509: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x360 [zfs]) znode=0x880139d09980
  z_iput-10308 [003] d...   363.218512: p_zfs_purgedir_0: (zfs_purgedir+0x0/0x230 [zfs]) znode=0x880139d09980
  z_iput-10308 [003] d...   363.218560: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0x8800bb12c000 obj=0xc
  z_iput-10308 [003] d...   363.218566: r_zfs_zget_0: (zfs_purgedir+0xb4/0x230 [zfs] <- zfs_zget)
  z_iput-10308 [003] d...   363.218606: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0x880139d09310 obj=0x0
  z_iput-10308 [003] d...   363.218626: r_zfs_purgedir_0: (zfs_rmnode+

[Kernel-packages] [Bug 1840704] [NEW] ZFS kernel modules lack debug symbols

2019-08-19 Thread Mauricio Faria de Oliveira
Public bug reported:

The ZFS kernel modules aren't built with debug symbols,
which introduces problems/issues for debugging/support.

Patches will be sent soon for linux and zfs/spl-linux,
covering X/B/D/E/Unstable.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
 Status: In Progress

** Changed in: linux (Ubuntu)
   Status: New => In Progress

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu:
  In Progress

Bug description:
  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.

  Patches will be sent soon for linux and zfs/spl-linux,
  covering X/B/D/E/Unstable.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions



[Kernel-packages] [Bug 1840789] [NEW] bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-20 Thread Mauricio Faria de Oliveira
Public bug reported:

Description/patches to be provided this week.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
 Status: In Progress

** Changed in: linux (Ubuntu)
   Status: New => In Progress

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840789

Title:
  bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Description/patches to be provided this week.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions



[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device

2019-08-21 Thread Mauricio Faria de Oliveira
Verification successful with bionic-proposed.
No warning or oops anymore.

# uname -rv
4.15.0-59-generic #66-Ubuntu SMP Wed Aug 14 10:56:44 UTC 2019

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[  105.696881] bcache: register_bdev() registered backing device loop0
[  105.703809] bcache: run_cache_set() invalidating existing data
[  105.714280] bcache: register_cache() registered cache device loop1
[  109.677765] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 3fd195b5-7334-4759-81d9-0faadc042f59
# 

# reboot
# # comment last line in script.

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[   21.645209] bcache: register_bdev() registered backing device loop0
[   21.697858] bcache: run_cache_set() invalidating existing data
[   21.709142] bcache: register_cache() registered cache device loop1
# sleep 10
# 

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Invalid


[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-22 Thread Mauricio Faria de Oliveira
This fix is already present in Eoan and Unstable:

~/git/ubuntu-eoan$ git log --oneline origin/master-next -- drivers/net/ethernet/broadcom/bnx2x/ | head | grep cos
1c41d7b7cf60 bnx2x: Disable multi-cos feature.

~/git/ubuntu-eoan$ git describe --contains 1c41d7b7cf60
Ubuntu-5.2.0-12.13~51

~/git/ubuntu-unstable$ git log --oneline origin/master -- drivers/net/ethernet/broadcom/bnx2x/ | head | grep cos
d1f0b5dce8fd bnx2x: Disable multi-cos feature.

~/git/ubuntu-unstable$ git describe --contains d1f0b5dce8fd
Ubuntu-5.3.0-4.5~313^2~91


** Description changed:

- Description/patches to be provided this week.
+ [Impact]
+ 
+  * The bnx2x driver may cause hardware faults (leading to
+panic/reboot) and other behaviors as transmit timeouts,
+after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
+introduced.
+ 
+  * This issue has been observed by an user shortly
+after starting docker & kubelet, with adapters:
+- Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
+- Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]
+ 
+  * If options to ignore hardware faults are used
+(erst_disable=1 hest_disable=1 ghes.disable=1)
+the system doesn't panic/reboot and continues
+on to timeout on adapter stats, then transmit
+timeouts, spewing some adapter firmware dumps,
+but the network interface is non-functional.
+ 
+  * The issue only happened when LLDP is enabled
+on the network switches, and crashdump shows
+the bnx2x driver is stuck/waits for firmware
+to complete the stop traffic command in LLDP
+handling. Workaround used is to disable LLDP
+in the network switches/ports.
+ 
+  * Analysis of the driver and firmware dumps
+didn't help significantly towards finding
+the root cause.
+ 
+  * Upstream/mainline recently just reverted the
+patch, due to similar problem reports, while
+looking for the root cause/proper fix.
+ 
+ [Test Case]
+ 
+  * No reproducible test case found outside
+the user's systems/cluster, where it is
+enough to start docker & kubelet & wait.
+ 
+  * The user verified test kernels for Xenial
+and Bionic - the problem does not happen.
+ 
+ [Regression Potential]
+ 
+  * Users who significantly use/apply the non-default
+traffic class (tc) / class of service (cos) might
+possibly see performance changes (if any at all)
+in such applications, however that's unclear now.
+ 
+  * This is a recent revert upstream (v5.3-rc'ish),
+so there's chance things might change in this area.
+ 
+  * Nonetheless, the patch is authored by the driver
+vendor, and made its way into stable kernels
+(e.g., v5.2.8 which made Eoan/19.10 recently).

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
   Status: In Progress

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Eoan)
   Status: In Progress => Fix Released

** Changed in: linux (Ubuntu Disco)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: linux (Ubuntu Disco)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Description changed:

  [Impact]
  
-  * The bnx2x driver may cause hardware faults (leading to
-panic/reboot) and other behaviors as transmit timeouts,
-after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
-introduced.
+  * The bnx2x driver may cause hardware faults (leading to
+    panic/reboot) and other behaviors as transmit timeouts,
+    after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
+    introduced.
  
-  * This issue has been observed by an user shortly
-after starting docker & kubelet, with adapters:
-- Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
-- Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]
+  * This issue has been observed by an user shortly
+    after starting docker & kubelet, with adapters:
+    - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
+    - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]
  
-  * If options to ignore hardware faults are used
-(erst_disable=1 hest_disable=1 ghes.disable=1)
-the system doesn't panic/reboot and continues
-on to timeout on adapter stats, then transmit
-timeouts, spewing some adapter f

[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-22 Thread Mauricio Faria de Oliveira
For documentation purposes, in a recent Xenial/4.4 kernel,
this kernel error log is seen (with options to ignore the
hardware error/fault that panics/reboots the system).

[  113.658876] bnx2x: [bnx2x_stats_comp:205(eno1)]timeout waiting for stats finished
[  123.648066] bnx2x: [bnx2x_state_wait:310(eno1)]timeout waiting for state 6
[  123.730345] bnx2x: [bnx2x_dcbx_stop_hw_tx:443(eno1)]Unable to hold traffic for HW configuration
[  123.834443] bnx2x: [bnx2x_dcbx_stop_hw_tx:444(eno1)]driver assert
[  123.907439] bnx2x: [bnx2x_panic_dump:919(eno1)]begin crash dump -
...
[  123.907662] bnx2x :19:00.0 eno1: bc 7.14.11
[  123.907666] begin fw dump (mark 0x3c65c8)
[  123.908033] end of fw dump
[  123.908048] bnx2x: [bnx2x_mc_assert:751(eno1)]Chip Revision: everest3, FW Version: 7_12_30
[  123.908049] bnx2x: [bnx2x_panic_dump:1182(eno1)]end crash dump -
[  128.701944] bnx2x: [bnx2x_func_state_change:6306(eno1)]timeout waiting for previous ramrod completion
[  128.701946] bnx2x: [bnx2x_dcbx_resume_hw_tx:469(eno1)]Unable to resume traffic after HW configuration
[  128.701946] bnx2x: [bnx2x_dcbx_resume_hw_tx:470(eno1)]driver assert
[  128.701948] bnx2x: [bnx2x_panic_dump:919(eno1)]begin crash dump -
...
[  128.702170] bnx2x :19:00.0 eno1: bc 7.14.11
[  128.702173] begin fw dump (mark 0x3c65c8)
[  128.702542] end of fw dump
[  128.702557] bnx2x: [bnx2x_mc_assert:751(eno1)]Chip Revision: everest3, FW Version: 7_12_30
[  128.702558] bnx2x: [bnx2x_panic_dump:1182(eno1)]end crash dump -
[  128.702565] bnx2x: [bnx2x_sp_rtnl_task:10229(eno1)]Indicating link is down due to Tx-timeout
[  130.704628] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[0]: txdata->tx_pkt_prod(4) != txdata->tx_pkt_cons(3)
[  132.706968] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[8]: txdata->tx_pkt_prod(445) != txdata->tx_pkt_cons(443)
[  134.710090] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[16]: txdata->tx_pkt_prod(29) != txdata->tx_pkt_cons(25)
...
[  202.648543] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[7]: txdata->tx_pkt_prod(25) != txdata->tx_pkt_cons(24)
[  204.792441] bnx2x: [bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[23]: txdata->tx_pkt_prod(51) != txdata->tx_pkt_cons(46)
[  204.940151] bnx2x: [bnx2x_del_all_macs:8499(eno1)]Failed to delete MACs: -5
[  205.023453] bnx2x: [bnx2x_chip_cleanup:9319(eno1)]Failed to schedule DEL commands for UC MACs list: -5
[  206.351810] bnx2x: [bnx2x_func_stop:9078(eno1)]FUNC_STOP ramrod failed. Running a dry transaction
[  206.778590] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout!
[  206.856735] bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE returned failure -1
[  207.134674] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout!
[  207.212785] bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE returned failure -1
[  207.490725] bnx2x: [bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout!
...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840789

Title:
  bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * The bnx2x driver may cause hardware faults (leading to
     panic/reboot) and other behaviors such as transmit timeouts,
     after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
     introduced.

   * This issue has been observed by a user shortly
     after starting docker & kubelet, with adapters:
     - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
     - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]

   * If options to ignore hardware faults are used
     (erst_disable=1 hest_disable=1 ghes.disable=1)
     the system doesn't panic/reboot and continues
     on to timeout on adapter stats, then transmit
     timeouts, spewing some adapter firmware dumps,
     but the network interface is non-functional.

   * The issue only happened when LLDP is enabled
     on the network switches, and crashdump shows
     the bnx2x driver is stuck/waits for firmware
     to complete the stop traffic command in LLDP
     handling. Workaround used is to disable LLDP
     in the network switches/ports.

   * Analysis of the driver and firmware dumps
     didn't help significantly towards finding
     the root cause.

   * Upstream/mainline recently just reverted the
     patch, due to similar problem reports, while
     looking for the root cause/proper fix.

  [Test Case]

   * No reproducible test case found outside
     the user's systems/cluster, where it is
     enough to

[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-22 Thread Mauricio Faria de Oliveira
The behavior is somewhat similar on a recent 5.2 kernel without the fix
(again with options to ignore hardware errors/faults):

Aug 19 17:15:15 HOSTNAME kernel: Uhhuh. NMI received for unknown reason 21 on 
CPU 0.
Aug 19 17:15:15 HOSTNAME kernel: perf interrupt took too long (3222 > 2500), 
lowering kernel.perf_event_max_sample_rate to 5
Aug 19 17:15:15 HOSTNAME kernel: TCP: request_sock_TCP: Possible SYN flooding 
on port 9300. Sending cookies.  Check SNMP counters.
Aug 19 17:15:15 HOSTNAME kernel: Do you have a strange power saving mode 
enabled?
Aug 19 17:15:15 HOSTNAME kernel: Dazed and confused, but trying to continue
...
Aug 19 17:15:21 HOSTNAME kernel: NETDEV WATCHDOG: eno1 (bnx2x): transmit queue 
0 timed out
...
Aug 19 17:15:21 HOSTNAME kernel: bnx2x: 
[bnx2x_sp_rtnl_task:10229(eno1)]Indicating link is down due to Tx-timeout
Aug 19 17:15:21 HOSTNAME kernel: bond0: link status down for interface eno1, 
disabling it in 200 ms
Aug 19 17:15:21 HOSTNAME kernel: bnx2x: [bnx2x_stats_comp:205(eno1)]timeout 
waiting for stats finished
Aug 19 17:15:21 HOSTNAME kernel: bnx2x: [bnx2x_stats_comp:205(eno1)]timeout 
waiting for stats finished
Aug 19 17:15:23 HOSTNAME kernel: bnx2x: 
[bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[0]: 
txdata->tx_pkt_prod(4) != txdata->tx_pkt_cons(2)
Aug 19 17:15:25 HOSTNAME kernel: bnx2x: 
[bnx2x_clean_tx_queue:1204(eno1)]timeout waiting for queue[8]: 
txdata->tx_pkt_prod(1) != txdata->tx_pkt_cons(0)
...
Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout 
waiting for state 0
Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_del_all_macs:8499(eno1)]Failed 
to delete MACs: -16
Aug 19 17:17:14 HOSTNAME kernel: bnx2x: [bnx2x_chip_cleanup:9319(eno1)]Failed 
to schedule DEL commands for UC MACs list: -16
Aug 19 17:17:24 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout 
waiting for state 9
Aug 19 17:17:34 HOSTNAME kernel: bnx2x: [bnx2x_state_wait:310(eno1)]timeout 
waiting for state 2
Aug 19 17:17:34 HOSTNAME kernel: bnx2x: [bnx2x_func_stop:9078(eno1)]FUNC_STOP 
ramrod failed. Running a dry transaction
Aug 19 17:17:35 HOSTNAME kernel: bnx2x: 
[bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout!
Aug 19 17:17:35 HOSTNAME kernel: bnx2x: [bnx2x_write_dmae:598(eno1)]DMAE 
returned failure -1
Aug 19 17:17:35 HOSTNAME kernel: bnx2x: 
[bnx2x_issue_dmae_with_comp:550(eno1)]DMAE timeout!
...

[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-22 Thread Mauricio Faria de Oliveira
An older crashdump analysis confirmed that the bnx2x driver was in the
traffic class setup / stop hardware step of the LLDP path.

PID: 3936   TASK: ffff883fdc9b1c00  CPU: 11  COMMAND: "kworker/11:0"
 #0 [ffff883fec593ce0] __schedule at ffffffff81850bae
 #1 [ffff883fec593d30] schedule at ffffffff818510f5
 #2 [ffff883fec593d48] schedule_preempt_disabled at ffffffff8185139e
 #3 [ffff883fec593d58] __mutex_lock_slowpath at ffffffff81852fd9
 #4 [ffff883fec593db0] mutex_lock at ffffffff8185306f
 #5 [ffff883fec593dc8] rtnl_lock at ffffffff81756e15
 #6 [ffff883fec593dd8] bnx2x_sp_rtnl_task at ffffffffc025d8c4 [bnx2x]
 #7 [ffff883fec593e20] process_one_work at ffffffff8109e68b
 #8 [ffff883fec593e60] worker_thread at ffffffff8109e9fb
 #9 [ffff883fec593ec0] kthread at ffffffff810a4dc7
#10 [ffff883fec593f50] ret_from_fork at ffffffff81855735

Check this stack frame:

 #6 [ffff883fec593dd8] bnx2x_sp_rtnl_task at ffffffffc025d8c4 [bnx2x]

It is 9 x 8-byte/64-bit values long, since the next frame starts at:

 #7 [ffff883fec593e20]

ffff883fec593e20 - ffff883fec593dd8 = 0x48 bytes = 72 bytes = 9 x 8
bytes.
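
The frame-size arithmetic above can be double-checked outside of crash.
A minimal Python sketch, assuming both frame addresses carry the usual
ffff upper bits of x86_64 kernel addresses; the helper name frame_words
is made up for illustration:

```python
WORD_SIZE = 8  # bytes per 64-bit stack slot


def frame_words(frame_start: int, next_frame_start: int) -> int:
    """Return how many 8-byte values lie between two frame addresses."""
    size = next_frame_start - frame_start
    if size < 0 or size % WORD_SIZE:
        raise ValueError("frame addresses are not ordered/aligned as expected")
    return size // WORD_SIZE


frame6 = 0xFFFF883FEC593DD8  # frame #6 (bnx2x_sp_rtnl_task)
frame7 = 0xFFFF883FEC593E20  # frame #7 (process_one_work)

# Frame size in hex, then the number of 8-byte values to read with 'rd'.
print(hex(frame7 - frame6), frame_words(frame6, frame7))
```

The resulting word count is the value passed as the count argument to
'rd' when dumping the frame.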

crash> rd ffff883fec593dd8 9
ffff883fec593dd8:  ffffffffc025d8c4 ffff883feaa0a178   ..%.....x...?...
ffff883fec593de8:  6199482b89f76272 ffff883fe9571080   rb..+H.a..W.?...
ffff883fec593df8:  ffff883ffdf56b40 ffff883ffdf5b400   @k..?.......?...
ffff883fec593e08:  00000000000002c0 ffff881fe93f0dd8   ..........?.....
ffff883fec593e18:  ffff883fec593e58                    X>Y.?...

The top of the stack has the RIP/next-instruction contents,
which matches what's in the stack frame line.

ffffffffc025d8c4

Looking at the disassembly, it's right after the 'callq rtnl_lock', as
expected.


static void bnx2x_sp_rtnl_task(struct work_struct *work)
{

rdi = work

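0xffffffffc025d890 <bnx2x_sp_rtnl_task>:     nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc025d895 <bnx2x_sp_rtnl_task+5>:   push   %rbp
0xffffffffc025d896 <bnx2x_sp_rtnl_task+6>:   mov    %rsp,%rbp
0xffffffffc025d899 <bnx2x_sp_rtnl_task+9>:   push   %r15
0xffffffffc025d89b <bnx2x_sp_rtnl_task+11>:  push   %r14
0xffffffffc025d89d <bnx2x_sp_rtnl_task+13>:  push   %r13
0xffffffffc025d89f <bnx2x_sp_rtnl_task+15>:  push   %r12

0xffffffffc025d8a1 <bnx2x_sp_rtnl_task+17>:  lea    -0x598(%rdi),%r12

^
struct bnx2x *bp = container_of(work, struct bnx2x, 
sp_rtnl_task.work);

r12 = bp

0xffffffffc025d8a8 <bnx2x_sp_rtnl_task+24>:  push   %rbx
0xffffffffc025d8a9 <bnx2x_sp_rtnl_task+25>:  mov    %rdi,%rbx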

rbx = rdi = work




work = rbx = 0xffff881fe93f0dd8

crash> struct work_struct ffff881fe93f0dd8
struct work_struct {
  data = {
    counter = 704
  },
  entry = {
    next = 0xffff881fe93f0de0,
    prev = 0xffff881fe93f0de0
  },
  func = 0xffffffffc025d890 <bnx2x_sp_rtnl_task>
}

bp = 0xffff881fe93f0840 (offset in asm above)

crash> eval 0xffff881fe93f0dd8 - 0x598
hexadecimal: ffff881fe93f0840
    decimal: 18446612269371426880  (-131804338124736)
      octal: 1777774201775117604100
     binary: 1111111111111111100010000001111111101001001111110000100001000000
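
The pointer arithmetic that container_of() performs here can also be
sketched in Python. This is only an illustration: the 0x598 offset is
taken from the 'lea -0x598(%rdi),%r12' instruction above, the ffff upper
bits of the work pointer are assumed, and the helper names are made up.

```python
MASK64 = (1 << 64) - 1


def container_of(member_ptr: int, member_offset: int) -> int:
    """Address of the enclosing struct, given a member's address and offset."""
    return (member_ptr - member_offset) & MASK64


def as_signed64(value: int) -> int:
    """Interpret a 64-bit value as signed, like the number crash shows in parentheses."""
    return value - (1 << 64) if value & (1 << 63) else value


work = 0xFFFF881FE93F0DD8         # struct work_struct * (saved in rbx)
SP_RTNL_TASK_WORK_OFFSET = 0x598  # offset of sp_rtnl_task.work, from the lea

bp = container_of(work, SP_RTNL_TASK_WORK_OFFSET)

# Hex, unsigned decimal and signed decimal forms of the struct bnx2x pointer.
print(f"{bp:x}", bp, as_signed64(bp))
```

The three printed values can be compared against the hexadecimal and
decimal lines of the 'eval' output.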

crash> struct bnx2x ffff881fe93f0840
struct bnx2x {
  fp = 0xffff881fe95c4000,
  sp_objs = 0xffff881fe9fb0000,
  fp_stats = 0xffff881fe935c000,
  bnx2x_txq = 0xffff881fe87ef000,
  regview = 0xffffc9001d000000,
  doorbells = 0xffffc90019878000,
...
  dev = 0xffff881fe93f0000,
  pdev = 0xffff881fef03b000,
  iro_arr = 0xffff881ff0756000,
  recovery_state = BNX2X_RECOVERY_DONE,
...
  cnic_support = 1 '\001',
  cnic_enabled = false,
  cnic_loaded = false,
  cnic_probe = 0xffffffffc0243280,
  fcoe_init = false,
...
  sp_task = {
...
      func = 0xffffffffc0251960
...
  sp_rtnl_task = {
    work = {
      data = {
        counter = 704
      },
      entry = {
        next = 0xffff881fe93f0de0,
        prev = 0xffff881fe93f0de0
      },
      func = 0xffffffffc025d890 <bnx2x_sp_rtnl_task>
    },
...
  fw_ver = "FFV14.04.18 
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
...
  dcb_state = 1,
  dcbx_enabled = 2,
  dcbx_mode_uset = false,
  dcbx_config_params = {
    overwrite_settings = 1,
    admin_dcbx_version = 0,
    admin_ets_enable = 1,
    admin_pfc_enable = 1,
    admin_tc_supported_tx_enable = 1,
    admin_ets_configuration_tx_enable = 1,
    admin_ets_recommendation_tx_enable = 0,
    admin_pfc_tx_enable = 1,
    admin_application_priority_tx_enable = 1,
    admin_ets_willing = 1,
    admin_ets_reco_valid = 1,
    admin_pfc_willing = 1,
    admin_app_priority_willing = 1,
    admin_configuration_bw_precentage = {100, 0, 0, 0, 0, 0, 0, 0},
    admin_configurati

[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-08-22 Thread Mauricio Faria de Oliveira
[X/B][PATCH] bnx2x: Disable multi-cos feature.
https://lists.ubuntu.com/archives/kernel-team/2019-August/103282.html

[D][PATCH] bnx2x: Disable multi-cos feature.
https://lists.ubuntu.com/archives/kernel-team/2019-August/103283.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840789

Title:
  bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * The bnx2x driver may cause hardware faults (leading to
     panic/reboot) and other behaviors such as transmit timeouts,
     after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
     introduced.

   * This issue has been observed by a user shortly
     after starting docker & kubelet, with adapters:
     - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
     - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]

   * If options to ignore hardware faults are used
     (erst_disable=1 hest_disable=1 ghes.disable=1)
     the system doesn't panic/reboot and continues
     on to timeout on adapter stats, then transmit
     timeouts, spewing some adapter firmware dumps,
     but the network interface is non-functional.

   * The issue only happened when LLDP is enabled
     on the network switches, and crashdump shows
     the bnx2x driver is stuck/waits for firmware
     to complete the stop traffic command in LLDP
     handling. Workaround used is to disable LLDP
     in the network switches/ports.

   * Analysis of the driver and firmware dumps
     didn't help significantly towards finding
     the root cause.

   * Upstream/mainline recently just reverted the
     patch, due to similar problem reports, while
     looking for the root cause/proper fix.

  [Test Case]

   * No reproducible test case found outside
     the user's systems/cluster, where it is
     enough to start docker & kubelet & wait.

   * The user verified test kernels for Xenial
     and Bionic - the problem does not happen;
     build-tested on Disco.

  [Regression Potential]

   * Users who significantly use/apply the non-default
     traffic class (tc) / class of service (cos) might
     possibly see performance changes (if any at all)
     in such applications, however that's unclear now.

   * This is a recent revert upstream (v5.3-rc'ish),
     so there's chance things might change in this area.

   * Nonetheless, the patch is authored by the driver
     vendor, and made its way into stable kernels
     (e.g., v5.2.8 which made Eoan/19.10 recently).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io

2019-08-26 Thread Mauricio Faria de Oliveira
Hi Drew,

There's a mpt3sas fix in v5.3-rc3 for a problem that may cause an adapter 
firmware fault
(although not sure of the exact fault state code; but it should cause a reset 
anyway).

If you could please test either
1) v5.3-rc2 [1], to confirm the issue happens with v5.3-rc2 but not with 
v5.3-rc3; or
2) 4.15.0-60.67 (in bionic-proposed), which has the fix, to check that the 
issue doesn't happen;
that would be great.

If that doesn't help, please continue with the great regression tip
provided by Kai-Heng Feng.

Thanks!
Mauricio

[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc2/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841132

Title:
  mpt3sas - storage controller resets under heavy disk io

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [summary]
  when a server running ubuntu 18.04 with an lsi sas controller experiences 
high disk io there is a chance the storage controller will reset
  this can take weeks or months, but once the controller resets it will keep 
resetting every few seconds or few minutes, dramatically degrading disk io
  the server must be rebooted to restore the controller to a normal state

  [hardware configuration]
  server: dell poweredge r7415, purchased 2019-02
  cpu/chipset: amd epyc naples
  storage controller: "dell hba330 mini" with chipset "lsi sas3008"
  drives: 4x samsung 860 pro 2TB ssd

  [software configuration]
  ubuntu 18.04 server
  mdadm raid6
  all firmware is fully updated (bios 1.9.3) (hba330 16.17.00.03) (ssd rvm01b6q)

  [what happened]
  server was operating as a vm host for months without issue
  one day the syslog was flooded with messages like "mpt3sas_cm0: sending diag 
reset !!" and "Power-on or device reset occurred", along with unusably-slow 
disk io
  the server was removed from production and I looked for a way to reproduce 
the issue

  [how to reproduce the issue]
  there are probably many ways to produce this issue, the hackish way I found 
to reliably reproduce it was:
  have the four ssds in a mdadm raid6 with ext4 filesystem
  create three 500GB files containing random data
  open three terminals. one calculates md5sum of file1 in a loop, another does 
the same for file2, the third does a copy of file3 to file3-temp in a loop
  the number of files is arbitrary, the goal is just to generate a lot of disk 
io on files too large to be cached in memory
  then initiate an array check with "/usr/share/mdadm/checkarray -a" to cause 
even more drive thrashing
  within 1-15min the controller will enter the broken state. the longest I ever 
saw it take was 30min. I reproduced this several times
  rebooting the server restores the controller to a normal state
  if the server is not rebooted and the controller is left in this broken state 
eventually drives will fall out of the array, and sometimes array/filesystem 
corruption will occur
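
  For reference, the checksum/copy load described above could be
  approximated with a script like the one below. This is only a sketch
  under assumptions: the reporter's actual commands are not known, the
  function names and iteration counts are placeholders, and the array
  check still has to be started separately.

```python
import hashlib
import shutil
import threading

CHUNK = 1 << 20  # read in 1 MiB chunks so large files never have to fit in memory


def md5_of_file(path: str) -> str:
    """Stream a file through MD5 and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def md5_loop(path: str, iterations: int) -> None:
    """Checksum the same file repeatedly to generate read load."""
    for _ in range(iterations):
        md5_of_file(path)


def copy_loop(src: str, dst: str, iterations: int) -> None:
    """Copy the same file repeatedly to generate read and write load."""
    for _ in range(iterations):
        shutil.copyfile(src, dst)


def run_load(file1: str, file2: str, file3: str, iterations: int = 1) -> None:
    """Run two checksum loops and one copy loop concurrently, then wait for them."""
    workers = [
        threading.Thread(target=md5_loop, args=(file1, iterations)),
        threading.Thread(target=md5_loop, args=(file2, iterations)),
        threading.Thread(target=copy_loop, args=(file3, file3 + "-temp", iterations)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

  Calling run_load() with the three large files on the array and a large
  iteration count mirrors the three terminals in the description.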

  [why this is being reported here]
  It's unlikely I am exceeding limits of the hardware since this server chassis 
can hold 24 drives and I am only using 4. The controller specs indicate I 
should not hit pcie bandwidth limits until at least 16 drives.
  My first thought was that the lsi controller firmware was at fault since they 
have been historically buggy, however I reproduced this with the newest 
firmware "16.17.00.03" and the previous version "15.17.09.06" (versions may be 
dell-specific).
  I then tried the most recent motherboard bios "1.9.3", and downgraded to 
"1.9.2", no change.
  I then wanted to eliminate the possibility of a bad drive. swapped out all 4 
drives with different ones of the same model, no change.
  I then upgraded from the standard 18.04 kernel to the newer backported hwe 
kernel, which also came with a newer mpt3sas driver, no change.
  I then ran the same test on the same array but with rhel 8, to my surprise I 
could no longer reproduce the issue.
  -
  tl;dr version:
  ubuntu 18.04 (kernel 4.15.0) (mpt3sas driver 17.100.00.00) storage controller 
breaks in 1-10min
  ubuntu 18.04 hwe (kernel 5.0.0) (mpt3sas driver 27.101.00.00) storage 
controller breaks in 1-15min, max 30min
  rhel 8 (kernel 4.18.0) (mpt3sas driver 27.101.00.00) same stress test on same 
array for 19h, no errors

  [caveats]
  Server os misconfiguration is possible, however this is a rather basic vm 
host running kvm and no 3rd-party packages.
  I can't conclusively prove this isn't a hardware fault since I don't have a 
second unused identical server to test on right now, however the fact that the 
problem can be easily reproduced under ubuntu but not under rhel seems 
noteworthy.
  There is another bug (LP: #1810781) similar to this, I didn't post there 
because it's already marked as fixed.
  There is also a debian bug (Debian #926202) that encountered this on kernel 
4.19.0, but I'm unable to tell if it's the same issue.

To manage notifications about this bug go 

[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io

2019-08-26 Thread Mauricio Faria de Oliveira
Mentioned upstream candidate fix is:

commit df9a606184bfdb5ae3ca9d226184e9489f5c24f7
Author: Suganath Prabu
Date:   Tue Jul 30 03:43:57 2019 -0400

    scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA

    Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
    per hardware design, if DMA-able range contains all 64-bits
    set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.

    E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000
    bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then
    HBA will fault the firmware.

    Driver will set 63-bit DMA mask to ensure the above address will not be
    used.

    Cc:  # 5.1.20+
    Signed-off-by: Suganath Prabu
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Martin K. Petersen

git/linux $ git describe --contains df9a606184bfdb5ae3ca9d226184e9489f5c24f7
v5.3-rc3~21^2~1

git/ubuntu-bionic $ git log --oneline Ubuntu-4.15.0-60.67 -- 
drivers/scsi/mpt3sas/ 
395f1e3037b8 scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
...
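
To illustrate why a 63-bit DMA mask keeps the faulting address out of
reach, here is a small Python sketch. It is not driver code: the function
names are made up, and the start address assumes the commit message means
a 0x1000-byte buffer whose last byte is the all-ones 64-bit address.

```python
def dma_mask(bits: int) -> int:
    """Highest address allowed by an n-bit DMA mask (cf. the kernel's DMA_BIT_MASK(n))."""
    return (1 << bits) - 1


def range_allowed(start: int, length: int, bits: int) -> bool:
    """True if every byte of [start, start + length) is addressable under the mask."""
    last = start + length - 1
    return last <= dma_mask(bits)


ALL_ONES = 0xFFFFFFFFFFFFFFFF
start = 0xFFFFFFFFFFFFF000  # assumed start of the problematic SGE
length = 0x1000

print(start + length - 1 == ALL_ONES)    # does the buffer end at the all-ones address?
print(range_allowed(start, length, 64))  # is it reachable with a 64-bit mask?
print(range_allowed(start, length, 63))  # is it reachable with a 63-bit mask?
```

With a 63-bit mask the highest usable address is 0x7FFFFFFFFFFFFFFF, so a
buffer ending at the all-ones address can no longer be handed to the HBA.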

[Kernel-packages] [Bug 1841148] Re: Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver

2019-08-26 Thread Mauricio Faria de Oliveira
Hi Martin,

There's a potential fix for this upstream, in v5.3-rc1 mainline build (thus not 
in v5.2),
which is also applied to bionic-proposed (4.15.0-60.67).

Could you please test whether the issue is resolved with bionic-proposed
[1] ?

If that doesn't help, further regression/bisect test steps will be
needed.

Thank you!

[1] https://wiki.ubuntu.com/Testing/EnableProposed

The mentioned potential fix is:

commit d17ba0f616a08f597d9348c372d89b8c0405ccf3
Author: Konstantin Khlebnikov
Date:   Wed Apr 17 11:13:20 2019 +0300

    e1000e: start network tx queue only when link is up

    Driver does not want to keep packets in Tx queue when link is lost.
    But present code only reset NIC to flush them, but does not prevent
    queuing new packets. Moreover reset sequence itself could generate
    new packets via netconsole and NIC falls into endless reset loop.

    This patch wakes Tx queue only when NIC is ready to send packets.

    This is proper fix for problem addressed by commit 0f9e980bf5ee
    ("e1000e: fix cyclic resets at link up with active tx").

    Signed-off-by: Konstantin Khlebnikov
    Suggested-by: Alexander Duyck
    Tested-by: Joseph Yasi
    Tested-by: Aaron Brown
    Tested-by: Oleksandr Natalenko
    Signed-off-by: Jeff Kirsher

git/linux $ git describe --contains d17ba0f616a08f597d9348c372d89b8c0405ccf3
v5.3-rc1~140^2~410^2~2

git/ubuntu-bionic $ git log --oneline origin/master-next -- 
drivers/net/ethernet/intel/e1000e/
02f5b7ea8c79 e1000e: start network tx queue only when link is up
...

git/ubuntu-bionic $ git describe --contains 02f5b7ea8c79
Ubuntu-4.15.0-59.66~576

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841148

Title:
  Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and
  82579V using e1000e driver

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Since linux-image-4.15.0-58-generic my ethernet connection fails to
  get a connection.

  The network connection constantly goes up and down. The issue has been
  reported by another user:

  https://bugzilla.kernel.org/show_bug.cgi?id=204591

  Snippet from kern.log showing that the connection constantly goes up
  and down:

  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134651] e1000e: enp0s31f6 
NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134830] /dev/vmnet: open 
called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134836] /dev/vmnet: hub 0 
does not exist, allocating memory.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134858] /dev/vmnet: port 
on hub 0 successfully opened
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134868] bridge-enp0s31f6: 
up
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134872] bridge-enp0s31f6: 
attached
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334794] userif-2: sent 
link down event.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334801] userif-2: sent 
link up event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.156471] bridge-enp0s31f6: 
disabling the bridge on dev down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158580] bridge-enp0s31f6: 
down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158599] bridge-enp0s31f6: 
detached
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356790] userif-2: sent 
link down event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356795] userif-2: sent 
link up event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295365] e1000e: enp0s31f6 
NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295729] /dev/vmnet: open 
called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295741] /dev/vmnet: hub 0 
does not exist, allocating memory.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295785] /dev/vmnet: port 
on hub 0 successfully opened
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295804] bridge-enp0s31f6: 
up
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295810] bridge-enp0s31f6: 
attached
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495615] userif-2: sent 
link down event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495620] userif-2: sent 
link up event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316505] bridge-enp0s31f6: 
disabling the bridge on dev down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316593] bridge-enp0s31f6: 
down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316607] bridge-enp0s31f6: 
detached
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516761] userif-2: sent 
link down event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516767] userif-2: sent 
link up event.
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441

[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io

2019-08-27 Thread Mauricio Faria de Oliveira
Hi Drew,

That's very good news!  So it looks like that patch resolves the
problem.

Could you please test the kernel in bionic-proposed [1] (4.15.0-60-generic)
which has that patch to confirm it's also working correctly?

Thanks!
Mauricio

[1] https://wiki.ubuntu.com/Testing/EnableProposed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841132

Title:
  mpt3sas - storage controller resets under heavy disk io

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [summary]
  when a server running ubuntu 18.04 with an lsi sas controller experiences 
high disk io there is a chance the storage controller will reset
  this can take weeks or months, but once the controller resets it will keep 
resetting every few seconds or few minutes, dramatically degrading disk io
  the server must be rebooted to restore the controller to a normal state

  [hardware configuration]
  server: dell poweredge r7415, purchased 2019-02
  cpu/chipset: amd epyc naples
  storage controller: "dell hba330 mini" with chipset "lsi sas3008"
  drives: 4x samsung 860 pro 2TB ssd

  [software configuration]
  ubuntu 18.04 server
  mdadm raid6
  all firmware is fully updated (bios 1.9.3) (hba330 16.17.00.03) (ssd rvm01b6q)

  [what happened]
  server was operating as a vm host for months without issue
  one day the syslog was flooded with messages like "mpt3sas_cm0: sending diag 
reset !!" and "Power-on or device reset occurred", along with unusably-slow 
disk io
  the server was removed from production and I looked for a way to reproduce 
the issue

  [how to reproduce the issue]
  there are probably many ways to reproduce this issue; the hackish way I found 
to reliably reproduce it was:
  have the four ssds in a mdadm raid6 with ext4 filesystem
  create three 500GB files containing random data
  open three terminals. one calculates md5sum of file1 in a loop, another does 
the same for file2, the third does a copy of file3 to file3-temp in a loop
  the number of files is arbitrary, the goal is just to generate a lot of disk 
io on files too large to be cached in memory
  then initiate an array check with "/usr/share/mdadm/checkarray -a" to cause 
even more drive thrashing
  within 1-15min the controller will enter the broken state. the longest I ever 
saw it take was 30min. I reproduced this several times
  rebooting the server restores the controller to a normal state
  if the server is not rebooted and the controller is left in this broken state 
eventually drives will fall out of the array, and sometimes array/filesystem 
corruption will occur
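
The reproduction steps above could be written down roughly as follows. This is only a sketch: the function name, the default mount point /mnt/array, the file names and the dd parameters are assumptions for illustration and do not come from the report. The function only prints the commands, so they can be reviewed and then run by hand in separate terminals as described.

```shell
# Prints (does not run) the commands for the stress test described above.
# ASSUMPTIONS: stress_plan is a hypothetical helper; the raid6 array is
# mounted at the directory given as $1 (default /mnt/array is a placeholder).
stress_plan() {
    dir="${1:-/mnt/array}"
    cat <<EOF
# one-time setup: three files of random data, 500 GiB each
for i in 1 2 3; do dd if=/dev/urandom of=$dir/file\$i bs=1M count=512000; done
# terminal 1
while :; do md5sum $dir/file1; done
# terminal 2
while :; do md5sum $dir/file2; done
# terminal 3
while :; do cp $dir/file3 $dir/file3-temp; done
# once the loops are running: start an array check
/usr/share/mdadm/checkarray -a
EOF
}
```

Calling `stress_plan /srv/array` prints the plan for an array mounted at that (hypothetical) path.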

  [why this is being reported here]
  It's unlikely I am exceeding limits of the hardware since this server chassis 
can hold 24 drives and I am only using 4. The controller specs indicate I 
should not hit pcie bandwidth limits until at least 16 drives.
  My first thought was that the lsi controller firmware was at fault since they 
have been historically buggy, however I reproduced this with the newest 
firmware "16.17.00.03" and the previous version "15.17.09.06" (versions may be 
dell-specific).
  I then tried the most recent motherboard bios "1.9.3", and downgraded to 
"1.9.2", no change.
  I then wanted to eliminate the possibility of a bad drive. swapped out all 4 
drives with different ones of the same model, no change.
  I then upgraded from the standard 18.04 kernel to the newer backported hwe 
kernel, which also came with a newer mpt3sas driver, no change.
  I then ran the same test on the same array but with rhel 8, to my surprise I 
could no longer reproduce the issue.
  -
  tl;dr version:
  ubuntu 18.04 (kernel 4.15.0) (mpt3sas driver 17.100.00.00) storage controller 
breaks in 1-10min
  ubuntu 18.04 hwe (kernel 5.0.0) (mpt3sas driver 27.101.00.00) storage 
controller breaks in 1-15min, max 30min
  rhel 8 (kernel 4.18.0) (mpt3sas driver 27.101.00.00) same stress test on same 
array for 19h, no errors

  [caveats]
  Server os misconfiguration is possible, however this is a rather basic vm 
host running kvm and no 3rd-party packages.
  I can't conclusively prove this isn't a hardware fault since I don't have a 
second unused identical server to test on right now, however the fact that the 
problem can be easily reproduced under ubuntu but not under rhel seems 
noteworthy.
  There is another bug (LP: #1810781) similar to this, I didn't post there 
because it's already marked as fixed.
  There is also a debian bug (Debian #926202) that encountered this on kernel 
4.19.0, but I'm unable to tell if it's the same issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1841132/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io

2019-08-28 Thread Mauricio Faria de Oliveira
Drew,

Thanks for testing bionic-proposed!
So it will be resolved for bionic kernels shortly, when it hits bionic-updates.

Disco/19.04 will get this patch via stable updates in the near future
[1].

Eoan has it applied (LP: #1839588).

So this is all good.

Thanks again,
Mauricio

[1] https://lists.ubuntu.com/archives/kernel-team/2019-August/103416.html

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: Incomplete

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  New
Status in linux source package in Disco:
  New
Status in linux source package in Eoan:
  Incomplete


[Kernel-packages] [Bug 1841132] Re: mpt3sas - storage controller resets under heavy disk io

2019-08-28 Thread Mauricio Faria de Oliveira
** Changed in: linux (Ubuntu Eoan)
   Status: Incomplete => Fix Released

** Changed in: linux (Ubuntu Disco)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: New => Fix Committed

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Released


[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
** Description changed:

  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.
  
- Patches will be sent soon for linux and zfs/spl-linux,
- covering X/B/D/E/Unstable.
+ Patches are required in:
+ 1) linux kernel packaging, to add infrastructure to
+enable/build/strip/package debug symbols on DKMS.
+(this is sufficient on Eoan's zfs-linux.)
+ 2) zfs-linux and spl-linux, for the stable releases,
+which need a few patches to enable debug symbols.
+ 
+ Initially submitting the kernel patchset for Unstable,
+ for review/feedback.  It backports nicely into B/D/E,
+ should it be accepted; for X (doesn't use DKMS builds)
+ a simpler patch for the moment (until it does) works.
+ 
+ The zfs/spl-linux patches are ready, to be submitted
+ once the approach used by the kernel package settles.

** Description changed:

  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.
  
  Patches are required in:
+ 
  1) linux kernel packaging, to add infrastructure to
-enable/build/strip/package debug symbols on DKMS.
-(this is sufficient on Eoan's zfs-linux.)
+    enable/build/strip/package debug symbols on DKMS.
+    (this is sufficient with zfs-linux now in Eoan.)
+ 
  2) zfs-linux and spl-linux, for the stable releases,
-which need a few patches to enable debug symbols.
+    which need a few patches to enable debug symbols.
  
  Initially submitting the kernel patchset for Unstable,
  for review/feedback.  It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.
  
  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu:
  In Progress

Bug description:
  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
     which need a few patches to enable debug symbols.

  Initially submitting the kernel patchset for Unstable,
  for review/feedback.  It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions



[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
Test Build 1) Old behavior

goal: show limitations/issues.

- original packaging
- zfs not built with debug symbols
- zfs modules not present in debug package
- extra modules lack .gnu_debuglink section

Original packaging:

There are no ZFS modules in the debug package:

$ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-orig

$ ls ddeb-orig/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs
...: No such file or directory

Accordingly, the ZFS modules are the only modules without a
'.gnu_debuglink' section in the 'linux-modules' package:

$ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules

$ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'

By the way, this is also the case for *all* modules in the
'linux-modules-extra' package
(only modules in the 'linux-modules' package have '.gnu_debuglink' sections):

$ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extras

$ find deb-modules-extras/ -name '*.ko' | wc -l
4508

$ find deb-modules-extras/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
4508
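
For reuse across the extracted package directories, the check loop above could be wrapped in a small function. This is a sketch only: the function name is made up for illustration, and it quotes the path and sorts the list, which the one-liners above do not. Like the loop above, it relies on objdump (binutils) exiting non-zero when the requested section is absent.

```shell
# List kernel modules under a directory that lack a '.gnu_debuglink' section.
# ASSUMPTION: modules_without_debuglink is a hypothetical helper name.
modules_without_debuglink() {
    find "$1" -name '*.ko' | sort | while IFS= read -r ko; do
        # objdump fails if the section named with -j is not in the file.
        objdump -h -j .gnu_debuglink "$ko" >/dev/null 2>&1 ||
            echo "Module without debug link '$ko'"
    done
}
```

For example, `modules_without_debuglink deb-modules-extras/ | wc -l` would give the same count as the last command above.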

Bug description:
  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
     which need a few patches to enable debug symbols
     (add option './configure --enable-debuginfo' and
     '(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)

  Initially submitting the kernel patchset for Unstable,
  for review/feedback.  It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.



[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
Test Build 4) All debug symbols disabled

goal: show no zfs debug symbol activity happens either (along w/ other
debug symbol stuff)

- test packaging
- nothing built with debug symbols
- no debug package present
- no .gnu_debuglink section at all
- (no regressions)


Test packaging, with debug symbols disabled entirely (skipdbg=true).

The dkms-build script doesn't do any debug symbol work at all.

II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
signing zavl.ko
signing znvpair.ko
signing zunicode.ko
signing zcommon.ko
signing zfs.ko
signing icp.ko
signing zlua.ko
signing spl.ko
II: dkms-build build zfs complete

No debug sections are present in ZFS modules (as expected):

$ objdump -h deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko | grep debug
$

And the check for modules without debug symbols is not exercised (as
expected):

$ grep WARNING build.log
$

$ find deb-modules/ -name '*.ko' | wc -l
1000

$ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
1000

$ find deb-modules-extra/ -name '*.ko' | wc -l
4508

$ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done | wc -l
4508


** Description changed:

  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.
  
  Patches are required in:
  
  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)
  
  2) zfs-linux and spl-linux, for the stable releases,
-    which need a few patches to enable debug symbols.
+    which need a few patches to enable debug symbols
+(add option './configure --enable-debuginfo' and
+'(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)
  
  Initially submitting the kernel patchset for Unstable,
  for review/feedback.  It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.
  
  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.


[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
Test Build 2) New behavior if ZFS modules are *not* built with debug
symbols

goal: show failsafe/backwards-compatible behavior if zfs-dkms doesn't
support/build debug symbols: the kernel build log reports the missing
debug symbols, and extra modules have .gnu_debuglink.

- test packaging
- zfs not built with debug symbols (disabled manually in dkms-build if-check)
- zfs modules not present in debug package
- extra modules have .gnu_debuglink section
- (no regressions)

Test packaging, with debug symbols *not enabled* in zfs-dkms:

The debug symbols are not found (as expected), 
and this case is handled without problems:

II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-image-unsigned-5.3.0-8-generic-dbgsym/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs (debug symbols)
ignoring zavl.ko (missing debug symbols)
stripping zavl.ko
ignoring znvpair.ko (missing debug symbols)
stripping znvpair.ko
ignoring zunicode.ko (missing debug symbols)
stripping zunicode.ko
ignoring zcommon.ko (missing debug symbols)
stripping zcommon.ko
ignoring zfs.ko (missing debug symbols)
stripping zfs.ko
ignoring icp.ko (missing debug symbols)
stripping icp.ko
ignoring zlua.ko (missing debug symbols)
stripping zlua.ko
ignoring spl.ko (missing debug symbols)
stripping spl.ko
II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
signing zavl.ko
signing znvpair.ko
signing zunicode.ko
signing zcommon.ko
signing zfs.ko
signing icp.ko
signing zlua.ko
signing spl.ko
II: dkms-build build zfs complete

The debug package contains the ZFS directory, but it's empty:

$ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-test-disabled
$ ls ddeb-test-disabled/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs/
$

The kernel build log documents which modules do not have debug symbols,
now covering modules built with DKMS (zfs and vbox):

$ grep WARNING build.log
echo "WARNING: Missing debug symbols for module '$module'."; \
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.

The ZFS modules have no '.gnu_debuglink' section or any other debug
section (as expected):

$ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules

$ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/icp.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/spl.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zavl.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zcommon.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zfs.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zlua.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/znvpair.ko'
Module without debug link 'deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/zunicode.ko'

$ for ko in deb-modules/lib/modules/5.3.0-8-generic/kernel/zfs/*.ko; do objdump -h $ko | grep debug; done
$

But all modules in 'linux-modules-extra' now have '.gnu_debuglink' sections
(except virtualbox modules which are DKMS-built without debug symbols too.)

$ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extra

$ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h 
-j .

[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
Test Build 3) New behavior if ZFS modules are built with debug symbols

goal: show zfs debug symbols are correctly built and packaged into
non-debug & debug packages.

- test packaging
- zfs built with debug symbols
- zfs modules present in debug package
- extra modules *have* .gnu_debuglink section

Test packaging, debug symbols *enabled* in zfs-dkms:

Modules are built with debug symbols, copied to the debug package directory,
and stripped before being copied into the stripped/non-debug package directory.

II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-image-unsigned-5.3.0-8-generic-dbgsym/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs (debug symbols)
copying zavl.ko
stripping zavl.ko
copying znvpair.ko
stripping znvpair.ko
copying zunicode.ko
stripping zunicode.ko
copying zcommon.ko
stripping zcommon.ko
copying zfs.ko
stripping zfs.ko
copying icp.ko
stripping icp.ko
copying zlua.ko
stripping zlua.ko
copying spl.ko
stripping spl.ko
II: dkms-build installing zfs into /home/ubuntu/dbgsym/unstable/debian/linux-modules-5.3.0-8-generic/lib/modules/5.3.0-8-generic/kernel/zfs
signing zavl.ko
signing znvpair.ko
signing zunicode.ko
signing zcommon.ko
signing zfs.ko
signing icp.ko
signing zlua.ko
signing spl.ko
II: dkms-build build zfs complete
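
For readers unfamiliar with the copy/strip flow in the log above, one way to get the same effect with plain binutils is sketched below. This is not the dkms-build code from the patchset: the function name, its arguments, and the use of '.debug_info' to detect debug symbols are assumptions for illustration. Only the binutils options (strip --strip-debug, objcopy --add-gnu-debuglink) are standard.

```shell
# Hypothetical sketch, NOT the actual dkms-build script.
# $1: path to a built module (.ko); $2: debug package directory.
split_debug() {
    ko="$1"; dbgdir="$2"
    name="$(basename "$ko")"
    has_debug=0
    # ASSUMPTION: a '.debug_info' section means the module has debug symbols.
    if objdump -h "$ko" 2>/dev/null | grep -q '\.debug_info'; then
        has_debug=1
        echo "copying $name"
        mkdir -p "$dbgdir" && cp "$ko" "$dbgdir/$name" || return 1
    else
        echo "ignoring $name (missing debug symbols)"
    fi
    echo "stripping $name"
    strip --strip-debug "$ko" || return 1
    # Record where the unstripped copy lives, if one was made.
    if [ "$has_debug" = 1 ]; then
        objcopy --add-gnu-debuglink="$dbgdir/$name" "$ko"
    fi
}
```

The stripped module then carries a '.gnu_debuglink' section that points at the unstripped copy, which is what the objdump checks in these test builds look for.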

The ZFS modules are now present in the debug package:

$ dpkg-deb -x linux-image-unsigned-5.3.0-8-generic-dbgsym_5.3.0-8.9_amd64.ddeb ddeb-test-enabled

$ ls -1 ddeb-test-enabled/usr/lib/debug/lib/modules/5.3.0-8-generic/kernel/zfs/
icp.ko
spl.ko
zavl.ko
zcommon.ko
zfs.ko
zlua.ko
znvpair.ko
zunicode.ko

And now all modules in 'linux-modules' have the '.gnu_debuglink'
section:

$ dpkg-deb -x linux-modules-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules
$ find deb-modules/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
$


The build log no longer shows ZFS modules as missing debug symbols:

$ grep WARNING build.log
echo "WARNING: Missing debug symbols for module '$module'."; \
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.
$ 

The only modules in 'linux-modules-extra' without that section continue
to be the virtualbox modules:

$ dpkg-deb -x linux-modules-extra-5.3.0-8-generic_5.3.0-8.9_amd64.deb deb-modules-extra
$ find deb-modules-extra/ -name '*.ko' | while read ko; do objdump -h -j .gnu_debuglink $ko >/dev/null 2>&1 || echo "Module without debug link '$ko'"; done
Module without debug link 'deb-modules-extra/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'
Module without debug link 'deb-modules-extra/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'
$

This matches the kernel build log:

$ grep WARNING build.log 
echo "WARNING: Missing debug symbols for module '$module'."; \
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxguest.ko'.
WARNING: Missing debug symbols for module '/lib/modules/5.3.0-8-generic/kernel/virtualbox-guest/vboxsf.ko'.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu:
  In Progress

Bug description:
  The ZFS kernel modules aren't built with debug symbols,
  which introduces problems/issues for debugging/support.

  Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS.
     (this is sufficient with zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases,
     which need a few patches to enable debug symbols
     (add option './configure --enable-debuginfo' and
     '(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)

  Initially submitting the kernel patchset for Unstable,
  for review/feedback.  It backports nicely into B/D/E,
  should it be accepted; for X (doesn't use DKMS builds)
  a simpler patch for the moment (until it does) works.

  The zfs/spl-linux patches are ready, to be submitted
  once the approach used by the kernel package settles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to  

[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-28 Thread Mauricio Faria de Oliveira
[Unstable][PATCH 0/6] Add support for ZFS debug symbols
https://lists.ubuntu.com/archives/kernel-team/2019-August/103425.html

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1841148] Re: Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and 82579V using e1000e driver

2019-08-28 Thread Mauricio Faria de Oliveira
Hi Martijn,

Thanks for testing bionic-proposed!
So it will be resolved for bionic kernels shortly, when it hits bionic-updates.

Disco/19.04 will get this patch via stable updates in the near future
[1].

Eoan has it applied (LP: #1837725).

So this is all good.

Thanks again,
Mauricio

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: Confirmed

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Eoan)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Disco)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1841148

Title:
  Kernel 4.15.0-58 breaks Intel Ethernet Connection for I219-V and
  82579V using e1000e driver

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Released

Bug description:
  Since linux-image-4.15.0-58-generic, my Ethernet interface fails to
  establish a connection.

  The network connection constantly goes up and down. The issue has been
  reported by another user:

  https://bugzilla.kernel.org/show_bug.cgi?id=204591

  Snippet from kern.log showing that the connection constantly goes up
  and down:

  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134651] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134830] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134836] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134858] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134868] bridge-enp0s31f6: up
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.134872] bridge-enp0s31f6: attached
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334794] userif-2: sent link down event.
  Aug 20 10:06:00 martijn-ThinkPad-P50 kernel: [ 2427.334801] userif-2: sent link up event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.156471] bridge-enp0s31f6: disabling the bridge on dev down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158580] bridge-enp0s31f6: down
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.158599] bridge-enp0s31f6: detached
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356790] userif-2: sent link down event.
  Aug 20 10:06:01 martijn-ThinkPad-P50 kernel: [ 2428.356795] userif-2: sent link up event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295365] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295729] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295741] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295785] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295804] bridge-enp0s31f6: up
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.295810] bridge-enp0s31f6: attached
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495615] userif-2: sent link down event.
  Aug 20 10:06:08 martijn-ThinkPad-P50 kernel: [ 2435.495620] userif-2: sent link up event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316505] bridge-enp0s31f6: disabling the bridge on dev down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316593] bridge-enp0s31f6: down
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.316607] bridge-enp0s31f6: detached
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516761] userif-2: sent link down event.
  Aug 20 10:06:09 martijn-ThinkPad-P50 kernel: [ 2436.516767] userif-2: sent link up event.
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.438729] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440433] /dev/vmnet: open called by PID 5847 (vmnet-bridge)
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440439] /dev/vmnet: hub 0 does not exist, allocating memory.
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440466] /dev/vmnet: port on hub 0 successfully opened
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440475] bridge-enp0s31f6: up
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.440479] bridge-enp0s31f6: attached
  Aug 20 10:06:14 martijn-ThinkPad-P50 kernel: [ 2441.638884] userif-2: sent link down event.
  Aug 20 10:06:

[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols

2019-08-29 Thread Mauricio Faria de Oliveira
Attaching the debdiffs for zfs-linux/spl-linux on X/B/D/E,
for documentation purposes; will send testing/notes later.

Independently of the kernel packaging approach determined
to enable debug symbols on ZFS/SPL modules, these kinds of
patches for the userspace packages are required anyway,
and they correctly perform that task when building with DKMS.

So I'll probably move forward with their SRU request soon,
for the benefit of having this available sooner if required
(i.e. so users/engineers in need of debug symbols may just
rebuild with DKMS using this, and be able to investigate.)

** Attachment added: "lp1840704_debdiffs.tar"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+attachment/5285635/+files/lp1840704_debdiffs.tar



[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

2019-09-02 Thread Mauricio Faria de Oliveira
Marking status on B/X/D as Incomplete.

(email below sent to kernel-team mailing list
as replies to both patch series above).

Please hold / don't apply this patch for now.

The reporter hit an apparently unrelated Oops in 3 of 40 nodes,
and it hasn't been possible yet to determine whether this patch
is at all related or at fault, due to timing/deployment matters
preventing a methodical approach to revert to an original kernel.

Since the patch is recent even in the mainline kernel, holding
it back for a bit seemed to be the most prudent action for LTSes,
and thus also dropping the patch on Disco, where it would be
required too.

We'll be following up on this as possible on the reporter's end.

** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Incomplete

** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Incomplete

** Changed in: linux (Ubuntu Disco)
   Status: In Progress => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840789

Title:
  bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Incomplete
Status in linux source package in Bionic:
  Incomplete
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * The bnx2x driver may cause hardware faults (leading to
     panic/reboot) and other behaviors such as transmit timeouts,
     after commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") is
     introduced.

   * This issue has been observed by a user shortly
     after starting docker & kubelet, with adapters:
     - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
     - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]

   * If options to ignore hardware faults are used
     (erst_disable=1 hest_disable=1 ghes.disable=1)
     the system doesn't panic/reboot and continues
     on to timeout on adapter stats, then transmit
     timeouts, spewing some adapter firmware dumps,
     but the network interface is non-functional.

   * The issue only happened when LLDP is enabled
     on the network switches, and crashdump shows
     the bnx2x driver is stuck/waits for firmware
     to complete the stop traffic command in LLDP
     handling. Workaround used is to disable LLDP
     in the network switches/ports.

   * Analysis of the driver and firmware dumps
     didn't help significantly towards finding
     the root cause.

   * Upstream/mainline recently just reverted the
     patch, due to similar problem reports, while
     looking for the root cause/proper fix.

  [Test Case]

   * No reproducible test case found outside
     the user's systems/cluster, where it is
     enough to start docker & kubelet & wait.

   * The user verified test kernels for Xenial
     and Bionic - the problem does not happen;
     build-tested on Disco.

  [Regression Potential]

   * Users who significantly use/apply the non-default
     traffic class (tc) / class of service (cos) might
     possibly see performance changes (if any at all)
     in such applications, however that's unclear now.

   * This is a recent revert upstream (v5.3-rc'ish),
     so there's a chance things might change in this area.

   * Nonetheless, the patch is authored by the driver
     vendor, and made its way into stable kernels
     (e.g., v5.2.8 which made Eoan/19.10 recently).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-14 Thread Mauricio Faria de Oliveira
** Description changed:

  [Impact]
  
-  * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
-(settings for network interface, initiator, and target) in the installer
-because the 'iscsi_ibft' module is not present in udeb packages.
+  * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
+    (settings for network interface, initiator, and target) in the installer
+    because the 'iscsi_ibft' module is not present in udeb packages.
  
-  * Even if it was, the installer does not handle iBFT information at all,
-thus any settings are ignored, and iSCSI-related configuration has to
-be done manually or with workarounds.
+  * Even if it was, the installer does not handle iBFT information at all,
+    thus any settings are ignored, and iSCSI-related configuration has to
+    be done manually or with workarounds.
  
-  * This impacts user-experience and automatic installation on systems and
-deployments which actually do provide the iBFT feature and information,
-but cannot use it practically.
+  * This impacts user-experience and automatic installation on systems and
+    deployments which actually do provide the iBFT feature and information,
+    but cannot use it practically.
  
-  * With proper iBFT support in the installer (kernel module in udeb package
-and automatic iSCSI-related configuration) users will be able to rely on
-iBFT to install/deploy Ubuntu on their servers and datacenters.
+  * With proper iBFT support in the installer (kernel module in udeb package
+    and automatic iSCSI-related configuration) users will be able to rely on
+    iBFT to install/deploy Ubuntu on their servers and datacenters.
  
-  * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
-and configure network/iSCSI according to iBFT information in disk-detect.
+  * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
+    and configure network/iSCSI according to iBFT information in disk-detect.
  
-This is done in disk-detect so that the iSCSI LUNs are detected as disks
-(useful in case of no other disks in the system so the installer doesn't
-complain nor wait too long) and that any partman-related preseed options
-are not required and may be still available for the user.
+    This is done in disk-detect so that the iSCSI LUNs are detected as disks
+    (useful in case of no other disks in the system so the installer doesn't
+    complain nor wait too long) and that any partman-related preseed options
+    are not required and may be still available for the user.
  
  [Test Case]
  
-  * linux package / kernel module in udeb:
+  * linux package / kernel module in udeb:
  
-$ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko
+    $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko
  
-Check the module loads in the installer environment.
-See comment with example for disco.
+    Check the module loads in the installer environment.
+    See comment with example for disco.
  
-  * d-i/hw-detect package:
-(to be done)
+  * d-i/hw-detect/partman-iscsi package:
+    (to be done)
  
  [Regression Potential]
  
-  * linux package: low, the kernel module is not loaded by default,
-and only checks whether iBFT information is present in firmware,
-then exposes that in sysfs in read-only mode.
+  * linux package: low, the kernel module is not loaded by default,
+    and only checks whether iBFT information is present in firmware,
+    then exposes that in sysfs in read-only mode.
  
-  * d-i/hw-detect:
-(to be done)
+  * d-i/hw-detect/partman-iscsi:
+    (to be done)
  
  [Other Info]
-  
-  * This has been verified both by the developer with a simple iSCSI
-iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
-and by an user with system/firmware that supports iBFT for iSCSI.
+ 
+  * This has been verified both by the developer with a simple iSCSI
+    iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
+    and by an user with system/firmware that supports iBFT for iSCSI.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with wor

[Kernel-packages] [Bug 1829563] Re: [4.15] bcache device is accessible even if a backing device is not (writeback mode)

2019-05-17 Thread Mauricio Faria de Oliveira
** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  [4.15] bcache device is accessible even if a backing device is not
  (writeback mode)

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829563/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "bionic_d-i.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265731/+files/bionic_d-i.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Confirmed
Status in hw-detect package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Confirmed
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Confirmed
Status in hw-detect source package in Eoan:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Confirmed

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
            based on kernel released to -updates plus one week
            monitoring bug reports -- it should be OK.
            Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
            on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
                  see comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
                      been tested in detail, and falls back to
                      previous behavior if it fails.
                      see comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
Test Procedure with KVM guests + iPXE
=====================================

- 2 guests: iSCSI target/server and iSCSI initiator/client.
- 1 bridge for iSCSI traffic (virbr-iscsi, new), static ip.
- 1 bridge for internet access (virbr0, exists), dhcp ip. 


Host:
-----

Configure the iSCSI bridge and QEMU access in the host:

$ sudo ip link add dev virbr-iscsi type bridge
$ sudo ip link set dev virbr-iscsi up
$ echo 'allow virbr-iscsi' | sudo tee -a /etc/qemu/bridge.conf
$ sudo chmod +s /usr/lib/qemu/qemu-bridge-helper
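
Note that 'tee -a' appends the 'allow' line every time it is run. A minimal sketch of an idempotent variant follows, assuming the file uses one 'allow <bridge>' entry per line; the function name is made up for illustration, and it needs write access to the file (e.g. run as root).

```shell
# Append "allow BRIDGE" to a QEMU bridge ACL file only if it is not there yet.
# Usage: allow_bridge CONF BRIDGE   (hypothetical helper)
allow_bridge() {
    conf=$1
    bridge=$2
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "allow $bridge" "$conf" 2>/dev/null; then
        printf 'allow %s\n' "$bridge" >> "$conf"
    fi
}

# Example (not run here):
#   allow_bridge /etc/qemu/bridge.conf virbr-iscsi
```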

iSCSI target:
-------------

This guest serves an iSCSI target with one LUN, on the
iSCSI NIC with IP 10.0.0.1, to the initiator at IP 10.0.0.2.

Install/boot this guest:

$ qemu-img create -f qcow2 guest-iscsi-target.qcow2 16g

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:2 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:11 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:22 \
  -drive file=guest-iscsi-target.qcow2,if=virtio \
  -drive file=$RELEASE-server-amd64.iso,media=cdrom,read-only,if=scsi \
  -boot once=d

Configure iSCSI NIC:

$ cat <
link/ether 52:54:00:00:00:22 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/24 brd 10.0.0.255 scope global ens4
...

Configure iSCSI target/lun:

# apt-get install -y tgt

# mkdir /var/lib/iscsi
# dd if=/dev/zero of=/var/lib/iscsi/disk bs=1 count=0 seek=8G

# tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2019-03.com.example:target1
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /var/lib/iscsi/disk

# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 10.0.0.2
# tgt-admin --dump >/etc/tgt/conf.d/target1.conf


iSCSI initiator:
----------------

This guest first boots iPXE to configure iBFT,
and then boots/chainloads to debian-installer.

The netboot initrd does not contain all patched udebs,
so download and install disk-detect and partman-iscsi
from the PPA during the install.

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/dists/$RELEASE/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}

$ python3 -m http.server &
Serving HTTP on 0.0.0.0 port 8000 ...

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn

Connect to VNC for iPXE shell:

$ vncviewer :1
iPXE <...>


Press Ctrl-B for iPXE command line.
^B

iPXE>

Configure iSCSI NIC:

iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0

Configure iBFT: (iSCSI portal 10.0.0.1, LUN 1 on target
iqn.<...>:target1)

iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-03.com.example:target1
Registered SAN device 0x80
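
For reference, the SAN URI given to 'sanhook' has the form iscsi:<server>:<protocol>:<port>:<LUN>:<target>, where empty protocol and port fields mean the defaults (TCP, port 3260). As a sketch, a small helper can assemble it from its parts; the function name is hypothetical.

```shell
# Build an iSCSI SAN URI: iscsi:<server>:<protocol>:<port>:<LUN>:<target>
# Usage: iscsi_san_uri SERVER LUN TARGET [PORT] [PROTOCOL]
# PORT and PROTOCOL may be omitted, which leaves those fields empty.
iscsi_san_uri() {
    server=$1
    lun=$2
    target=$3
    port=${4:-}
    protocol=${5:-}
    printf 'iscsi:%s:%s:%s:%s:%s\n' "$server" "$protocol" "$port" "$lun" "$target"
}

# Example (not run here):
#   iscsi_san_uri 10.0.0.1 1 iqn.2019-03.com.example:target1
```

With the values used in this setup, that yields the same URI as typed at the iPXE prompt above.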

Boot the installer (add option 'disk-detect/ibft/enable=true' for the
installer to detect iBFT iSCSI disks, and option
'partman-iscsi/iscsi_auto=true' to set the system to boot from iBFT):

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz disk-detect/ibft/enable=true partman-iscsi/iscsi_auto=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot 

Back to serial console.
Proceed with the installer.

In 'Users and passwords' dialog, select 'Go back', and 'Execute a
shell', and 'Continue'.

Check the kernel version and that the iscsi_ibft.ko module is present.

~ # uname -rv
5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:58:11 UTC 2019

~ # depmod -a
~ # modinfo --filename iscsi_ibft
/lib/modules/5.0.0-8-generic/kernel/drivers/firmware/iscsi_ibft.ko

~ # wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/pool/main/h/hw-detect/disk-detect_1.117ubuntu7.$VERSION_amd64.udeb
~ # wget http://ppa.launchpad.net/mfo/lp1817321v3/ubuntu/pool/main/p/partman-iscsi/partman-iscsi_40ubuntu4.$VERSION_all.udeb

~ # udpkg --unpack *.udeb

~ # debconf-get disk-detect/ibft/enable 
true
~ # debconf-get partman-iscsi/iscsi_auto
true

(Use this if you need it; e.g., forgot kernel cmdline options)
~ # debconf-set disk-detect/ibft/enable true


Start another installer menu with the new debconf templates/question:

~ # debconf -o d-i /usr/bin/main-menu

Proceed with the installer.

In the 'Partition disks' dialog, the iSCSI LUN should be present:

S

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
Test procedure for the partman-iscsi changes
============================================

Based on the previous comment setup.


iSCSI initiator:
----------------

...

Note there's no 'iscsi_auto' in the kernel cmdline:


iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/vmlinuz initrd=initrd.gz --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot 

Back to serial console.
Proceed with the installer.

In 'Users and passwords' dialogs, select 'Go back', and 'Execute a
shell', and 'Continue'.



Bring up the iSCSI devices with iBFT
(manually or with the patched disk-detect udeb):

~ # modprobe iscsi_ibft

~ # iscsistart -N
Setting up software interface ens4

~ # iscsistart -b
iscsistart: Logging into iqn.2019-03.com.example:target1 10.0.0.1:3260,1
iscsistart: version 2.0-874
iscsistart: Connection1:0 to [target: iqn.2019-03.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now


~ # dmesg | grep -e iBFT -e sd
[0.007308] iBFT found at 0x9e520.
[   94.949058] iBFT detected.
[  105.158333] sd 2:0:0:1: Attached scsi generic sg2 type 0
[  105.158800] sd 2:0:0:1: Power-on or device reset occurred
[  105.161642] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 
GB/8.00 GiB)
[  105.161646] sd 2:0:0:1: [sda] 4096-byte physical blocks
[  105.161970] sd 2:0:0:1: [sda] Write Protect is off
[  105.161974] sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
[  105.162645] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  105.174899] sd 2:0:0:1: [sda] Attached SCSI disk
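
As a cross-check on the dmesg output above, the values exposed by the
iscsi_ibft module can be read back from sysfs. This is only a minimal
sketch, assuming the usual /sys/firmware/ibft layout (initiator, target0
and ethernet0 directories); the helper name 'ibft_show' and the choice of
attributes are for illustration, and which entries exist depends on the
firmware.

```shell
# ibft_show [ROOT]: print a few iBFT attributes as path=value lines.
# ROOT defaults to /sys/firmware/ibft (assumed layout); pass another
# directory to try this outside the installer.
ibft_show() {
    root="${1:-/sys/firmware/ibft}"
    for attr in initiator/initiator-name \
                target0/target-name target0/ip-addr target0/port \
                ethernet0/mac ethernet0/ip-addr; do
        # Skip attributes that this firmware does not provide.
        [ -r "$root/$attr" ] || continue
        printf '%s=%s\n' "$attr" "$(cat "$root/$attr")"
    done
}
```

Running 'ibft_show' with no argument in the installer shell should list
the target and interface settings that 'iscsistart -b' used.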


Note that the first interface (ens3) is the default interface in the
192.168.122.0/24 range, and the second (ens4) is the iSCSI interface in
the 10.0.0.0/24 range.

~ # ip addr list
...
2: ens3: ...
link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.162/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...
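
To confirm which interface actually carries traffic to the iSCSI target,
the routing decision can be inspected as well. A small sketch, assuming
'ip route get' is available in the installer shell and prints a
'dev <name>' field; the helper name 'route_dev' is hypothetical.

```shell
# route_dev: read `ip route get` output on stdin and print the name
# that follows "dev" on the first matching line (nothing if no match).
route_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}
```

For example, 'ip route get 10.0.0.1 | route_dev' would be expected to
print the iSCSI interface (ens4 in this setup).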


Return with 'exit' and proceed with the installer until the 'Partition disks' 
dialog.


Test #0) Original partman-iscsi


In 'Partition disks' dialogs, select 'Go back', and 'Change debconf
priority', and 'Continue', and 'low'.

Return to the 'Partition disks' dialog.
Select 'Guided partitioning', 
and 'Guided - use entire disk',
and 'SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK',
and 'All files in one partition (recommended for new users)',
and 'Finish partitioning and write changes to disk',
and 'No' in 'No partitions for use as swap space',
and 'Yes' in 'Write the changes to disks?',
and 'Continue' and 'Continue' for swap file questions.

This should partition and format the disk, and return to the menu, as
the debconf priority is 'low'.

Select 'Execute a shell', and 'Continue'.

See that the wrong network interface is used for the HWADDR field in
iscsi.initramfs (trailing :01 instead of :02).

~ # cat /target/etc/iscsi/iscsi.initramfs 
HWADDR="52:54:00:00:00:01"
ISCSI_TARGET_NAME="iqn.2019-03.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"
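
Rather than comparing MAC addresses by eye, the HWADDR value can be
mapped back to an interface name. A minimal sketch, assuming the file
keeps the HWADDR="..." form shown above and that /sys/class/net is
populated in the installer; the helper name 'hwaddr_iface' is
hypothetical.

```shell
# hwaddr_iface CONF [NETDIR]: print the interface whose MAC address
# equals the HWADDR value in CONF (exit status 1 if none matches).
# NETDIR defaults to /sys/class/net.
hwaddr_iface() {
    conf="$1"
    netdir="${2:-/sys/class/net}"
    # Extract the value from a line of the form HWADDR="..."
    hwaddr=$(sed -n 's/^HWADDR="\(.*\)"$/\1/p' "$conf")
    [ -n "$hwaddr" ] || return 1
    for dev in "$netdir"/*; do
        [ -r "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$hwaddr" ]; then
            echo "${dev##*/}"
            return 0
        fi
    done
    return 1
}
```

With the addresses listed earlier, 'hwaddr_iface
/target/etc/iscsi/iscsi.initramfs' should print ens3 for the file above,
i.e. not the iSCSI interface.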


Test #1) Patched partman-iscsi, changes for patch 1/2  

(use the iSCSI interface for HWADDR and /etc/network/interfaces)

Install the patched udeb:

~ # wget http://ppa.launchpad.net/mfo/sf211547v2/ubuntu/pool/main/p/partman-iscsi/partman-iscsi_40ubuntu4.18.04.1_all.udeb
~ # udpkg --unpack partman-iscsi_40ubuntu4.18.04.1_all.udeb


Verify the new option is not yet enabled.

~ # debconf-get partman-iscsi/iscsi_auto
false

Unmount the swap so 'Partition disks' can work again.

~ # swapoff /target/swapfile

Return with 'exit' and repeat the 'Partition disks'/'Execute a shell'
procedure from Test #0.

See that the iSCSI network interface is now used in HWADDR:

~ # cat /target/etc/iscsi/iscsi.initramfs 
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-03.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"


Test #2) Patched partman-iscsi, changes for patch 2/2  

(use ISCSI_AUTO=true in /etc/iscsi/iscsi.initramfs)

Install the patched udeb again so that the 'partman-iscsi/iscsi_auto'
option re-appears (debconf-get prints nothing before that), then enable
the option and start a new debconf/menu to detect its value:

~ # debconf-get partman-iscsi/iscsi_auto
~ #

~ # udpkg --unpack partman-iscsi_40ubuntu4.18.04.1_all.udeb

~ # debconf-get partman-iscsi/iscsi_auto
false
~ # debconf-set partman-iscsi/iscsi_auto true
~ # debconf-get partman-iscsi/iscsi_auto
true


~ # swapoff /target/swapfile

~ # debconf -o d-i /usr/bin/main-menu

Repeat the 'Partition disks'/'Execute a shell' procedure from Test #0.

See that 'ISCSI_AUTO=true' is now configured in 'iscsi.initramfs'.

~ # cat /target/etc/iscsi/iscsi.initramfs
ISCSI_AUTO=true
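
For scripted runs the same check can be expressed as an exit status. A
small sketch, assuming the option is written as a plain ISCSI_AUTO=true
line (optionally quoted); the helper name 'iscsi_auto_enabled' is
hypothetical.

```shell
# iscsi_auto_enabled [CONF]: succeed if CONF sets ISCSI_AUTO to true.
# CONF defaults to the installed system's file under /target.
iscsi_auto_enabled() {
    grep -Eq '^ISCSI_AUTO="?true"?$' "${1:-/target/etc/iscsi/iscsi.initramfs}"
}
```

For example: 'iscsi_auto_enabled && echo "ISCSI_AUTO is set"'.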

Return with 'exit', then 'Change debconf priority' to 'high' again, and
proceed/finish the installation.

System reboots.

B

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "cosmic_d-i.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265734/+files/cosmic_d-i.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Confirmed
Status in hw-detect package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Confirmed
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Confirmed
Status in hw-detect source package in Eoan:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Confirmed

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it were, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and so that any partman-related preseed options
     are not required and may still be available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
       See comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
       been tested in detail, and falls back to
       previous behavior if it fails.
       See comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "bionic_hw-detect.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265732/+files/bionic_hw-detect.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
Adding patches and test procedure for the debian-installer userspace part.
(disk-detect probes iSCSI iBFT disks; partman-iscsi sets ISCSI_AUTO=true
for booting.)

The d-i patches should be uploaded/built _after_ hw-detect and partman-iscsi
are successfully built and published, so that their new versions are picked up.

The d-i kernel version change has been tested for the architectures
amd64, i386, arm64, and ppc64el, on both regular/lvm partitioning,
using VMs/QEMU for all archs, plus baremetal for amd64.

The hw-detect/partman-iscsi changes have been tested with a simple, virtual
iSCSI/iBFT setup using iPXE, which will be described shortly.

cheers,
Mauricio


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "cosmic_hw-detect.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265735/+files/cosmic_hw-detect.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "disco_d-i.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265737/+files/disco_d-i.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "cosmic_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265736/+files/cosmic_partman-iscsi.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "disco_hw-detect.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265738/+files/disco_hw-detect.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Confirmed
Status in hw-detect package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Confirmed
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Confirmed
Status in hw-detect source package in Eoan:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Confirmed

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain or wait too long) and so that partman-related preseed options
     are not required and may still be available to the user.
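
For readers unfamiliar with what the 'iscsi_ibft' module exposes, here is a
minimal sketch of reading iBFT settings from sysfs. It is NOT the disk-detect
change from the debdiffs. The function name and output format are made up for
illustration, and the attribute names are assumptions based on the kernel's
iscsi_ibft driver; verify them on your system.

```shell
#!/bin/sh
# Sketch only; not the disk-detect patch. ibft_dump is a hypothetical helper.

# Print "attribute=value" lines for the iBFT attributes found under the
# given root directory (defaults to the sysfs location used by iscsi_ibft).
ibft_dump() {
    root="${1:-/sys/firmware/ibft}"
    # No directory means the firmware provided no table (or module not loaded).
    [ -d "$root" ] || return 1
    for attr in initiator/initiator-name \
                ethernet0/mac ethernet0/ip-addr \
                target0/target-name target0/ip-addr target0/port; do
        # Attributes are optional; skip those that are not present.
        [ -r "$root/$attr" ] || continue
        printf '%s=%s\n' "$attr" "$(cat "$root/$attr")"
    done
    return 0
}
```

In an installer shell this could be run as 'ibft_dump' after loading the
module. Passing a directory argument allows trying it against a copy of the
sysfs tree.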

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.
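
The checks in the first test case can be sketched as small shell helpers.
This is only an illustration: the function names are hypothetical, and the
default paths (/proc/modules, /sys/firmware/ibft) are the usual locations but
are assumptions as far as this bug report is concerned.

```shell
#!/bin/sh
# Sketch only: helper names below are illustrative, not part of any package.

# Succeeds if a 'dpkg-deb -c' listing read from stdin mentions the module file.
listing_has_module() {
    grep -q 'iscsi_ibft\.ko'
}

# Succeeds if the module appears in a /proc/modules-style file
# (the first field on each line is the module name).
module_is_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] && grep -q '^iscsi_ibft ' "$modules_file"
}

# Succeeds if the iBFT sysfs directory exists.
ibft_is_exposed() {
    [ -d "${1:-/sys/firmware/ibft}" ]
}
```

For example, 'dpkg-deb -c scsi-modules_*.udeb | listing_has_module' covers the
packaging check, and in the installer environment 'modprobe iscsi_ibft'
followed by 'module_is_loaded && ibft_is_exposed' covers the load check.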

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week of
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on bare metal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
       See comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
       been tested in detail, and falls back to
       previous behavior if it fails.
       See comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "eoan_d-i.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265740/+files/eoan_d-i.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "eoan_hw-detect.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265741/+files/eoan_hw-detect.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "disco_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265739/+files/disco_partman-iscsi.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "eoan_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265742/+files/eoan_partman-iscsi.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
** Patch added: "bionic_partman-iscsi.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265733/+files/bionic_partman-iscsi.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-22 Thread Mauricio Faria de Oliveira
/i386/arm64/ppc64el on QEMU, plus amd64
-   on baremetal -- see comment 11.
-- hw-detect: low, the changes are enabled by a preseed option.
- see comment 12.
-- partman-iscsi: low, simple changes, plus one fix that has
- been tested in detail, and falls back to
- previous behavior if it fails.
- see comment 13.
+   based on kernel released to -updates plus one week
+   monitoring bug reports -- it should be OK.
+   Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
+   on baremetal -- see comment 11.
+    - hw-detect: low, the changes are enabled by a preseed option.
+ see comment 12.
+    - partman-iscsi: low, simple changes, plus one fix that has
+ been tested in detail, and falls back to
+ previous behavior if it fails.
+ see comment 13.
  
  [Other Info]
  
   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by an user with system/firmware that supports iBFT for iSCSI.

** Also affects: debian-installer (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: hw-detect (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: partman-iscsi (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: debian-installer (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Also affects: hw-detect (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
 Assignee: Mauricio Faria de Oliveira (mfo)
   Status: Fix Released

** Also affects: partman-iscsi (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Changed in: debian-installer (Ubuntu Bionic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: debian-installer (Ubuntu Cosmic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: debian-installer (Ubuntu Disco)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: debian-installer (Ubuntu Eoan)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: partman-iscsi (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: partman-iscsi (Ubuntu Bionic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: partman-iscsi (Ubuntu Cosmic)
   Status: New => Confirmed

** Changed in: partman-iscsi (Ubuntu Cosmic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: partman-iscsi (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: partman-iscsi (Ubuntu Disco)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: partman-iscsi (Ubuntu Eoan)
   Status: New => Confirmed

** Changed in: partman-iscsi (Ubuntu Eoan)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: hw-detect (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: hw-detect (Ubuntu Bionic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: hw-detect (Ubuntu Cosmic)
   Status: New => Confirmed

** Changed in: hw-detect (Ubuntu Cosmic)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: hw-detect (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: hw-detect (Ubuntu Disco)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: hw-detect (Ubuntu Eoan)
   Status: New => Confirmed

** Changed in: hw-detect (Ubuntu Eoan)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: debian-installer (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: debian-installer (Ubuntu Cosmic)
   Status: New => Confirmed

** Changed in: debian-installer (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: debian-installer (Ubuntu Eoan)
   Status: New => Confirmed


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-03-15 Thread Mauricio Faria de Oliveira
Installer (non-kernel) patches submitted to Debian for feedback.
- disk-detect: https://bugs.debian.org/924675
- partman-iscsi: https://bugs.debian.org/924680

** Bug watch added: Debian Bug tracker #924675
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924675

** Bug watch added: Debian Bug tracker #924680
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=924680

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it were, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it in practice.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module to the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful when there are no other disks in the system, so the installer
     neither complains nor waits too long) and so that partman-related preseed
     options are not required but remain available to the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect package:
     (to be done)

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect:
     (to be done)

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-03-18 Thread Mauricio Faria de Oliveira
Hi Marius @lazamarius1,

Per the kernel.ubuntu.com schedule, the version for Bionic/linux ->
Xenial/linux-hwe should land soon.

You can verify the version/timestamps for each package/release at the
bottom of these pages (the linux-hwe version comes a bit after the
corresponding linux version):

https://launchpad.net/ubuntu/+source/linux
https://launchpad.net/ubuntu/+source/linux-hwe

As for testing: yes, this issue might take longer to reproduce. However,
the initial testing by another user, done before the fix was first
submitted to Ubuntu, showed good results, so that is already a good sign
for the fix on its own.
That user will also test the fix integrated with the other fixes (i.e.,
the kernel in -proposed), so together with your testing that should
increase the chances of finding out whether the issue still happens.
There's also regression testing of the kernel builds, which can spot
failures, so that helps too.

Hope this helps,
Mauricio


[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-03-18 Thread Mauricio Faria de Oliveira
@lazamarius1,

Actually, linux-hwe (the Bionic kernel for Xenial) with this fix has just been uploaded.
See https://launchpad.net/ubuntu/+source/linux-hwe


Changelog

linux-hwe (4.15.0-47.50~16.04.1) xenial; urgency=medium
...
  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
    - srcu: Prohibit call_srcu() use under raw spinlocks
    - srcu: Lock srcu_data structure in srcu_gp_start()
...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Committed
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120
  seconds.

  We are seeing more issues with fsnotify-related callbacks. These are
  not soft/hard lockups but seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()
  The srcu_gp_start() function is called with the srcu_struct structure's
  ->lock held, but not with the srcu_data structure's ->lock.  This is
  problematic because this function accesses and updates the srcu_data
  structure's ->srcu_cblist, which is protected by that lock.  Failing to
  hold this lock can result in corruption of the SRCU callback lists,
  which in turn can result in arbitrarily bad results.

  This commit therefore makes srcu_gp_start() acquire the srcu_data
  structure's ->lock across the calls to rcu_segcblist_advance() and
  rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-03-18 Thread Mauricio Faria de Oliveira
cosmic-proposed verification done;
iscsi_ibft.ko is present in udeb and loads correctly.
---

$ uname -rv
4.18.0-17-generic #18-Ubuntu SMP Wed Mar 13 14:34:40 UTC 2019

$ apt-get download scsi-modules-4.18.0-17-generic-di

$ dpkg-deb -c scsi-modules-4.18.0-17-generic-di_4.18.0-17.18_amd64.udeb | grep ibft
-rw-r--r-- root/root 17257 2019-03-13 11:52 ./lib/modules/4.18.0-17-generic/kernel/drivers/firmware/iscsi_ibft.ko

$ dpkg-deb -x scsi-modules-4.18.0-17-generic-di_4.18.0-17.18_amd64.udeb udeb

$ sudo insmod udeb/lib/modules/4.18.0-17-generic/kernel/drivers/scsi/iscsi_boot_sysfs.ko
$ sudo insmod udeb/lib/modules/4.18.0-17-generic/kernel/drivers/firmware/iscsi_ibft.ko

$ dmesg | grep -i ibft
[  117.143116] No iBFT detected.
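
The udeb listing check above could also be scripted. This is only a
minimal sketch in Python: it assumes the tar-style listing format shown
in the 'dpkg-deb -c' output above (member path as the last field), and
the helper name 'find_module' is hypothetical, not part of any tool.

```python
def find_module(listing, module="iscsi_ibft.ko"):
    """Return member paths from a 'dpkg-deb -c' listing that end in `module`."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: the member path is the last whitespace-separated field.
        if fields and fields[-1].endswith("/" + module):
            paths.append(fields[-1])
    return paths


# Sample line copied from the cosmic udeb listing above.
sample = (
    "-rw-r--r-- root/root 17257 2019-03-13 11:52 "
    "./lib/modules/4.18.0-17-generic/kernel/drivers/firmware/iscsi_ibft.ko"
)
print(find_module(sample))
```

An empty list would mean the module is missing from the udeb.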

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-03-18 Thread Mauricio Faria de Oliveira
bionic-proposed verification done;
iscsi_ibft.ko is present in udeb and loads correctly.
---

$ uname -rv
4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019

$ apt-get download scsi-modules-4.15.0-47-generic-di

$ dpkg-deb -c scsi-modules-4.15.0-47-generic-di_4.15.0-47.50_amd64.udeb | grep ibft
-rw-r--r-- root/root 17086 2019-03-13 04:37 ./lib/modules/4.15.0-47-generic/kernel/drivers/firmware/iscsi_ibft.ko

$ dpkg-deb -x scsi-modules-4.15.0-47-generic-di_4.15.0-47.50_amd64.udeb udeb

$ sudo insmod udeb/lib/modules/4.15.0-47-generic/kernel/drivers/scsi/iscsi_boot_sysfs.ko
$ sudo insmod udeb/lib/modules/4.15.0-47-generic/kernel/drivers/firmware/iscsi_ibft.ko

$ dmesg | grep -i ibft
[  297.999505] No iBFT detected.
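
'No iBFT detected.' is expected on a machine whose firmware provides no
iBFT. On a system that does provide one, the module exposes the table
read-only in sysfs. The sketch below lists it in Python; it assumes the
usual /sys/firmware/ibft location, and the function name and the 'root'
parameter are hypothetical (the parameter exists only so the function
can be pointed at a test directory).

```python
import os


def read_ibft(root="/sys/firmware/ibft"):
    """Return {subdir: {attribute: value}} for the iBFT sysfs tree, or {} if absent."""
    if not os.path.isdir(root):
        return {}  # module not loaded, or no iBFT in firmware
    table = {}
    for entry in sorted(os.listdir(root)):
        subdir = os.path.join(root, entry)
        if not os.path.isdir(subdir):
            continue
        attrs = {}
        for name in sorted(os.listdir(subdir)):
            path = os.path.join(subdir, name)
            if os.path.isfile(path):
                try:
                    with open(path) as f:
                        attrs[name] = f.read().strip()
                except OSError:
                    attrs[name] = None  # attribute exists but is not readable
        table[entry] = attrs
    return table


if __name__ == "__main__":
    print(read_ibft() or "No iBFT exposed in sysfs")
```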


** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic



[Kernel-packages] [Bug 1821259] [NEW] Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Public bug reported:

[Impact]

 * This problem hard locks up 2 CPUs in a deadlock, and this
   soft locks up other CPUs as an effect; the system becomes
   unusable.

 * This is relatively rare / difficult to hit because it's a
   corner case in scheduling/load balancing that needs timing
   with CPU stopper code. And it needs an SMP plus _NUMA_ system.
   (But it can be hit with the synthetic test case attached in LP.)

 * Since SMP plus NUMA usually equals _servers_ it looks like
   a good idea to prevent this bug / hard lockups / rebooting.

 * The fix resolves the potential deadlock by removing one of
   the calls required to deadlock from under the locked code.

[Test Case]

 * There's a synthetic test case to reproduce this problem
   (although without the stack traces - just a system hang)
   attached to this LP bug.

 * It uses kprobes/mdelay/cpu stopper calls to force the code
   to execute and force the timing/locking condition to occur.

 * $ sudo insmod kmod-stopper.ko

   Some dmesg logging occurs, and the system either hangs or not.
   See examples in comments.
   
[Regression Potential] 

 * These are patches to the cpu stop_machine.c code, and they
   change a bit how it works;  however, there are no further upstream
   fixes for these patches and they are still at the top
   of the 'git log --oneline -- kernel/stop_machine.c' output.

 * These patches have been verified with the synthetic test case
   and 'stress-ng --class scheduler --sequential 0' (no regressions)
   on a guest with 2 CPUs and one physical system with 24 CPUs.

[Other Info]
 
 * The patches are required on Xenial and later.
 * There are 4 patches for Xenial, and 2 patches pending for Bionic.
 * All patches are applied from Cosmic onwards.

[Original Description]

These 2 hard lockups happened all of a sudden in the logs, and many soft
lockups occurred after them as fallout.

Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.477086] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.483800] Modules linked in: <...>
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484066] CPU: 10 PID: 58 Comm: migration/10 Not tainted 4.4.0-116-generic #140~14.04.1-Ubuntu
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484068] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 02/17/2017
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484070] task: 883ff2a76200 ti: 883ff211 task.ti: 883ff211
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484071] RIP: 0010:[]  [] native_queued_spin_lock_slowpath+0x160/0x170
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484079] RSP: :883ff2113c58  EFLAGS: 0002
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484080] RAX: 0101 RBX: 0086 RCX: 0001
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484081] RDX: 0101 RSI: 0001 RDI: 881fff991ba8
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484083] RBP: 883ff2113c58 R08: 0101 R09: 883ff082e200
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484084] R10: 2e04 R11: 2e04 R12: 881fff997c60
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484085] R13: 881fff991ba8 R14:  R15: 881fff997300
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484087] FS:  () GS:883fff00() knlGS:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484088] CS:  0010 DS:  ES:  CR0: 80050033
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484090] CR2: 7f7caaa23020 CR3: 001f4674 CR4: 00160670
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484091] Stack:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484092]  883ff2113c68 811870eb 883ff2113c80 81819907
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484094]  881fff991ba0 883ff2113cb0 8111c600 881fff997300
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484096]  881fff997c90 881ff03dd400  883ff2113cc0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484098] Call Trace:
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484105]  [] queued_spin_lock_slowpath+0xb/0xf
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484109]  [] _raw_spin_lock_irqsave+0x37/0x40
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484113]  [] cpu_stop_queue_work+0x30/0x80
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484116]  [] stop_one_cpu_nowait+0x30/0x40
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484119]  [] load_balance+0x71b/0x940
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484122]  [] pick_next_task_fair+0x275/0x4b0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484126]  [] __schedule+0x6c6/0x7f0
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484132]  [] ? sort_range+0x30/0x30
Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484134]  [] sched

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Analysis


The 1st hard lockup is harder to get the interesting data out of, as
apparently the registers with variables related to the cpu number have
been clobbered by more recent calls in the spinlock path.

Looking at the 2nd hard lockup:

addr2line plus the source code show that try_to_wake_up() is indeed
looping at line 1997, with IRQs disabled since line 1939 (thus a hard
lockup):

$ addr2line -pifae ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic 0x810aacb6
0x810aacb6: try_to_wake_up at /build/linux-lts-xenial-ozsla7/linux-lts-xenial-4.4.0/kernel/sched/core.c:1997

1926 static int
1927 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
1928 {
...
1939         raw_spin_lock_irqsave(&p->pi_lock, flags);
...
1993         /*
1994          * If the owning (remote) cpu is still in the middle of schedule() with
1995          * this task as prev, wait until its done referencing the task.
1996          */
1997         while (p->on_cpu)
1998                 cpu_relax();
...
2027         raw_spin_unlock_irqrestore(&p->pi_lock, flags);
2028
2029         return success;
2030 }

The objdump disassembly of try_to_wake_up() in vmlinux for the RIP
instruction address (810aacb6) shows a while loop that just checks for
non-zero 'p->on_cpu' and calls cpu_relax() (which translates to the
'pause' instruction):

810aacb1:   f3 90       pause
810aacb3:   8b 43 28    mov    0x28(%rbx),%eax
810aacb6:   85 c0       test   %eax,%eax
810aacb8:   75 f7       jne    810aacb1


So, it checks the value at the pointer in RBX plus offset 0x28, which
according to the 'pahole' tool is indeed the 'on_cpu' field:

$ pahole --hex -C task_struct ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic | grep on_cpu
int on_cpu;   /*  0x28   0x4 */

So, the task_struct pointer is in RBX, which is:

RBX: 883ff2a76200

And that matches the other hard locked up task on CPU 10 (see its
'task:' field).
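
The backward jump in the disassembly above can be double-checked with
simple arithmetic: the 'jne' at 810aacb8 is a 2-byte short jump (opcode
75, signed 8-bit displacement f7), so its target is the address of the
next instruction plus the sign-extended displacement. A small sketch in
Python, using the addresses exactly as printed above; the helper name
'short_jump_target' is hypothetical.

```python
def short_jump_target(insn_addr, insn_bytes):
    """Target of an x86 short (rel8) jump: next-instruction address plus signed disp8."""
    assert len(insn_bytes) == 2, "short conditional jumps are opcode + disp8"
    disp = insn_bytes[1]
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return insn_addr + len(insn_bytes) + disp


# jne at 810aacb8, encoded as 75 f7 in the disassembly above.
target = short_jump_target(0x810AACB8, bytes([0x75, 0xF7]))
print(hex(target))  # 0x810aacb1, i.e. back to the 'pause' instruction
```

So the loop body is exactly pause / mov / test / jne, with the 'mov'
reading the 4-byte field that pahole reports at offset 0x28.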

Per the stack trace on CPU 10, the identical timestamps of the two hard
lockup messages, and the fact that both stack traces are cpu_stopper
related, it does look like CPU 10 is waiting on the spinlock of one of
the 2 cpu stoppers held by CPU 6, which is exactly the scenario in the
suggested patch.

The problem/fix has been verified with a synthetic test-case (attached).


commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f
Author: Peter Zijlstra 
Date:   Fri Apr 20 11:50:05 2018 +0200

    stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

    Matt reported the following deadlock:

    CPU0                                    CPU1

    schedule(.prev=migrate/0)
      pick_next_task()                        ...
        idle_balance()                          migrate_swap()
          active_balance()                        stop_two_cpus()
                                                    spin_lock(stopper0->lock)
                                                    spin_lock(stopper1->lock)
                                                    ttwu(migrate/0)
                                                      smp_cond_load_acquire() -- waits for schedule()
            stop_one_cpu(1)
              spin_lock(stopper1->lock) -- waits for stopper lock

    Fix this deadlock by taking the wakeups out from under stopper->lock.
    This allows the active_balance() to queue the stop work and finish the
    context switch, which in turn allows the wakeup from migrate_swap() to
    observe the context and complete the wakeup.
<...>
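
The deadlock in the commit message above can be read as a cycle in a
wait-for graph. The following is only an illustrative model in Python,
not kernel code; the node names and the 'find_cycle' helper are made up
for this sketch. Before the fix, CPU1 does the wakeup while still
holding stopper1->lock; after it, the wakeup happens once the locks are
dropped, so CPU0 no longer waits on CPU1.

```python
def find_cycle(waits_for):
    """Return a list of nodes forming a cycle in a wait-for graph, or None.

    waits_for maps each waiter to the single party it is blocked on.
    """
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# Before the fix: CPU0 (inside schedule(), calling stop_one_cpu(1)) spins on
# stopper1->lock, which CPU1 holds; CPU1 (inside ttwu(migrate/0)) spins until
# CPU0 finishes schedule().
before = {"CPU0": "CPU1", "CPU1": "CPU0"}

# After the fix: CPU1 drops the stopper locks before doing the wakeup, so
# CPU0 is not blocked; only CPU1 briefly waits for CPU0's context switch.
after = {"CPU1": "CPU0"}

print(find_cycle(before))  # ['CPU0', 'CPU1']
print(find_cycle(after))   # None
```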


The stop_two_cpus() call can only happen on a NUMA system, per its caller chain:
  stop_two_cpus() <- migrate_swap() <- task_numa_migrate() <-
  numa_migrate_preferred() <- [task_numa_placement()] <- task_numa_fault()

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821259

Title:
  Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

Status in linux package in Ubuntu:
  Incomplete


[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821259

Title:
  Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [Impact]

   * This problem hard locks up 2 CPUs in a deadlock, and this
     soft locks up other CPUs as an effect; the system becomes
     unusable.

   * This is relatively rare / difficult to hit because it's a
     corner case in scheduling/load balancing that needs timing
     with CPU stopper code. And it needs an SMP plus _NUMA_ system.
     (But it can be hit with the synthetic test case attached in LP.)

   * Since SMP plus NUMA usually equals _servers_ it looks like
     a good idea to prevent this bug / hard lockups / rebooting.

   * The fix resolves the potential deadlock by removing one of
     the calls required to deadlock from under the locked code.

  [Test Case]

   * There's a synthetic test case to reproduce this problem
     (although without the stack traces - just a system hang)
     attached to this LP bug.

   * It uses kprobes/mdelay/cpu stopper calls to force the code
     to execute and force the timing/locking condition to occur.

   * $ sudo insmod kmod-stopper.ko

     Some dmesg logging occurs, and the system either hangs or not.
     See examples in comments.

  [Regression Potential]

   * These are patches to the cpu stop_machine.c code, and they
     change a bit how it works;  however, there are no further upstream
     fixes for these patches and they are still at the top
     of the 'git log --oneline -- kernel/stop_machine.c' output.

   * These patches have been verified with the synthetic test case
     and 'stress-ng --class scheduler --sequential 0' (no regressions)
     on a guest with 2 CPUs and one physical system with 24 CPUs.

  [Other Info]

   * The patches are required on Xenial and later.
   * There are 4 patches for Xenial, and 2 patches pending for Bionic.
   * All patches are applied from Cosmic onwards.

  [Original Description]

  These 2 hard lockups happened all of a sudden in the logs, and many
  soft lockups occurred after them as fallout.

  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.477086] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.483800] Modules linked in: <...>
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484066] CPU: 10 PID: 58 Comm: migration/10 Not tainted 4.4.0-116-generic #140~14.04.1-Ubuntu
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484068] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 02/17/2017
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484070] task: 883ff2a76200 ti: 883ff211 task.ti: 883ff211
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484071] RIP: 0010:[]  [] native_queued_spin_lock_slowpath+0x160/0x170
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484079] RSP: :883ff2113c58  EFLAGS: 0002
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484080] RAX: 0101 RBX: 0086 RCX: 0001
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484081] RDX: 0101 RSI: 0001 RDI: 881fff991ba8
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484083] RBP: 883ff2113c58 R08: 0101 R09: 883ff082e200
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484084] R10: 2e04 R11: 2e04 R12: 881fff997c60
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484085] R13: 881fff991ba8 R14:  R15: 881fff997300
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484087] FS:  () GS:883fff00() knlGS:
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484088] CS:  0010 DS:  ES:  CR0: 80050033
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484090] CR2: 7f7caaa23020 CR3: 001f4674 CR4: 00160670
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484091] Stack:
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484092]  883ff2113c68 811870eb 883ff2113c80 81819907
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484094]  881fff991ba0 883ff2113cb0 8111c600 881fff997300
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484096]  881fff997c90 881ff03dd400  883ff2113cc0
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484098] Call Trace:
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484105]  [] queued_spin_lock_slowpath+0xb/0xf
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484109]  [] _raw_spin_lock_irqsave+0x37/0x40
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484113]  [] cpu_stop_queue_work+0x30/0x80
  Nov 23 15:48:33 SYSTEM_NAME kernel: [4603802.484116]  [] stop_one_cpu_nowait+0x30/0x40
  Nov 23 15:48:33

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Test-case (kmod-stopper.c)
--------------------------

$ sudo apt-get -y install gcc make libelf-dev linux-headers-$(uname -r)

$ touch Makefile # fake it, and use this make line:
$ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kmod-stopper.o modules

$ echo 9 | sudo tee /proc/sys/kernel/printk

$ sudo insmod kmod-stopper.ko



$ sudo rmmod kmod-stopper


** Attachment added: "kmod-stopper.c"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821259/+attachment/5248313/+files/kmod-stopper.c

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821259

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Test-case on Xenial:

$ ls -1d /sys/devices/system/cpu/cpu[0-9]*
/sys/devices/system/cpu/cpu0
/sys/devices/system/cpu/cpu1


Original:
---

$ uname -rv
4.4.0-144-generic #170-Ubuntu SMP Thu Mar 14 11:56:20 UTC 2019

$ sudo insmod kmod-stopper/kmod-stopper.ko
[   74.198379] mod_init() :: this cpu = 0x1, that cpu = 0x0
[   74.199613] mod_init() :: that_cpu_stopper_task = 88003d80e600, comm = migration/0
[   74.206194] kp2/stop_two_cpus() :: this cpu = 0x1, that cpu = 0x0
[   74.206196] do_nothing() :: this cpu = 0x0, that cpu = 0x1
[   74.206201] kp1/pick_next_task_fair() :: this cpu = 0x0, that cpu = 0x1
[   74.206203] kp1/pick_next_task_fair() :: before sleep (1000 msecs)
[   74.212759] kp2/stop_two_cpus() :: before sleep (500 msecs)
[   74.710138] kp2/stop_two_cpus() :: after  sleep (500 msecs)
[   75.198324] kp1/pick_next_task_fair() :: after  sleep (1000 msecs)
[   75.199814] kp1/pick_next_task_fair() :: stopping other cpu...


The test-case only failed 2 out of 50+ tests.


Patched:
---

$ uname -rv
4.4.0-144-generic #170+test20190320b1 SMP Wed Mar 20 18:35:06 UTC 2019

$ sudo insmod kmod-stopper/kmod-stopper.ko
[   85.958527] mod_init() :: this cpu = 0x1, that cpu = 0x0
[   85.965876] mod_init() :: that_cpu_stopper_task = 88003d80e600, comm = migration/0
[   85.993446] kp2/stop_two_cpus() :: this cpu = 0x1, that cpu = 0x0
[   85.993471] do_nothing() :: this cpu = 0x0, that cpu = 0x1
[   85.993477] kp1/pick_next_task_fair() :: this cpu = 0x0, that cpu = 0x1
[   85.993480] kp1/pick_next_task_fair() :: before sleep (1000 msecs)
[   86.019469] kp2/stop_two_cpus() :: before sleep (500 msecs)
[   86.521688] kp2/stop_two_cpus() :: after  sleep (500 msecs)
[   86.987662] kp1/pick_next_task_fair() :: after  sleep (1000 msecs)
[   86.989427] kp1/pick_next_task_fair() :: stopping other cpu...
[   86.991109] do_nothing() :: this cpu = 0x1, that cpu = 0x0
[   86.992615] do_nothing() :: this cpu = 0x1, that cpu = 0x0


It passes every time (50+ tests).
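
A minimal sketch of how the two outcomes above could be told apart
automatically. It assumes, based on these two example logs only, that a
run which completes prints a do_nothing() line after the
"stopping other cpu..." line, while a hung run stops right there. The
function name run_completed is hypothetical.

```python
def run_completed(dmesg_lines):
    """Return True if a do_nothing() line follows 'stopping other cpu...'."""
    seen_stop = False
    for line in dmesg_lines:
        if "stopping other cpu" in line:
            seen_stop = True
        elif seen_stop and "do_nothing()" in line:
            # The stopper work ran after the stop request: no hang.
            return True
    return False


# Shortened excerpts of the two logs shown above.
original_log = [
    "[   74.206196] do_nothing() :: this cpu = 0x0, that cpu = 0x1",
    "[   75.198324] kp1/pick_next_task_fair() :: after  sleep (1000 msecs)",
    "[   75.199814] kp1/pick_next_task_fair() :: stopping other cpu...",
]
patched_log = [
    "[   85.993471] do_nothing() :: this cpu = 0x0, that cpu = 0x1",
    "[   86.987662] kp1/pick_next_task_fair() :: after  sleep (1000 msecs)",
    "[   86.989427] kp1/pick_next_task_fair() :: stopping other cpu...",
    "[   86.991109] do_nothing() :: this cpu = 0x1, that cpu = 0x0",
]

print(run_completed(original_log))  # False
print(run_completed(patched_log))   # True
```

Note that the earlier do_nothing() line (before the stop request) is
deliberately ignored, since it appears in both logs.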

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Both xenial and bionic original/patched kernels
were tested with stress-ng scheduler class, and
no regressions were observed.

$ stress-ng --version
stress-ng, version 0.09.56 (gcc 8.3, x86_64 Linux 4.15.0-47-generic) 💻🔥

$ sudo stress-ng --class scheduler --sequential 0

$ uname -rv
4.4.0-144-generic #170-Ubuntu SMP Thu Mar 14 11:56:20 UTC 2019

$ uname -rv
4.4.0-144-generic #170+test20190320b1 SMP Wed Mar 20 18:35:06 UTC 2019

$ uname -rv
4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019

$ uname -rv
4.15.0-47-generic #50+test20190320b1 SMP Wed Mar 20 20:08:03 UTC 2019

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
Since Bionic already has the fix commit applied,
the original kernel version doesn't hit the problem.

[Kernel-packages] [Bug 1821259] Re: Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

2019-03-21 Thread Mauricio Faria de Oliveira
[X][PATCH 0/4] LP#1821259 Fix for deadlock in cpu_stopper
https://lists.ubuntu.com/archives/kernel-team/2019-March/099427.html

[B][PATCH 0/2] Fix for LP#1821259 (pending patches for) Fix for deadlock in cpu_stopper
https://lists.ubuntu.com/archives/kernel-team/2019-March/099432.html

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** No longer affects: linux (Ubuntu)

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

Title:
  Hard lockup in 2 CPUs due to deadlock in cpu_stoppers

Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

[Kernel-packages] [Bug 1821395] [NEW] fscache: jobs might hang when fscache disk is full

2019-03-22 Thread Mauricio Faria de Oliveira
Public bug reported:

< NOTE: patches will be sent to kernel-team mailing list. >

[Impact]

 * fscache issue where jobs get hung when fscache disk is full.

 * trivial upstream fix; already applied in X/D, required in B/C:
   commit c5a94f434c82 ("fscache: fix race between enablement and
   dropping of object").

[Test Case]

 * Test kernel verified / regression-tested by reporter.

 * Apparently there's no simple test case, 
   but these are the conditions to hit the problem:

   1) The active dataset size is equal to the cache disk size.
      The application reads the data over and over again.
   2) Disk is near full (90%+).
   3) cachefilesd in userspace is trying to cull the old objects
      while new objects are being looked up.
   4) New cachefiles are created and some fail with no disk space.
   5) A race between the object-dropping state machine and the
      deferred-lookup state machine causes the hang.
   6) The job hangs in fscache_wait_for_deferred_lookup, waiting for
      the FSCACHE_COOKIE_LOOKING_UP bit in cookie->flags to be cleared.
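
   Condition 2 can be checked with a small helper; this is a sketch
   only. The 90% threshold comes from the list above, while the helper
   name is_near_full and the path passed to shutil.disk_usage() are
   assumptions (substitute the directory configured for cachefilesd).

```python
import shutil


def is_near_full(total_bytes, used_bytes, threshold=0.90):
    """Return True when used/total reaches the threshold (90% by default)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


# "/" is only a placeholder path for illustration.
usage = shutil.disk_usage("/")
print(is_near_full(usage.total, usage.used))
```

   The ratio computed here may differ slightly from the percentage
   that df reports, since df accounts for reserved blocks.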

[Regression Potential]

 * Low; contained in fscache; no further fixes applied upstream.

 * This patch is applied in a stable tree (linux-4.4.y).

[Original Description]

A user reported an fscache issue where jobs get hung when the fscache
disk is full.

After investigation, it was found to be an issue already reported and
fixed upstream by commit c5a94f434c82 ("fscache: fix race between
enablement and dropping of object").

This patch is required in Bionic and Cosmic, and it's applied in Xenial
(via stable) and Disco.

Apparently there's no simple test case, but these are the conditions to
hit the problem:

1) The active dataset size is equal to the cache disk size. 
   The application reads the data over and over again.
2) Disk is near full (90%+)
3) cachefilesd in userspace is trying to cull the old objects
   while new objects are being looked up.
4) new cachefiles are created and some fail with no disk space.
5) race in dropping object state machine and 
   deferred lookup state machine causes the hang.
6) HUNG in fscache_wait_for_deferred_lookup for
   clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821395

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821395/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.

[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-03-22 Thread Mauricio Faria de Oliveira
** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Status: Incomplete => Invalid

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
   Status: New => Confirmed

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed

Bug description:
  < NOTE: patches will be sent to kernel-team mailing list. >

  [Impact]

   * fscache issue where jobs get hung when fscache disk is full.

   * trivial upstream fix; already applied in X/D, required in B/C:
     commit c5a94f434c82 ("fscache: fix race between enablement and
     dropping of object").

  [Test Case]

   * Test kernel verified / regression-tested by reporter.

   * Apparently there's no simple test case,
     but these are the conditions to hit the problem:

     1) The active dataset size is equal to the cache disk size.
        The application reads the data over and over again.
     2) Disk is near full (90%+)
     3) cachefilesd in userspace is trying to cull the old objects
        while new objects are being looked up.
     4) new cachefiles are created and some fail with no disk space.
     5) race in dropping object state machine and
        deferred lookup state machine causes the hang.
     6) HUNG in fscache_wait_for_deferred_lookup for
        clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.
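
As a rough aid for checking condition 2 before attempting a reproduction, the sketch below reports whether the filesystem holding the cache is at least 90% used. It is only an illustration: the helper names are made up for this example, and the default path /var/cache/fscache is an assumption that should be replaced by whatever cache directory cachefilesd is configured to use.

```python
import shutil


def used_fraction(total_bytes, used_bytes):
    """Fraction of the filesystem in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def cache_disk_near_full(path="/var/cache/fscache", threshold=0.90):
    """True if the filesystem holding `path` is at least `threshold` full.

    `path` is an assumed cache location (hypothetical default); point it
    at the cache directory actually used on the system under test.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return used_fraction(usage.total, usage.used) >= threshold
```

Calling cache_disk_near_full() only confirms that one precondition holds; it does not by itself trigger the race described in conditions 3 to 6.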

  [Regression Potential]

   * Low; contained in fscache; no further fixes applied upstream.

   * This patch is applied in a stable tree (linux-4.4.y).

  [Original Description]

  A user reported an fscache issue where jobs get hung when the fscache
  disk is full.

  After investigation, it's been found to be an issue already reported/fixed
  upstream, by commit c5a94f434c82 ("fscache: fix race between enablement and
  dropping of object").

  This patch is required in Bionic and Cosmic, and it's applied in
  Xenial (via stable) and Disco.

  Apparently there's no simple test case, but these are the conditions
  to hit the problem:

  1) The active dataset size is equal to the cache disk size.
     The application reads the data over and over again.
  2) Disk is near full (90%+)
  3) cachefilesd in userspace is trying to cull the old objects
     while new objects are being looked up.
  4) new cachefiles are created and some fail with no disk space.
  5) race in dropping object state machine and
     deferred lookup state machine causes the hang.
  6) HUNG in fscache_wait_for_deferred_lookup for
     clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1821395/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1821395] Re: fscache: jobs might hang when fscache disk is full

2019-03-22 Thread Mauricio Faria de Oliveira
[B/C][PATCH 0/1] Fix for LP#1821395 (fscache: jobs might hang when fscache disk is full)
https://lists.ubuntu.com/archives/kernel-team/2019-March/099448.html

** Description changed:

- < NOTE: patches will be sent to kernel-team mailing list. >
- 
  [Impact]
  
-  * fscache issue where jobs get hung when fscache disk is full.
+  * fscache issue where jobs get hung when fscache disk is full.
  
-  * trivial upstream fix; already applied in X/D, required in B/C:
-commit c5a94f434c82 ("fscache: fix race between enablement and
-dropping of object").
+  * trivial upstream fix; already applied in X/D, required in B/C:
+    commit c5a94f434c82 ("fscache: fix race between enablement and
+    dropping of object").
  
  [Test Case]
  
-  * Test kernel verified / regression-tested by reporter.
+  * Test kernel verified / regression-tested by reporter.
  
-  * Apparently there's no simple test case, 
-but these are the conditions to hit the problem:
+  * Apparently there's no simple test case,
+    but these are the conditions to hit the problem:
  
-1) The active dataset size is equal to the cache disk size. 
-   The application reads the data over and over again.
-2) Disk is near full (90%+)
-3) cachefilesd in userspace is trying to cull the old objects
-   while new objects are being looked up.
-4) new cachefiles are created and some fail with no disk space.
-5) race in dropping object state machine and 
-   deferred lookup state machine causes the hang.
-6) HUNG in fscache_wait_for_deferred_lookup for
-   clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.
+    1) The active dataset size is equal to the cache disk size.
+   The application reads the data over and over again.
+    2) Disk is near full (90%+)
+    3) cachefilesd in userspace is trying to cull the old objects
+   while new objects are being looked up.
+    4) new cachefiles are created and some fail with no disk space.
+    5) race in dropping object state machine and
+   deferred lookup state machine causes the hang.
+    6) HUNG in fscache_wait_for_deferred_lookup for
+   clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.
  
  [Regression Potential]
  
-  * Low; contained in fscache; no further fixes applied upstream.
+  * Low; contained in fscache; no further fixes applied upstream.
  
-  * This patch is applied in a stable tree (linux-4.4.y).
+  * This patch is applied in a stable tree (linux-4.4.y).
  
  [Original Description]
  
  A user reported an fscache issue where jobs get hung when the fscache
  disk is full.
  
  After investigation, it's been found to be an issue already reported/fixed
  upstream, by commit c5a94f434c82 ("fscache: fix race between enablement and
  dropping of object").
  
  This patch is required in Bionic and Cosmic, and it's applied in Xenial
  (via stable) and Disco.
  
  Apparently there's no simple test case, but these are the conditions to
  hit the problem:
  
- 1) The active dataset size is equal to the cache disk size. 
-The application reads the data over and over again.
+ 1) The active dataset size is equal to the cache disk size.
+    The application reads the data over and over again.
  2) Disk is near full (90%+)
  3) cachefilesd in userspace is trying to cull the old objects
-while new objects are being looked up.
+    while new objects are being looked up.
  4) new cachefiles are created and some fail with no disk space.
- 5) race in dropping object state machine and 
-deferred lookup state machine causes the hang.
+ 5) race in dropping object state machine and
+    deferred lookup state machine causes the hang.
  6) HUNG in fscache_wait_for_deferred_lookup for
-clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.
+    clear bit FSCACHE_COOKIE_LOOKING_UP cookie->flags.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1821395

Title:
  fscache: jobs might hang when fscache disk is full

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed

Bug description:
  [Impact]

   * fscache issue where jobs get hung when fscache disk is full.

   * trivial upstream fix; already applied in X/D, required in B/C:
     commit c5a94f434c82 ("fscache: fix race between enablement and
     dropping of object").

  [Test Case]

   * Test kernel verified / regression-tested by reporter.

   * Apparently there's no simple test case,
     but these are the conditions to hit the problem:

     1) The active dataset size is equal to the cache disk size.
        The application reads the data over and over again.
     2) Disk is near full (90%+)
     3) cachefilesd in userspace is trying to cull the old objects
        while new objects are being looked up.
     4) new cachefiles are created and some fail with no disk space.
     5) race in dr

[Kernel-packages] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-03-25 Thread Mauricio Faria de Oliveira
Updating bug tags to verification done.

As mentioned by users in this LP bug, the verification period of 5 days
is _usually_ not enough to reproduce this problem; however, we have some
data points suggesting that the fix is good.

1) The fix was first delivered in linux-azure 3 weeks ago, and has
reportedly resolved the issue for @alanjcastonguay: the issue used to
occur within 4 days at the most, and hasn't happened for 2 weeks
in 8 nodes (which is statistically very positive; and it helps that the
fix is not specific to -azure).

2) One of the users who reported this in linux (-generic) has verified
a test kernel with this fix for weeks, based upon which the fix was
submitted after linux-azure had it. The same user has verified -proposed
for about a week now, and it's looking good.

3) Users in this LP bug have been running the -proposed kernel in
multiple nodes for about a week now too, and haven't hit the issue yet.

On top of 1), with 2) and 3) combined, and the schedule for -proposed
verification, this seems to be a reasonable compromise between results
and test time.

cheers,
Mauricio

** Tags removed: verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Committed
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  Stack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? __enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: __synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120
  seconds.
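
When watching many nodes for this symptom, it can help to pull the hung-task reports out of kern.log automatically. The following is a minimal sketch, assuming each log entry sits on a single line as it does in kern.log itself; the function names are hypothetical and the pattern is based only on the "INFO: task ... blocked for more than ... seconds" message quoted in this bug.

```python
import re

# Matches the hung-task report line. The task name may itself contain
# colons (e.g. "kworker/u16:0"), so the greedy match leaves only the
# final ":<pid>" for the pid group.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than "
    r"(?P<seconds>\d+) seconds"
)


def parse_hung_task(line):
    """Return (comm, pid, seconds) for a hung-task line, else None."""
    match = HUNG_TASK_RE.search(line)
    if match is None:
        return None
    return match["comm"], int(match["pid"]), int(match["seconds"])


def hung_tasks(lines):
    """Yield one (comm, pid, seconds) tuple per hung-task report."""
    for line in lines:
        parsed = parse_hung_task(line)
        if parsed is not None:
            yield parsed
```

For example, passing an open kern.log file object to hung_tasks() yields one tuple per report, which can then be counted per node or per day.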

  We are seeing more issues with fsnotify-related callbacks. These are
  not soft/hard lockups, but they seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=1a05c0cd2fee2

[Kernel-packages] [Bug 1817628] Re: Regular D-state processes impacting LXD containers

2019-03-29 Thread Mauricio Faria de Oliveira
Marking X/B as verification done.

The user reports the issue occurs much less often now.
Apparently that environment hits some corner case, or this may still be
expected sometimes under memory pressure (i.e., one big shrinking
operation acquired the lock and must finish).

Nonetheless, the fix reduced how often the problem occurs, so it
addresses most of the issue and is beneficial on its own.

As discussed with @klebers on IRC, in this scenario we agreed to ship
the fix, and address the pending behavior (if an actual issue) with
additional fixes later.

cheers,
Mauricio

** Tags removed: verification-needed-bionic verification-needed-xenial
** Tags added: verification-done-bionic verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817628

Title:
  Regular D-state processes impacting LXD containers

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

   * Systems running under memory pressure may hit stalls in the
     order of seconds to minutes in systemd-logind and lxd mount
     operations (e.g., ZFS backend), which get stuck in D state.

   * The processes stuck in D state have a common stack trace,
     (cat /proc/PID/stack) all blocked in register_shrinker().

   * The fix checks in shrink_slab() (shrinkers are called under
     memory pressure) for contention/usage of the semaphore used
     by register_shrinker() and returns early in that case.

     This allows the register_shrinker() callers to unblock,
     and not stall until the shrink operation releases that lock.
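
To find candidate processes before inspecting /proc/PID/stack, one can scan /proc for tasks in D state. The sketch below is an illustration only (the helper names are hypothetical); it relies on the state being the first field after the parenthesised command name in /proc/PID/stat.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/PID/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root="/proc"):
    """Return [(pid, comm), ...] for processes currently in D state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat_state(stat_file.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Each pid returned by d_state_processes() can then be checked with cat /proc/PID/stack (which typically requires root) to see whether it is blocked in register_shrinker().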

  [Test Case]

   * In a system under memory pressure, specifically having the
     memory shrinkers being called often and taking time to run,
     perform mount operations (or other operations that acquire
     the shrinker_rwsem semaphore).

   * The user who reported the problem has verified the fix in
     systems that exhibited the problem often (sometimes daily),
     and reports that it resolves the problem.

  [Regression Potential]

   * Low. The fix just returns early from slab memory shrinker
     if there's usage/contention for 'shrinker_rwsem'.

   * In some scenarios, this may cause the slab memory shrinker
     to require more invocations to actually finish and potentially
     release memory, but this seems minor since other shrinkers can
     release memory as well, and compared to the fact that this fix
     allows other applications to make progress / continue to run,
     which would otherwise be stalled.
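
The trade-off above can be pictured with a toy, single-threaded model. This is not kernel code: the function merely borrows the name shrink_slab() for readability, and the writer_waiting_at parameter is a hypothetical stand-in for "a register_shrinker() caller started waiting on the semaphore at this point in the walk".

```python
def shrink_slab(caches, writer_waiting_at=None):
    """Walk the shrinker list once; return (freed, remaining_caches).

    caches: reclaimable object counts, one per registered shrinker.
    writer_waiting_at: index in the walk at which a (hypothetical)
        register_shrinker() caller begins waiting; None means the
        semaphore is never contended.
    """
    remaining = list(caches)
    freed = 0
    for index, count in enumerate(remaining):
        if writer_waiting_at is not None and index >= writer_waiting_at:
            break  # return early so the waiting caller can proceed
        freed += count
        remaining[index] = 0
    return freed, remaining


# First pass is interrupted after one shrinker; a second pass finishes.
freed_first, left = shrink_slab([10, 20, 30], writer_waiting_at=1)
freed_second, left = shrink_slab(left)
```

In this model the interrupted pass frees less memory, and a later invocation is needed to finish the walk, which is the cost accepted in exchange for not stalling the caller that needs the lock.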

  [Other Info]
   
   * This patch is already applied in Cosmic and later (v4.16+).
     It is needed only in Xenial and Bionic at this time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817628/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-28 Thread Mauricio Faria de Oliveira
** Patch added: "cosmic_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267256/+files/cosmic_partman-iscsi.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Committed
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Fix Committed
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.
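
The udeb check above can also be scripted, for instance when several udebs have to be verified. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the parser assumes the `dpkg-deb -c` listing uses the usual six-column layout (mode, owner/group, size, date, time, path).

```python
import subprocess


def module_paths(listing_text, module="iscsi_ibft.ko"):
    """Return archive paths in `dpkg-deb -c` output that end in `module`."""
    paths = []
    for line in listing_text.splitlines():
        # Assumed columns: mode owner/group size date time path
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        path = fields[5].split(" -> ")[0]  # drop a symlink target, if any
        if path.endswith("/" + module):
            paths.append(path)
    return paths


def udeb_module_paths(udeb_path, module="iscsi_ibft.ko"):
    """Run `dpkg-deb -c` on a udeb and look for `module` in its contents."""
    result = subprocess.run(
        ["dpkg-deb", "-c", udeb_path],
        check=True, capture_output=True, text=True,
    )
    return module_paths(result.stdout, module)
```

An empty list from udeb_module_paths() means the module is missing from that udeb; a non-empty list only shows that it is packaged, so loading it in the installer environment still has to be checked separately.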

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
            based on kernel released to -updates plus one week
            monitoring bug reports -- it should be OK.
            Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
            on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
                  see comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
                      been tested in detail, and falls back to
                      previous behavior if it fails.
                      see comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-28 Thread Mauricio Faria de Oliveira
** Patch removed: "bionic_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265733/+files/bionic_partman-iscsi.debdiff

** Patch removed: "cosmic_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265736/+files/cosmic_partman-iscsi.debdiff

** Patch removed: "disco_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265739/+files/disco_partman-iscsi.debdiff

** Patch removed: "eoan_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5265742/+files/eoan_partman-iscsi.debdiff

** Patch added: "bionic_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267255/+files/bionic_partman-iscsi.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Committed
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Fix Committed
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-28 Thread Mauricio Faria de Oliveira
** Patch added: "eoan_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267258/+files/eoan_partman-iscsi.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Committed
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Fix Committed
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-28 Thread Mauricio Faria de Oliveira
Hi Eric,

Nice catch. Sorry about that!

Sure, just attached the partman-iscsi debdiffs with the LP bug mentioned
in changelog.

Thanks,
Mauricio

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Committed
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Confirmed
Status in hw-detect source package in Bionic:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Confirmed
Status in debian-installer source package in Cosmic:
  Confirmed
Status in hw-detect source package in Cosmic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Confirmed
Status in debian-installer source package in Disco:
  Confirmed
Status in hw-detect source package in Disco:
  Confirmed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Confirmed
Status in debian-installer source package in Eoan:
  Fix Committed
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module to the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful when there are no other disks in the system, so the installer
     neither complains nor waits too long) and so that partman-related preseed
     options are not required and remain available to the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
       See comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
       been tested in detail, and falls back to
       previous behavior if it fails.
       See comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by a user with system/firmware that supports iBFT for iSCSI.
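
The Regression Potential note above says the module exposes the iBFT
information in sysfs in read-only mode. A minimal sketch of reading a few of
those attributes follows. The function name show_ibft and the IBFT_ROOT
variable are illustrative only, and the attribute names are the ones the
iscsi_ibft module is assumed to export under /sys/firmware/ibft; check them
on the running kernel before relying on them.

```shell
#!/bin/sh
# Sketch: print a few iBFT attributes from sysfs (read-only access).
# show_ibft and IBFT_ROOT are illustrative names, not part of any package.
IBFT_ROOT="${IBFT_ROOT:-/sys/firmware/ibft}"

show_ibft() {
    root="${1:-$IBFT_ROOT}"
    if [ ! -d "$root" ]; then
        echo "no iBFT information under $root" >&2
        return 1
    fi
    # Each attribute file holds a single value; unmatched globs are skipped.
    for f in "$root"/initiator/initiator-name \
             "$root"/ethernet*/mac "$root"/ethernet*/ip-addr \
             "$root"/target*/target-name "$root"/target*/ip-addr \
             "$root"/target*/port; do
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
}
```

The directory is passed as an argument so the function can be exercised
against a copy of the tree, without the module loaded.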

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1817321/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-05-28 Thread Mauricio Faria de Oliveira
** Patch added: "disco_partman-iscsi.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1817321/+attachment/5267257/+files/disco_partman-iscsi.debdiff



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-03 Thread Mauricio Faria de Oliveira
Dan, thanks for the review/fixes/upload!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Released
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  In Progress
Status in hw-detect source package in Bionic:
  In Progress
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  In Progress
Status in debian-installer source package in Cosmic:
  In Progress
Status in hw-detect source package in Cosmic:
  In Progress
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  In Progress
Status in debian-installer source package in Disco:
  In Progress
Status in hw-detect source package in Disco:
  In Progress
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  In Progress
Status in debian-installer source package in Eoan:
  Fix Released
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released



[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-05 Thread Mauricio Faria de Oliveira
verification done for disco/hw-detect.

disk-detect found the iscsi target/lun configured in ibft.

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/disco/main/installer-amd64/20101020ubuntu570/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}

$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn


workstation $ vncviewer buneary.segmaas.1ss:1



iPXE>

iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0

iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot 

...


  │ Select disk to partition:                     │
  │                                               │
  │ SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK │


...

~ # grep 'retrieving disk-detect' /var/log/syslog
Jun  5 23:36:13 anna[1582]: DEBUG: retrieving disk-detect 1.117ubuntu6.19.04.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun  5 23:38:55 disk-detect: insmod /lib/modules/5.0.0-13-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun  5 23:38:55 disk-detect: # BEGIN RECORD 2.0-874
Jun  5 23:38:55 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun  5 23:38:55 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun  5 23:38:55 disk-detect: iface.bootproto = STATIC
Jun  5 23:38:55 disk-detect: iface.ipaddress = 10.0.0.2
Jun  5 23:38:55 disk-detect: iface.subnet_mask = 255.255.255.0
Jun  5 23:38:55 disk-detect: iface.primary_dns = 192.168.122.1
Jun  5 23:38:55 disk-detect: iface.vlan_id = 0
Jun  5 23:38:55 disk-detect: iface.net_ifacename = ens4
Jun  5 23:38:55 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun  5 23:38:55 disk-detect: node.conn[0].address = 10.0.0.1
Jun  5 23:38:55 disk-detect: node.conn[0].port = 3260
Jun  5 23:38:55 disk-detect: node.boot_lun = 0100
Jun  5 23:38:55 disk-detect: # END RECORD
Jun  5 23:38:55 kernel: [  207.271009] iBFT detected.
Jun  5 23:38:55 disk-detect: Setting up software interface ens4
Jun  5 23:38:55 disk-detect: iscsistart: can not connect to iSCSI daemon (111)!
Jun  5 23:38:55 kernel: [  207.286051] Loading iSCSI transport class v2.0-870.
Jun  5 23:38:55 disk-detect: iscsistart: version 2.0-874
Jun  5 23:38:55 disk-detect:
Jun  5 23:38:56 kernel: [  208.291508] iscsi: registered transport (tcp)
Jun  5 23:38:56 kernel: [  208.294107] scsi host2: iSCSI Initiator over TCP/IP
Jun  5 23:38:56 disk-detect: iscsistart: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun  5 23:38:56 kernel: [  208.302594] scsi 2:0:0:0: RAID  IET  Controller   0001 PQ: 0 ANSI: 5
Jun  5 23:38:56 kernel: [  208.306723] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun  5 23:38:56 kernel: [  208.310611] scsi 2:0:0:1: Direct-Access IET  VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun  5 23:38:56 kernel: [  208.314013] sd 2:0:0:1: Power-on or device reset occurred
Jun  5 23:38:56 kernel: [  208.315260] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun  5 23:38:56 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun  5 23:38:56 kernel: [  208.317323] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun  5 23:38:56 kernel: [  208.317327] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun  5 23:38:56 kernel: [  208.317962] sd 2:0:0:1: [sda] Write Protect is off
Jun  5 23:38:56 kernel: [  208.317965] sd 2:0:0:1: [sda] Mode Sense: 69 00 10 08
Jun  5 23:38:56 kernel: [  208.319113] sd 2:0:0:1: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jun  5 23:38:56 kernel: [  208.334152] sd 2:0:0:1: [sda] Attached SCSI disk
Jun  5 23:38:56 disk-detect: iBFT disk detection finished.
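
When checking such a log it can help to pull a single field out of the
"# BEGIN RECORD" / "# END RECORD" block instead of reading it by eye. A
minimal sketch follows. The function name ibft_record_value is illustrative
only, and it assumes every syslog entry sits on one line, as it does in the
installer's own /var/log/syslog.

```shell
#!/bin/sh
# Sketch: print the value of one key from the record that disk-detect logs
# between "# BEGIN RECORD" and "# END RECORD".
# ibft_record_value is an illustrative name, not an installer tool.
ibft_record_value() {
    key="$1"
    logfile="${2:-/var/log/syslog}"
    awk -v key="$key" '
        /disk-detect: # BEGIN RECORD/ { in_record = 1; next }
        /disk-detect: # END RECORD/   { in_record = 0 }
        in_record {
            # Drop the syslog prefix up to and including "disk-detect: ".
            line = $0
            sub(/^.*disk-detect: /, "", line)
            # Split "name = value" at the first " = ".
            pos = index(line, " = ")
            if (pos > 0 && substr(line, 1, pos - 1) == key) {
                print substr(line, pos + 3)
            }
        }
    ' "$logfile"
}
```

For example, `ibft_record_value iface.net_ifacename` run in the installer
shell would print the interface name that disk-detect is about to set up.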

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-05 Thread Mauricio Faria de Oliveira
verification done for cosmic/hw-detect.

disk-detect found the iscsi target/lun configured in ibft.

cosmic currently needs the workaround to download/install
the iscsi_ibft.ko module, because the d-i changes are not
yet in (update to a kernel version that includes it in scsi-modules.udeb).

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/cosmic-updates/main/installer-amd64/20101020ubuntu557.1/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}

$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn


workstation $ vncviewer buneary.segmaas.1ss:1



iPXE>

iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0

iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot 

...

~ # ls /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko
ls: /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko: No such file or directory

~ # cd /tmp
/tmp # wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-4.18.0-10-generic_4.18.0-10.11_amd64.deb
/tmp # ar x linux-modules-4.18.0-10-generic_4.18.0-10.11_amd64.deb
/tmp # xzcat data.tar.xz | tar x
/tmp # mkdir /lib/modules/4.18.0-10-generic/kernel/drivers/firmware
/tmp # cp lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/
/tmp # exit
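
The manual workaround above can be sketched as a few shell functions, so that
the same steps apply to another kernel release. All function names and the
/tmp/ibft-workaround directory are illustrative assumptions; the URL layout
is simply the one used in the wget command above, for amd64 only.

```shell
#!/bin/sh
# Sketch of the workaround as reusable functions (names are illustrative).

# Path where the installer's kernel expects the module, for a kernel
# release such as 4.18.0-10-generic.
ibft_module_path() {
    echo "/lib/modules/$1/kernel/drivers/firmware/iscsi_ibft.ko"
}

# URL of the linux-modules package, given the kernel release and the
# package version (for example 4.18.0-10-generic and 4.18.0-10.11).
linux_modules_url() {
    echo "http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-${1}_${2}_amd64.deb"
}

# Download the package, unpack it, and copy iscsi_ibft.ko into place.
# Meant for the installer shell: needs network access and write access
# to /lib/modules.
install_ibft_module() {
    release="$1"
    version="$2"
    dest="$(ibft_module_path "$release")"
    workdir="/tmp/ibft-workaround"
    mkdir -p "$workdir" "$(dirname "$dest")" || return 1
    (
        cd "$workdir" || exit 1
        wget "$(linux_modules_url "$release" "$version")" || exit 1
        ar x "linux-modules-${release}_${version}_amd64.deb" || exit 1
        xzcat data.tar.xz | tar x || exit 1
        cp ".$dest" "$dest"
    )
}
```

Calling `install_ibft_module 4.18.0-10-generic 4.18.0-10.11` would repeat the
steps shown above for cosmic.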

...


  │ Select disk to partition:                     │
  │                                               │
  │ SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK │


...
~ # grep 'retrieving disk-detect' /var/log/syslog
Jun  6 00:05:52 anna[1521]: DEBUG: retrieving disk-detect 1.117ubuntu6.18.10.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun  6 00:10:59 disk-detect: insmod /lib/modules/4.18.0-10-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun  6 00:10:59 kernel: [  364.281283] iBFT detected.
Jun  6 00:10:59 disk-detect: # BEGIN RECORD 2.0-874
Jun  6 00:10:59 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun  6 00:10:59 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun  6 00:10:59 disk-detect: iface.bootproto = STATIC
Jun  6 00:10:59 disk-detect: iface.ipaddress = 10.0.0.2
Jun  6 00:10:59 disk-detect: iface.subnet_mask = 255.255.255.0
Jun  6 00:10:59 disk-detect: iface.primary_dns = 192.168.122.1
Jun  6 00:10:59 disk-detect: iface.vlan_id = 0
Jun  6 00:10:59 disk-detect: iface.net_ifacename = ens4
Jun  6 00:10:59 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun  6 00:10:59 disk-detect: node.conn[0].address = 10.0.0.1
Jun  6 00:10:59 disk-detect: node.conn[0].port = 3260
Jun  6 00:10:59 disk-detect: node.boot_lun = 0100
Jun  6 00:10:59 disk-detect: # END RECORD
Jun  6 00:10:59 disk-detect: Setting up software interface ens4
Jun  6 00:10:59 disk-detect: iscsistart: version 2.0-874
Jun  6 00:10:59 kernel: [  364.296265] Loading iSCSI transport class v2.0-870.
Jun  6 00:10:59 kernel: [  364.304322] iscsi: registered transport (tcp)
Jun  6 00:10:59 kernel: [  364.305731] scsi host2: iSCSI Initiator over TCP/IP
Jun  6 00:10:59 disk-detect: iscsistart: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun  6 00:10:59 kernel: [  364.310276] scsi 2:0:0:0: RAID  IET  Controller   0001 PQ: 0 ANSI: 5
Jun  6 00:10:59 kernel: [  364.311404] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun  6 00:10:59 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun  6 00:10:59 kernel: [  364.312491] scsi 2:0:0:1: Direct-Access IET  VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun  6 00:10:59 kernel: [  364.313947] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun  6 00:10:59 kernel: [  364.314374] sd 2:0:0:1: Power-on or device reset occurred
Jun  6 00:10:59 kernel: [  364.316862] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun  6 00:10:59 kernel: [  364.316864] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun  6 00:10:59 kernel: [  364.317102] sd 2:0:0:1: [sda] Write Protect is off
Jun  6 00:10:59 kernel: [  364.317104] sd 2:0:0:

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-05 Thread Mauricio Faria de Oliveira
verification done for bionic/hw-detect.

disk-detect found the iscsi target/lun configured in ibft.

bionic currently needs the workaround to download/install
the iscsi_ibft.ko module, because the d-i changes are not
yet in (update to a kernel version that includes it in scsi-modules.udeb).

$ wget http://boot.ipxe.org/ipxe.lkrn
$ wget http://archive.ubuntu.com/ubuntu/dists/bionic-updates/main/installer-amd64/20101020ubuntu543.7/images/netboot/ubuntu-installer/amd64/{linux,initrd.gz}

$ python3 -m http.server &

$ qemu-system-x86_64 \
  -nodefaults \
  -enable-kvm \
  -smp 2 -m 4096 \
  -serial stdio \
  -vga virtio \
  -display vnc=0.0.0.0:1 \
  -netdev bridge,id=bridge-world,br=virbr0 \
  -netdev bridge,id=bridge-iscsi,br=virbr-iscsi \
  -device virtio-net-pci,netdev=bridge-world,id=nic-world,mac=52:54:00:00:00:01 \
  -device virtio-net-pci,netdev=bridge-iscsi,id=nic-iscsi,mac=52:54:00:00:00:02 \
  -kernel ipxe.lkrn


workstation $ vncviewer buneary.segmaas.1ss:1



iPXE>

iPXE> ifopen net1
iPXE> set net1/ip 10.0.0.2
iPXE> set net1/netmask 255.255.255.0

iPXE> sanhook iscsi:10.0.0.1:::1:iqn.2019-06.com.example:target1
Registered SAN device 0x80

iPXE> ifopen net0
iPXE> kernel http://192.168.122.1:8000/linux initrd=initrd.gz apt-setup/proposed=true disk-detect/ibft/enable=true --- console=ttyS0
iPXE> initrd http://192.168.122.1:8000/initrd.gz
iPXE> boot 

...

~ # ls /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko
ls: /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko: No such file or directory


~ # cd /tmp
/tmp # wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux-modules-4.15.0-45-generic_4.15.0-45.48_amd64.deb
/tmp # ar x linux-modules-4.15.0-45-generic_4.15.0-45.48_amd64.deb
/tmp # xzcat data.tar.xz | tar x
/tmp # mkdir /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/
/tmp # cp lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/

/tmp # exit

...


  │ Select disk to partition:                     │
  │                                               │
  │ SCSI3 (0,0,1) (sda) - 8.6 GB IET VIRTUAL-DISK │


...

~ #  grep 'retrieving disk-detect' /var/log/syslog
Jun  6 01:01:18 anna[1610]: DEBUG: retrieving disk-detect 1.117ubuntu6.18.04.1

~ # sed -n '/scsi_ibft.ko/,/iBFT disk detection finished/p' /var/log/syslog
Jun  6 01:05:17 disk-detect: insmod /lib/modules/4.15.0-45-generic/kernel/drivers/firmware/iscsi_ibft.ko
Jun  6 01:05:17 kernel: [  293.676205] iBFT detected.
Jun  6 01:05:17 disk-detect: # BEGIN RECORD 2.0-874
Jun  6 01:05:17 disk-detect: iface.initiatorname = iqn.2010-04.org.ipxe:----
Jun  6 01:05:17 disk-detect: iface.hwaddress = 52:54:00:00:00:02
Jun  6 01:05:17 disk-detect: iface.bootproto = STATIC
Jun  6 01:05:17 disk-detect: iface.ipaddress = 10.0.0.2
Jun  6 01:05:17 disk-detect: iface.subnet_mask = 255.255.255.0
Jun  6 01:05:17 disk-detect: iface.primary_dns = 192.168.122.1
Jun  6 01:05:17 disk-detect: iface.vlan_id = 0
Jun  6 01:05:17 disk-detect: iface.net_ifacename = ens4
Jun  6 01:05:17 disk-detect: node.name = iqn.2019-06.com.example:target1
Jun  6 01:05:17 disk-detect: node.conn[0].address = 10.0.0.1
Jun  6 01:05:17 disk-detect: node.conn[0].port = 3260
Jun  6 01:05:17 disk-detect: node.boot_lun = 0100
Jun  6 01:05:17 disk-detect: # END RECORD
Jun  6 01:05:17 disk-detect: Setting up software interface ens4
Jun  6 01:05:17 disk-detect: iscsistart: version 2.0-874
Jun  6 01:05:17 kernel: [  293.694700] Loading iSCSI transport class v2.0-870.
Jun  6 01:05:17 kernel: [  293.702842] iscsi: registered transport (tcp)
Jun  6 01:05:17 disk-detect: iscsistart:
Jun  6 01:05:17 disk-detect: Connection1:0 to [target: iqn.2019-06.com.example:target1, portal: 10.0.0.1,3260] through [iface: default] is operational now
Jun  6 01:05:17 disk-detect:
Jun  6 01:05:17 kernel: [  293.704410] scsi host2: iSCSI Initiator over TCP/IP
Jun  6 01:05:17 kernel: [  293.709519] scsi 2:0:0:0: RAID  IET  Controller   0001 PQ: 0 ANSI: 5
Jun  6 01:05:17 kernel: [  293.711428] scsi 2:0:0:0: Attached scsi generic sg0 type 12
Jun  6 01:05:17 disk-detect: iscsistart: Logging into iqn.2019-06.com.example:target1 10.0.0.1:3260,1
Jun  6 01:05:17 kernel: [  293.713414] scsi 2:0:0:1: Direct-Access IET  VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jun  6 01:05:17 kernel: [  293.714350] sd 2:0:0:1: Attached scsi generic sg1 type 0
Jun  6 01:05:17 kernel: [  293.714518] sd 2:0:0:1: Power-on or device reset occurred
Jun  6 01:05:17 kernel: [  293.716821] sd 2:0:0:1: [sda] 16777216 512-byte logical blocks: (8.59 GB/8.00 GiB)
Jun  6 01:05:17 kernel: [  293.716823] sd 2:0:0:1: [sda] 4096-byte physical blocks
Jun  6 01:05:17 kernel: [  293.717069] sd 2:0:0:1: [sda] Wri

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-06 Thread Mauricio Faria de Oliveira
verification done for disco/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs
with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog
Jun  6 14:04:03 anna[1582]: DEBUG: retrieving partman-iscsi 40ubuntu3.19.04.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs 
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...
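
The two expected outcomes above can also be checked mechanically. A minimal
sketch follows. The function name check_iscsi_initramfs and its "auto",
"static", "invalid", and "missing" result words are illustrative only, and
the variable names it looks for are the ones shown in the cat output above.

```shell
#!/bin/sh
# Sketch: classify an iscsi.initramfs file as the manual check does.
# check_iscsi_initramfs is an illustrative name, not part of partman-iscsi.
check_iscsi_initramfs() {
    file="$1"
    mac="$2"
    [ -r "$file" ] || { echo "missing"; return 1; }
    # iscsi_auto=true case: a single ISCSI_AUTO=true line is expected.
    if grep -qx 'ISCSI_AUTO=true' "$file"; then
        echo "auto"
        return 0
    fi
    # iscsi_auto=false case: the MAC address must match the iSCSI interface
    # and the target must be described.
    if grep -qx "HWADDR=\"$mac\"" "$file" &&
       grep -q '^ISCSI_TARGET_NAME=' "$file" &&
       grep -q '^ISCSI_TARGET_IP=' "$file"; then
        echo "static"
        return 0
    fi
    echo "invalid"
    return 1
}
```

In the installer shell this would be called as
`check_iscsi_initramfs /target/etc/iscsi/iscsi.initramfs 52:54:00:00:00:02`,
with the MAC address taken from the `ip addr list` output for ens4.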

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Released
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  In Progress
Status in hw-detect source package in Bionic:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Fix Committed
Status in debian-installer source package in Cosmic:
  In Progress
Status in hw-detect source package in Cosmic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Fix Committed
Status in debian-installer source package in Disco:
  In Progress
Status in hw-detect source package in Disco:
  Fix Committed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Fix Committed
Status in debian-installer source package in Eoan:
  Fix Released
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released


[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-06 Thread Mauricio Faria de Oliveira
verification done for cosmic/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs
with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog
Jun  6 14:20:51 anna[1521]: DEBUG: retrieving partman-iscsi 40ubuntu3.18.10.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs 
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...
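
The check above compares the HWADDR written to iscsi.initramfs against the
MAC addresses in the 'ip addr list' output by eye. A minimal sketch of the
same comparison follows; the function names are hypothetical (not part of
partman-iscsi), and the file format is assumed to be exactly what the
output above shows.

```shell
#!/bin/sh
# Sketch: summarize an iscsi.initramfs file as written by the installer.
# Prints "auto" for ISCSI_AUTO=true, or "manual <MAC>" when HWADDR is set.
# (Hypothetical helper; assumes the file format shown in the output above.)
iscsi_initramfs_summary() {
    conf="$1"
    if grep -q '^ISCSI_AUTO=true$' "$conf"; then
        echo "auto"
        return 0
    fi
    # Extract the quoted MAC address from a HWADDR="..." line, if any.
    hwaddr=$(sed -n 's/^HWADDR="\(.*\)"$/\1/p' "$conf")
    if [ -n "$hwaddr" ]; then
        echo "manual $hwaddr"
        return 0
    fi
    echo "unknown"
    return 1
}

# Succeeds if the 'ip addr list' output on stdin contains a link/ether
# line for the MAC address given as $1 (case-insensitive match).
mac_is_present() {
    grep -qi "link/ether $1 "
}
```

In the installer shell this could be used as
`iscsi_initramfs_summary /target/etc/iscsi/iscsi.initramfs`, followed by
`ip addr list | mac_is_present 52:54:00:00:00:02` for the reported address.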

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Released
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  In Progress
Status in hw-detect source package in Bionic:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Fix Committed
Status in debian-installer source package in Cosmic:
  In Progress
Status in hw-detect source package in Cosmic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Fix Committed
Status in debian-installer source package in Disco:
  In Progress
Status in hw-detect source package in Disco:
  Fix Committed
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Fix Committed
Status in debian-installer source package in Eoan:
  Fix Released
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
  

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-06 Thread Mauricio Faria de Oliveira
verification done for bionic/partman-iscsi.

Use of partman-iscsi/iscsi_auto correctly writes /etc/iscsi/iscsi.initramfs
with either ISCSI_AUTO=true or iSCSI LUN details with the right MAC address.

~ # grep 'retrieving partman-iscsi' /var/log/syslog 
Jun  6 15:01:20 anna[1605]: DEBUG: retrieving partman-iscsi 40ubuntu3.18.04.1

With 'partman-iscsi/iscsi_auto=true':

~ # debconf-get partman-iscsi/iscsi_auto
true

~ # cat /target/etc/iscsi/iscsi.initramfs 
ISCSI_AUTO=true

With 'partman-iscsi/iscsi_auto=false':

~ # debconf-get partman-iscsi/iscsi_auto
false

~ # cat /target/etc/iscsi/iscsi.initramfs
HWADDR="52:54:00:00:00:02"
ISCSI_TARGET_NAME="iqn.2019-06.com.example:target1"
ISCSI_TARGET_IP="10.0.0.1"
ISCSI_TARGET_PORT="3260"
ISCSI_TARGET_GROUP="1"

~ # ip addr list
...
2: ens3: ...
link/ether 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.27/24 brd 192.168.122.255 scope global ens3
...
3: ens4: ...
link/ether 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 brd 10.0.0.255 scope global ens4
...

** Tags removed: verification-needed verification-needed-bionic
** Tags added: verification-done verification-done-bionic

[Kernel-packages] [Bug 1817321] Re: installer does not support iSCSI iBFT

2019-06-17 Thread Mauricio Faria de Oliveira
Verified debian-installer in {disco,cosmic,bionic}-proposed.

The verification has been done with the netboot image files
for disco/cosmic/bionic (and hwe-netboot for bionic),
using regular and lvm partitioning,
on VMs for the architectures amd64/i386/arm64/ppc64el
and baremetal for amd64.

All tests installed and booted successfully,
and were checked for the right release name,
partitioning method, installed kernel version,
installer's kernel version, and kernel messages
(no errors, warnings, or odd messages), i.e.:

$ lsb_release -cs
$ mount | grep -w /
$ uname -rvm
$ sudo grep 'Linux version' /var/log/installer/syslog
$ sudo grep kernel: /var/log/installer/syslog


** Tags removed: verification-needed verification-needed-bionic verification-needed-cosmic verification-needed-disco
** Tags added: verification-done verification-done-bionic verification-done-cosmic verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1817321

Title:
  installer does not support iSCSI iBFT

Status in debian-installer package in Ubuntu:
  Fix Released
Status in hw-detect package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in partman-iscsi package in Ubuntu:
  Fix Released
Status in debian-installer source package in Bionic:
  Fix Committed
Status in hw-detect source package in Bionic:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in partman-iscsi source package in Bionic:
  Fix Released
Status in debian-installer source package in Cosmic:
  Fix Committed
Status in hw-detect source package in Cosmic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in partman-iscsi source package in Cosmic:
  Fix Released
Status in debian-installer source package in Disco:
  Fix Committed
Status in hw-detect source package in Disco:
  Fix Released
Status in linux source package in Disco:
  Fix Released
Status in partman-iscsi source package in Disco:
  Fix Released
Status in debian-installer source package in Eoan:
  Fix Released
Status in hw-detect source package in Eoan:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in partman-iscsi source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * It's not possible to access iBFT (iSCSI Boot Firmware Table) information
     (settings for network interface, initiator, and target) in the installer
     because the 'iscsi_ibft' module is not present in udeb packages.

   * Even if it was, the installer does not handle iBFT information at all,
     thus any settings are ignored, and iSCSI-related configuration has to
     be done manually or with workarounds.

   * This impacts user-experience and automatic installation on systems and
     deployments which actually do provide the iBFT feature and information,
     but cannot use it practically.

   * With proper iBFT support in the installer (kernel module in udeb package
     and automatic iSCSI-related configuration) users will be able to rely on
     iBFT to install/deploy Ubuntu on their servers and datacenters.

   * These fixes add the 'iscsi_ibft' kernel module in the scsi-modules udeb,
     and configure network/iSCSI according to iBFT information in disk-detect.

     This is done in disk-detect so that the iSCSI LUNs are detected as disks
     (useful in case of no other disks in the system so the installer doesn't
     complain nor wait too long) and that any partman-related preseed options
     are not required and may be still available for the user.

  [Test Case]

   * linux package / kernel module in udeb:

     $ dpkg-deb -c scsi-modules_*.udeb | grep iscsi_ibft.ko

     Check the module loads in the installer environment.
     See comment with example for disco.

   * d-i/hw-detect/partman-iscsi package:
     See comments 11, 12, 13.

  [Regression Potential]

   * linux package: low, the kernel module is not loaded by default,
     and only checks whether iBFT information is present in firmware,
     then exposes that in sysfs in read-only mode.

   * d-i/hw-detect/partman-iscsi:
     - d-i: kernel version update to include iscsi_ibft module,
       based on kernel released to -updates plus one week
       monitoring bug reports -- it should be OK.
       Tested on amd64/i386/arm64/ppc64el on QEMU, plus amd64
       on baremetal -- see comment 11.
     - hw-detect: low, the changes are enabled by a preseed option.
       see comment 12.
     - partman-iscsi: low, simple changes, plus one fix that has
       been tested in detail, and falls back to
       previous behavior if it fails.
       see comment 13.

  [Other Info]

   * This has been verified both by the developer with a simple iSCSI
     iBFT environment (2 VMs: iSCSI target & initiator with UEFI+iPXE)
     and by an u

[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-26 Thread Mauricio Faria de Oliveira
** Attachment added: "kprobe-test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5259304/+files/kprobe-test.c

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824827

Title:
  tasks doing write()/fsync() hit deadlock in write_cache_pages()

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Invalid

Bug description:
  [Impact]

   * Tasks of a multi-threaded workload doing write() and fsync()
     might deadlock in write_cache_pages(), preventing progress.

   * The fix addresses a corner case in write_cache_pages() on
     the range_cyclic implementation which allows the deadlock.

   * Patch:
     - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7
       ("mm/page-writeback.c: fix range_cyclic writeback vs
       writepages deadlock"), present in v4.20-rc1~92^2~19.

  [Test Case]

   * This issue was originally hit by the 'perforce' (p4d)
     tool on an XFS filesystem, but it is difficult/rare to reproduce.

   * We've written a userspace program + kernel module (kprobes-based)
     to reproduce this problem and verify the test kernel/patch.

   * The kprobes are strictly tied to particular kernel versions
     because of the assembly instruction offsets.  We'll provide
     updated versions for -updates and -proposed for verification.

   * Steps (see output examples in comments):

     - Userspace part:
       $ gcc -o test test.c -pthread

     - Kernel part:
       $ touch Makefile
       $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
       $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

     - Shorter hung task timeout and higher console logging level
       to notice the deadlocked tasks sooner, and watch progress:
       $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
       $ echo 9 | sudo tee /proc/sys/kernel/printk

     - Load module / Run userspace part (logging to kernel log) in XFS:
       $ sudo insmod kprobe-test.ko
       $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
       $ sudo rmmod kprobe-test

       You may need to ctrl-z with the original kernel as 'test' doesn't
       finish.

     - Check kernel log or watch the system console:
       $ dmesg

       Check threads in D state.
       $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker
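
The 'grep D' in the last step matches a letter D anywhere on the line,
so it can also print threads that are not in D state. A stricter filter
could look like the sketch below. The function name is hypothetical, and
it assumes the four-column 'pid,tid,state,comm' layout used above; unlike
the grep, it only matches command names that start with 'test' or
'kworker'.

```shell
#!/bin/sh
# Sketch: read 'ps -eLo pid,tid,state,comm' output on stdin and keep only
# threads whose state column is exactly D (uninterruptible sleep) and whose
# command name starts with 'test' or 'kworker'.
# (Hypothetical helper, not part of the attached test case.)
d_state_threads() {
    awk '$3 == "D" && $4 ~ /^(test|kworker)/ { print }'
}
```

Usage: `ps -eLo pid,tid,state,comm | d_state_threads`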

  
  [Regression Potential] 

   * The patch is small but changes core writeback infrastructure,
     so there's a chance this may _affect_ some or other behavior
     that has not been validated with our regression testing; not
     exactly _break_ it.  Please note our regression testing.

   * This has been verified with 'xfstests' (not only for XFS fs,
     despite its original name), used by major Linux filesystems
     for regression testing during development. It's been tested
     on systems with 24 and 4 CPUs (to exercise differences in
     scalability, parallelism, and workload) and XFS and ext4
     (reporter's environment + Ubuntu's default).
     No regressions were observed (the set of failed tests is
     the same in each system and tests failed in the same way).

   * This has also been verified with 'iozone' for write intensive
     tests, to exercise the writeback mechanism and no errors were
     observed.

   * The reporter has been running the test kernel with the patch
     for weeks and has not observed any other issues/regressions.

  [Other Info]

   * This is only required in Cosmic (for the Bionic HWE kernel),
     and is already applied in Disco.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1824827] Re: tasks doing write()/fsync() hit deadlock in write_cache_pages()

2019-04-26 Thread Mauricio Faria de Oliveira
Verification successful on Cosmic.

Updated the test-case kernel part (attached), and repeated it 20+ times,
without any process hanging.

In all cases, the new function call into write_cache_pages() is observed
in thread 0, between page index 2 and page index 1.

[  150.914872] mod_init():161 :: hello
[  150.917828] mod_init():207 :: kernel version: prop/-19/cosmic

[  150.950322] Program running, TID = 1429
[  150.951566] kp1_pre_handler():073 :: state 0  :: pid =   1429, mapping = 0x8abcba385570, comm = 'test'
[  150.954205] kp1_pre_handler():082 :: state 0 -> 1 :: pid =   1429, mapping = 0x8abcba385570
[  150.956518] kp2_pre_handler():122 :: state 1  :: pid =   1429, page index = 1
[  150.958410] kp3_pre_handler():147 :: state 1  :: pid =   1429, page index = 1, calling writepage()
[  150.961047] kp2_pre_handler():122 :: state 1  :: pid =   1429, page index = 2
[  150.964788] kp3_pre_handler():147 :: state 1  :: pid =   1429, page index = 2, calling writepage()

[  151.973660] Thread 0 running, TID = 1430!
[  151.977071] kp1_pre_handler():073 :: state 1  :: pid =  7, mapping = 0x8abcba385570, comm = 'kworker/u8:0'
[  151.984836] kp1_pre_handler():104 :: state 1 -> 2 :: pid =  7, mapping = 0x8abcba385570, comm ('kworker/u8:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[  152.017726] kp2_pre_handler():122 :: state 2  :: pid =  7, page index = 2
[  152.027193] kp3_pre_handler():147 :: state 2  :: pid =  7, page index = 2, calling writepage()
[  152.038466] kp1_pre_handler():073 :: state 2  :: pid =  7, mapping = 0x8abcba385570, comm = 'kworker/u8:0'
[  152.048736] kp2_pre_handler():122 :: state 2  :: pid =  7, page index = 1
[  152.056642] kp2_pre_handler():126 :: state 2 -> 3 :: pid =  7, page index = 1, spin 5 seconds before lock_page()...

[  152.973731] Thread 1 running, TID = 1431!
[  152.974943] kp1_pre_handler():073 :: state 3  :: pid =   1431, mapping = 0x8abcba385570, comm = 'test'
[  152.977489] kp2_pre_handler():122 :: state 3  :: pid =   1431, page index = 1
[  152.979140] kp3_pre_handler():147 :: state 3  :: pid =   1431, page index = 1, calling writepage()
[  152.981928] kp2_pre_handler():122 :: state 3  :: pid =   1431, page index = 2

[  153.973895] Thread 2 running, TID = 1432!
[  153.975160] kp1_pre_handler():073 :: state 3  :: pid =   1432, mapping = 0x8abcba385570, comm = 'test'
[  153.978573] kp2_pre_handler():122 :: state 3  :: pid =   1432, page index = 1
[  157.033588] kp2_pre_handler():130 :: state 3 -> 4 :: pid =  7, page index = 1, spun 5 seconds before lock_page().
[  157.036151] kp3_pre_handler():147 :: state 4  :: pid =   1431, page index = 2, calling writepage()
[  157.038804] kp3_pre_handler():147 :: state 4  :: pid =   1432, page index = 1, calling writepage()
[  157.041212] kp2_pre_handler():122 :: state 4  :: pid =   1432, page index = 2
[  157.058880] mod_exit():230 :: bye


** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic

** Attachment removed: "kprobe-test.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824827/+attachment/5255994/+files/kprobe-test.c

[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device

2019-07-29 Thread Mauricio Faria de Oliveira
Tested with disco-proposed + patch (problem does not happen)
---

# uname -rv
5.0.0-22-generic #23+test20190725b1 SMP Mon Jul 29 14:43:55 -03 2019

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
[   69.567775] bcache: register_bdev() registered backing device loop0
[   69.577141] bcache: run_cache_set() invalidating existing data
[   69.591172] bcache: register_cache() registered cache device loop1
[   69.591517] bcache: register_bcache() error /dev/loop0: device already registered (emitting change event)
[   73.570620] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 0ed05289-ed85-40da-bcf4-3991f2e18e03
#

(no warning message)

# reboot
# # comment last line in script.

# ./setup-bcache-wb_percent-before-attach.sh >/dev/null 2>&1
#

[   40.045968] bcache: register_bdev() registered backing device loop0
[   40.050914] bcache: run_cache_set() invalidating existing data
[   40.060793] bcache: register_cache() registered cache device loop1
[   40.068735] bcache: register_bcache() error /dev/loop1: device already registered

(wait a few seconds)
(ok no oops anymore)
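
For reference, the sequence the two runs above exercise can be sketched
as a dry run that only prints the commands, following the steps in the
bug's test case. This is not the attached script. The device names
(loop0, loop1, bcache0) and the cache set UUID are taken from the log
above and are assumptions about any other setup; the function name is
hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the command sequence instead of running it.
# Arguments (all placeholders chosen by the caller):
#   $1 backing device, $2 cache device, $3 bcache device name,
#   $4 cache set UUID, $5 'yes' (default) to include the attach step.
# Leaving out the attach step corresponds to the second run above
# (last line of the script commented out).
bcache_wb_percent_steps() {
    backing="$1"; cache="$2"; bcache="$3"; cset_uuid="$4"; attach="${5:-yes}"
    echo "make-bcache -B $backing"
    echo "make-bcache -C $cache"
    # writeback_percent is set BEFORE the cache device is attached.
    echo "echo 11 > /sys/block/$bcache/bcache/writeback_percent"
    echo "sleep 1"
    if [ "$attach" = "yes" ]; then
        echo "echo $cset_uuid > /sys/block/$bcache/bcache/attach"
    fi
}
```

For example, `bcache_wb_percent_steps /dev/loop0 /dev/loop1 bcache0 0ed05289-ed85-40da-bcf4-3991f2e18e03`
prints the five steps of the first run, and passing `no` as a fifth
argument prints the four steps of the second.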

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Invalid

Bug description:
  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-54-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 'amixer'
  Date: Wed Jul 24 12:28:06 2019
  InstallationDate: Installed on 2013-10-04 (2119 days ago)
  InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 (20130925.1)
  MachineType: Supermicro X9DAi
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-54-generic N/A
   linux-backports-modules-4.15.0-54-generic  N/A
   linux-firmware 1.173.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago)
  dmi.bios.date: 05/09/2015
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 3.2
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: X9DAi
  dmi.board.vendor: Supermicro
  dmi.board.version: 0123456789
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.2:bd05/09/2015:svnSupermicro:pnX9DAi:pvr0123456789:rvnSupermicro:rnX9DAi:rvr0123456789:cvnSupermicro:ct3:cvr0123456789:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: X9DAi
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837788/+subscriptions


[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device

2019-07-29 Thread Mauricio Faria de Oliveira
Attaching test-case script.

** Attachment added: "setup-bcache-wb_percent-before-attach.sh"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837788/+attachment/5279850/+files/setup-bcache-wb_percent-before-attach.sh


[Kernel-packages] [Bug 1837788] Re: bcache kernel warning when attaching device

2019-07-29 Thread Mauricio Faria de Oliveira
Patches sent to the kernel-team mailing list:

[B][PATCH] bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
https://lists.ubuntu.com/archives/kernel-team/2019-July/102653.html

[D][PATCH] bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
https://lists.ubuntu.com/archives/kernel-team/2019-July/102654.html

** Description changed:

+ [Impact]
+ 
+  * Users can hit a kernel Warning, or even an Oops, if
+    bcache/writeback_percent is set before attaching a
+    caching device to the bcache device.
+ 
+  * The fix is trivial, upstream, and consists of just
+    checking whether the caching device is attached before
+    setting flags and scheduling the thread (which oopses).
+ 
+ [Test Case]
+ 
+  * See attachment 'setup-bcache-wb_percent-before-attach.sh'
+    used in comments #5 and #6 to reproduce the problem(s).
+ 
+  * for 'Warning':
+ 
+    # make-bcache -B 
+    # make-bcache -C 
+    # echo 11 > /sys/block//bcache/writeback_percent
+    # sleep 1
+    # echo  > /sys/block//bcache/attach
+ 
+  * for 'Oops':
+    (steps above, but don't run last command / 'attach').
+ 
+ [Regression Potential]
+ 
+  * Low. The fix is trivial, contained, and exclusive to the bcache sysfs
+    handler.
+ 
+  * The modified path has been exercised with synthetic testing (script).
+ 
+ [Original Bug Description]
+ 
  See attached dmesg, each time this server is rebooted it emits a
  concerning bcache warning.
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-54-generic 4.15.0-54.58
  ProcVersionSignature: Ubuntu 4.15.0-54.58-generic 4.15.18
  Uname: Linux 4.15.0-54-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 
k4.15.0-54-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.7
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', 
'/dev/snd/hwC0D2', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D3c', 
'/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', 
'/dev/snd/controlC0', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', 
'/dev/snd/controlC1', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 
'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 
'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer': 
'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer': 
'amixer'
  Date: Wed Jul 24 12:28:06 2019
  InstallationDate: Installed on 2013-10-04 (2119 days ago)
  InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Beta amd64 
(20130925.1)
  MachineType: Supermicro X9DAi
  ProcEnviron:
-  TERM=xterm-256color
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=xterm-256color
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-54-generic 
root=UUID=8577302d-1f37-40a6-afcd-385beb26059f ro nomodeset elevator=deadline 
nvme_core.default_ps_max_latency_us=0 nopti noibrs noibpb
  RelatedPackageVersions:
-  linux-restricted-modules-4.15.0-54-generic N/A
-  linux-backports-modules-4.15.0-54-generic  N/A
-  linux-firmware 1.173.9
+  linux-restricted-modules-4.15.0-54-generic N/A
+  linux-backports-modules-4.15.0-54-generic  N/A
+  linux-firmware 1.173.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-06-09 (409 days ago)
  dmi.bios.date: 05/09/2015
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 3.2
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: X9DAi
  dmi.board.vendor: Supermicro
  dmi.board.version: 0123456789
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvr3.2:bd05/09/2015:svnSupermicro:pnX9DAi:pvr0123456789:rvnSupermicro:rnX9DAi:rvr0123456789:cvnSupermicro:ct3:cvr0123456789:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: X9DAi
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Disco)
   Status: Confirmed => In Progress
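
The [Test Case] steps in the description above can be sketched as a small
dry-run helper. This is not the attached script; it is a minimal sketch that
only prints the command sequence so it can be reviewed before being run as
root on a scratch machine. The function name and all of its arguments are
hypothetical placeholders (the device names are not given in the report), and
it assumes that the value written to 'attach' is the cache set UUID.

```shell
# Print (do not execute) the [Test Case] command sequence for review.
# All arguments are hypothetical placeholders, not values from the bug report.
# Pass "oops" as the 5th argument to leave out the final 'attach' step.
bcache_wb_repro_cmds() {
    backing_dev=$1          # scratch backing block device
    cache_dev=$2            # scratch caching block device
    bcache_name=$3          # bcache device name, e.g. bcache0
    cset_uuid=$4            # cache set UUID (assumed input of 'attach')
    variant=${5:-warning}   # "warning" (default) or "oops"

    printf 'make-bcache -B %s\n' "$backing_dev"
    printf 'make-bcache -C %s\n' "$cache_dev"
    printf 'echo 11 > /sys/block/%s/bcache/writeback_percent\n' "$bcache_name"
    printf 'sleep 1\n'
    # The 'Oops' variant is the same sequence without the attach step.
    if [ "$variant" = warning ]; then
        printf 'echo %s > /sys/block/%s/bcache/attach\n' "$cset_uuid" "$bcache_name"
    fi
}
```

Printing instead of executing is deliberate: on an unfixed kernel the sequence
may warn or oops, so it should only be run on a disposable test system.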

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837788

Title:
  bcache kernel warning when attaching device

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in li

[Kernel-packages] [Bug 1836635] Re: Bionic: support for Solarflare X2542 network adapter (sfc driver)

2019-07-30 Thread Mauricio Faria de Oliveira
Regression testing done on an older/previously supported adapter, SFC
7000 series.

The netperf suite of TCP/UDP STREAM and RR, and TCP_CRR ran for ~2 days,
with results in the same ballpark as the original and test kernels.

Now waiting for test results with the new/requested adapter
before marking verification done/successful.

Summary:  test name, mtu sizes, original/test/proposed kernel results.

TCP_CRR
1500/1500 
ORIG 4550-4560
TEST 4550-4580
PROP 5260-5316
9000/9000
ORIG 4557
TEST 4570
PROP 5260-5300

TCP_RR
1500/1500 
ORIG 32531
TEST ~31k,32k
PROP 32180-34277
9000/9000
ORIG 31620
TEST 27k-30k-36k
PROP 27k-33k-34k

TCP_STREAM
1500/1500 
ORIG 9406
TEST 9403
PROP 9405
9000/9000
ORIG 9883
TEST 9887
PROP 9887

UDP_RR
1500/1500 
ORIG ~36k/~37k
TEST ~36k/~37k
PROP ~36k
9000/9000
ORIG ~35k/~37k
TEST 33k-37k
PROP ~35.8k/~36.6k

UDP_STREAM
1500/1500 
ORIG 8.6k/8.9k
TEST 8.9k
PROP 8.6k/8.7k
9000/9000
ORIG 8.7k
TEST 8.7k/8.8k
PROP 8.7k
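
To put "same ballpark" into numbers, the relative change between two results
from the summary above can be computed with a one-line helper. This is only a
sketch; the function name is hypothetical, and the example input uses the
lower bounds of the TCP_CRR 1500/1500 ranges (ORIG 4550, PROP 5260).

```shell
# pct_change BASE NEW: print the relative change from BASE to NEW in percent,
# rounded to one decimal place. Both arguments are plain numbers.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Example: TCP_CRR, MTU 1500/1500, ORIG vs PROP (lower bounds of the ranges).
pct_change 4550 5260
```

A positive value means the second kernel scored higher than the first; values
close to zero indicate no measurable difference.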

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1836635

Title:
  Bionic: support for Solarflare X2542 network adapter (sfc driver)

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Invalid
Status in linux source package in Eoan:
  Invalid

Bug description:
  [Impact]

   * Support for Solarflare X2542 network adapter
     (Medford2 / SFC9250) in the Bionic sfc driver.

   * This network adapter is present on recent hardware,
     at least HP 2019 and Dell PowerEdge R740xd systems.

   * On recent-hardware deployments that would rather use
     the Bionic LTS / GA supported kernel and cannot move
     to HWE kernels, this adapter is not functional at all.

  [Test Case]

   * The X2542 adapter has been exercised with iperf3 and nc
     across 2 hosts on 25G link speed w/ MTUs 1400/1500/9000
     on both directions, for 1 week.

     Its performance is on par with the Cosmic 4.18 kernel
     (which contains all these patches) and the out-of-tree
     driver from the vendor.

   * The 7000 series adapter (for regression testing an old model,
     supported previously) has been exercised with iperf and netperf
     (TCP_STREAM, UDP_STREAM, TCP_RR, UDP_RR, and TCP_CRR) in one
     host (client/server in different adapter ports isolated with
     network namespaces, so traffic goes through the network switch),
     on 10G link speed on MTUs 1500/9000, for 1 weekend.

     No regressions observed between the original and test kernels.

  [Regression Potential]

   * The patchset touches a lot of the sfc driver, so the potential
     for regression definitely exists. Thus, a lot of consideration
     and testing happened:

   * It has been tested on another adapter that uses the old code,
     and no regressions were found so far (see 7000 series above).

   * The patchset is exclusively cherry-picks, no single backport.

   * The patchset essentially moves the Bionic driver up in the
     upstream 'git log --oneline -- drivers/net/ethernet/sfc/':

     - since commit d4a7a8893d4c ("sfc: pass valid pointers from efx_enqueue_unwind")
     - until commit 7f61e6c6279b ("sfc: support FEC configuration through ethtool")
     - except for 2 commits (not needed / unrelated)
       - commit 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
       - commit 9baeb5eb1f83 ("sfc: falcon: remove duplicated bit-wise or of LOOPBACK_SGMII")
     - plus 2 more recent commits (fixes)
       - commit 458bd99e4974 ("sfc: remove ctpio_dmabuf_start from stats")
       - commit 0c235113b3c4 ("sfc: stop the TX queue before pushing new buffers")

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836635/+subscriptions



[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices

2019-07-31 Thread Mauricio Faria de Oliveira
Verification with bionic-proposed of the I/O Error path.

All good, working as expected (see comments #11 to #16).

# uname -rv
4.15.0-56-generic #62-Ubuntu SMP Wed Jul 24 20:18:55 UTC 2019


test 1
--

# ./setup.sh >/dev/null 2>&1
[  369.375820] bcache: register_bdev() registered backing device dm-0
[  369.395195] bcache: run_cache_set() invalidating existing data
[  369.410278] bcache: register_cache() registered cache device dm-1
[  371.393391] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
c1126837-e029-4d08-bad3-38ff8bc08054

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
└─fake-loop0 253:0    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk


On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k
--iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all

# [  425.656209] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
[  425.684837] bcache: error on c1126837-e029-4d08-bad3-38ff8bc08054: 
[  425.684840] journal io error
[  425.686537] , disabling caching
[  425.688849] Buffer I/O error on dev bcache0, logical block 2807, lost async 
page write
[  425.691541] Buffer I/O error on dev bcache0, logical block 2808, lost async 
page write
[  425.694131] bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it 
alive.
[  425.698343] Buffer I/O error on dev bcache0, logical block 2810, lost async 
page write
[  425.702522] Buffer I/O error on dev bcache0, logical block 2812, lost async 
page write
[  425.705326] Buffer I/O error on dev bcache0, logical block 2813, lost async 
page write
[  425.707896] Buffer I/O error on dev bcache0, logical block 2814, lost async 
page write
[  425.710692] Buffer I/O error on dev bcache0, logical block 2816, lost async 
page write
[  425.713524] Buffer I/O error on dev bcache0, logical block 2817, lost async 
page write
[  425.716512] Buffer I/O error on dev bcache0, logical block 2818, lost async 
page write
[  425.719156] Buffer I/O error on dev bcache0, logical block 2819, lost async 
page write
[  425.742817] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[  425.746933] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
[  425.750502] bcache: cache_set_free() Cache set 
c1126837-e029-4d08-bad3-38ff8bc08054 unregistered

fio finished:

Run status group 0 (all jobs):
  WRITE: bw=212MiB/s (222MB/s), 212MiB/s-212MiB/s (222MB/s-222MB/s), io=1024MiB 
(1074MB), run=4830-4830msec

bcache not on top of caching device:

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
└─fake-loop0 253:0    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop1          7:1    0    1G  0 loop
fake-loop1   253:1    0    1G  0 dm


test 2
--

# ./setup.sh >/dev/null 2>&1
[   23.946411] bcache: register_bdev() registered backing device dm-0
[   23.952262] bcache: run_cache_set() invalidating existing data
[   23.966564] bcache: register_cache() registered cache device dm-1
[   25.949934] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
d7a3c644-e21e-49bb-bcee-e14709a65745

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
└─fake-loop0 253:0    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk

# echo writeback > /sys/block/bcache0/bcache/cache_mode

# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.2152 s, 255 MB/s


# ./dm_fake_dev.s

[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices

2019-07-31 Thread Mauricio Faria de Oliveira
Verification done for Disco (one patch change only).

Only one of the two bcache devices stops working upon failures in one backing
device (see comment #21 for details).

# uname -rv
5.0.0-22-generic #23-Ubuntu SMP Tue Jul 23 17:23:54 UTC 2019

# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   25.748828] bcache: register_bdev() registered backing device dm-1
[   25.759145] bcache: register_bdev() registered backing device dm-0
[   25.767247] bcache: run_cache_set() invalidating existing data
[   25.778928] bcache: register_cache() registered cache device dm-2
[   26.768350] bcache: bch_cached_dev_attach() Caching dm-0 as bcache1 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d
[   26.795147] bcache: bch_cached_dev_attach() Caching dm-1 as bcache0 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
└─fake-loop0 253:0    0 1024M  0 dm
  └─bcache1  251:128  0 1024M  0 disk
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop2          7:2    0    1G  0 loop
└─fake-loop2 253:2    0 1024M  0 dm
  ├─bcache0  251:0    0 1024M  0 disk
  └─bcache1  251:128  0 1024M  0 disk

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always

# ./dm_fake_dev.sh /dev/loop0 bad
[   42.723192] Buffer I/O error on dev dm-0, logical block 262128, async page 
read
[   42.730031] Buffer I/O error on dev dm-0, logical block 262128, async page 
read
[   42.736198] bcache: register_bcache() error /dev/dm-0: device already 
registered (emitting change event)
[   42.738697] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.742277] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
# [   42.746748] Buffer I/O error on dev bcache1, logical block 262112, async 
page read
[   42.752642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.755650] Buffer I/O error on dev bcache1, logical block 262112, async 
page read
[   42.758209] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.760642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.762860] Buffer I/O error on dev bcache1, logical block 1, async page read


# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k 
&
[1] 1557
[2] 1558
# [   58.982340] bcache: bch_count_backing_io_errors() dm-0: IO error on 
backing device, unrecoverable
[   58.984076] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.985718] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.987382] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.989011] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.990645] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.992293] Buffer I/O error on dev bcache1, logical block 0, lost async 
page write
[   58.993733] Buffer I/O error on dev bcache1, logical block 1, lost async 
page write
[   58.995201] Buffer I/O error on dev bcache1, logical block 2, lost async 
page write
[   58.996651] Buffer I/O error on dev bcache1, logical block 3, lost async 
page write
...
[   59.096950] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   59.098669] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   59.100621] bcache: bch_cached_dev_error() stop bcache1: too many IO errors 
on backing device dm-0
[   59.100621]
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out

[   60.111733] bcache: bcache_device_free() bcache1 stopped

1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.10457 s, 510 MB/s
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.67245 s, 230 MB/s

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop2          7:2    0    1G  0 loop
└─fake-loop2 253:2    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
fake-loop0   253:0    0    1G  0 dm

only bcache1 was stopped. bcache0 remains working.

# reboot

# ./setup-two-bcache-one-cache.reboot.sh >/dev/null 2>&1
[   17.606164] bcache: register_bdev() registered backing device dm-0
[   17.672177] bcache: register_bdev() registered backing device dm-1
[   17.752456] bcache: bch_journal_replay() journal replay done, 4936 keys in 6 
entries, seq 

[Kernel-packages] [Bug 1829563] Re: bcache: risk of data loss on I/O errors in backing or caching devices

2019-07-31 Thread Mauricio Faria de Oliveira
Verification with bionic-proposed of xfstests results.

No regressions introduced by this bcache patchset.

A direct comparison between -updates and -proposed is not possible
because -proposed introduced failures via other components in the I/O path
(e.g., block, ext4).

This is described below, and just to make sure, the -proposed kernel
has been rebuilt with the bcache patchset reverted, and test results
are the same (same failures with/without the patchset; no regression).

It's also been confirmed (below) that tests with a raw block device
(sda) instead of a bcache device (thus eliminating the bcache code)
show the new/introduced failures (ext4/035, generic/553, generic/554).

(the output below does look better on a very wide screen. :-)


proposed kernel: 4.15.0-56.62 (with patchset)
---

xfstests.test.none.log:         Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writearound.log:  Failures: ext4/032    ext4/035    ---         generic/451 generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writeback.log:    Failures: ---         ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writethrough.log: Failures: ext4/032    ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554

sda-only (no bcache):           Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         ---         generic/538 ---         generic/553 generic/554


proposed kernel (4.15.0-56 without patchset)
---

$ uname -rv
4.15.0-56-generic #62+test20190730b1 SMP Tue Jul 30 18:25:01 -03 2019


xfstests.test.none.log:         Failures: ext4/032    ext4/035    generic/371 ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writearound.log:  Failures: ext4/032    ext4/035    ---         generic/451 generic/484 generic/491 ---         generic/537 ---         generic/547 generic/553 generic/554
xfstests.test.writeback.log:    Failures: ---         ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554
xfstests.test.writethrough.log: Failures: ext4/032    ext4/035    ---         ---         generic/484 generic/491 ---         generic/537 ---         ---         generic/553 generic/554


test kernel (4.15.0-55 with patchset)
---

xfstests.test.none.log:         Failures: ext4/032    ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writearound.log:  Failures: ext4/032    ---         ---         generic/451 generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writeback.log:    Failures: ---         ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writethrough.log: Failures: ext4/032    ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---

orig kernel (4.15.0-55)
---

xfstests.test.none.log:         Failures: ext4/032    ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writearound.log:  Failures: ext4/032    ---         generic/371 generic/451 generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writeback.log:    Failures: ---         ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
xfstests.test.writethrough.log: Failures: ext4/032    ---         ---         ---         generic/484 generic/491 generic/504 generic/537 ---         ---         ---         ---
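
The comparison above was done by eye; a minimal sketch of how the same
"which failures are new" question could be answered mechanically follows. It
assumes each log contains a line starting with "Failures:" followed by the
failed test names, as in the rows above without the file-name prefix. The
function names and the directory layout in the usage note are hypothetical.

```shell
# failures_of LOG: list the tests named on the "Failures:" line(s) of LOG,
# one per line, sorted and de-duplicated.
failures_of() {
    awk '/^Failures:/ { for (i = 2; i <= NF; i++) print $i }' "$1" | sort -u
}

# new_failures BASE NEW: list tests that fail in NEW but not in BASE.
new_failures() {
    nf_base=$(mktemp) || return 1
    nf_new=$(mktemp) || return 1
    failures_of "$1" > "$nf_base"
    failures_of "$2" > "$nf_new"
    comm -13 "$nf_base" "$nf_new"   # keep lines unique to the NEW list
    rm -f "$nf_base" "$nf_new"
}
```

For example, `new_failures orig/xfstests.test.none.log proposed/xfstests.test.none.log`
would list the tests that only fail with the second kernel; swapping the
arguments lists failures that went away.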

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     specially in writeback mode, can lead to data loss and/or
     the application is not notified about failed I/O requests.

   * The bcache de
