OK - this is a messy one. It is due to the backport of this:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

Reverting that is probably not the right answer because the point of it
is to avoid corruption. But this is a pretty serious usability issue. It
is not at all clear from the message that a user needs to do *something*
- and what that *something* is is even less clear:

Here's the message, buried in a ton of other messages:
[   72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[   72.728149] md/raid0: please set raid.default_layout to 1 or 2
[   72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524

So if you understand from that that you need to pass a kernel parameter,
you're more intuitive than I am. And if you understand from that *why*,
and *to which one* - well, you probably wrote the patch. And even then,
you probably didn't realize the parameter name in the message is actually
incorrect (HINT: we should backport this as well:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571).
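
Since the relevant lines are easy to miss among everything else in the
kernel log, a small filter along these lines could pull out which arrays
hit this error. This is only a sketch: the function name is made up, and
the pattern is taken from the message quoted above. Note also that, as far
as I can tell, the module is called raid0, so the parameter would be
spelled raid0.default_layout rather than raid.default_layout as the
message says.

```python
import re

# Pattern based on the kernel message quoted above; the array name
# (e.g. md0) is captured so it can be reported to the user.
LAYOUT_ERROR = re.compile(
    r"md/raid0:(?P<array>\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)


def arrays_needing_layout(log_text):
    """Return the sorted names of arrays that failed with the layout error.

    Hypothetical helper for illustration only. Whitespace is collapsed
    first so that messages wrapped by a mail client still match.
    """
    normalized = " ".join(log_text.split())
    return sorted({m.group("array") for m in LAYOUT_ERROR.finditer(normalized)})
```

Feeding it the output of dmesg (or the contents of a saved boot log) would
give a list such as ['md0'], or an empty list if the error is not present.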

IMO, the error message should include a URL to a page with clear steps on
how to proceed, which I think is something along the lines of "Use mdadm
to figure out when your array was created, figure out what kernel you
were running back then (ideally with a mapping to Ubuntu release), and
then how to fix it."

That said, it isn't clear to me why we saw this issue on this specific
machine. This issue is supposedly restricted to only multi-zone RAID0
configs, which should only happen if not all members are the same size.
But I happen to know that all members on this system here *are* the same
size! I've tried to reproduce it but, after redeploying the system with
MAAS, it upgrades and reboots w/o error :(
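
For reference, here is a rough sketch of how the zone count falls out of
the member sizes, as I understand the md/raid0 setup code: each member is
rounded down to a whole number of chunks, and every distinct rounded size
gives one zone. The function names and the sector figures are made up for
illustration, and this ignores everything else the kernel does when it
builds the zones.

```python
def count_raid0_zones(member_sectors, chunk_sectors):
    """Estimate how many zones a RAID0 array with these members would have.

    Simplified model (an assumption, not the kernel code itself): round
    each member down to a multiple of the chunk size, then count the
    distinct rounded sizes.
    """
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    if not member_sectors:
        raise ValueError("need at least one member")
    rounded = {(s // chunk_sectors) * chunk_sectors for s in member_sectors}
    return len(rounded)


def is_multi_zone(member_sectors, chunk_sectors):
    """True if the simplified model above yields more than one zone."""
    return count_raid0_zones(member_sectors, chunk_sectors) > 1
```

Under this model, members whose sizes round down to the same number of
chunks give a single zone, so an array of truly identical members should
never be multi-zone.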

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with
  default_layout setting

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete

Bug description:
  [Impact]
  After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia
  DGX2 system is no longer bootable.

  [Test Case]
  [Fix]
  [Regression Risk]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1849682/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
