Hi Nick,
ok - I've now added a more direct test that uses '--is-owner' directly: a positive and a negative test, based on real-world udev rules. Other typical usage scenarios are mainly enabling and disabling devices, which is done in the more end-to-end test at the beginning of the test plan. Since chzdev is mainly used at install time, another test could be to start an installation, navigate to the installer shell at the initial subiquity screen, update s390-tools there to the latest version, leave the installer shell and proceed with the installation, which then runs through the zDev activation screen. I'm adding this to the Test Plan on top ...
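As a minimal sketch (not part of the test plan itself), the positive and negative '--is-owner' checks could be scripted roughly as follows. It relies only on the two exit codes named in the description (0 = EXIT_OK, 33 = EXIT_UNKNOWN_FILE); the function name check_owner and the CHZDEV variable are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch: classify a udev rule file by the exit code of 'chzdev --is-owner'.
# CHZDEV and check_owner are made-up names; only the exit codes 0 and 33
# come from the bug description.
CHZDEV="${CHZDEV:-chzdev}"

check_owner() {
    rule="$1"
    rc=0
    # Capture the exit code without tripping 'set -e' in calling scripts.
    "$CHZDEV" --is-owner "$rule" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)  echo "zdev: $rule" ;;          # EXIT_OK: rule was created by chzdev
        33) echo "other: $rule" ;;         # EXIT_UNKNOWN_FILE: not a zdev rule
        *)  echo "error($rc): $rule" ;;    # anything else is unexpected here
    esac
}
```

On a test system this would be called as, for example, check_owner /etc/udev/rules.d/41-qeth-0.0.0600.rules, and the printed prefix compared against the expectation.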
** Description changed:

  SRU Justification:

  [ Impact ]

   * The CCW (or zdev) devices that are specific to s390x always need to be explicitly enabled before they can be used. This is usually done with the help of the 'chzdev -e' command (part of s390-tools), which also creates the underlying udev rules for the device activation.

   * When, for example, a qeth network device is persistently configured with 'chzdev -e', the initramfs is usually rebuilt, since the corresponding udev rule might be needed in the initramfs (early at boot time).

   * But chzdev also has the parameter 'zdev:early', which allows one to explicitly specify whether the initramfs should be rebuilt and the udev rule integrated (zdev:early=1) or not (zdev:early=0).

   * Right now the initramfs is erroneously rebuilt every time and includes all (zdev) udev rules, ignoring the zdev:early parameter.

   * This can have a significant impact, especially on systems with hundreds (or even thousands) of devices, and can lead to space constraints. (Note that larger ranges of devices can also easily be enabled with a single command.)

   * For example, supplemental devices (like disks) that are not relevant at early boot time (and may, for example, only be used in backup or take-over cases) do not always need to be activated at boot time.

   * On systems in DPM mode this can also be controlled (to a certain degree) by the HMC (but only in DPM mode).

   * To fix this situation, two things are needed:
     - s390-tools: have an option in chzdev that allows identifying whether a udev rule is zdev-related or not (since a udev rule could also be generic and not specific to s390x).
     - systemd: handle zdev-related rules in extra/initramfs-tools/hooks/udev properly according to the zdev:early parameter, or use a proper default.

  [ Test Plan ]

   * Have an s390x LPAR or z/VM system with several ccw/zdev devices. Some might be in use (e.g. for the underlying disk or the main network device), but some spares to test with are needed. The easiest is to use qeth network devices.
   * List the available devices:
     $ lszdev | grep qeth | head -3
     qeth 0.0.c000:0.0.c001:0.0.c002 yes yes encc000
     qeth 0.0.c003:0.0.c004:0.0.c005 no no
     qeth 0.0.c006:0.0.c007:0.0.c008 no no
     Note that the short form of a device triple (e.g. c003) is sufficient to address a device.
     Here c000 is already active, but c003 and c006 are not.

   * Check which qeth devices are in the current initramfs:
     $ lsinitramfs /boot/initrd.img-$(uname -r) | grep usr/lib/udev/rules.d/41-qeth-0.0.c00
     usr/lib/udev/rules.d/41-qeth-0.0.c000.rules
     As expected, only c000 is listed.

   * Now add a second device and explicitly specify that it should not be included in the initramfs (using parameter 'zdev:early=0'):
     $ sudo chzdev -e qeth 0.0.c003 zdev:early=0
     and check what's in the initramfs:
     $ lsinitramfs /boot/initrd.img-$(uname -r) | grep usr/lib/udev/rules.d/41-qeth-0.0.c00
     usr/lib/udev/rules.d/41-qeth-0.0.c000.rules
     Still c000 only.

   * Now add another device, but this time explicitly specify that the corresponding udev rule should be included in the initramfs:
     $ sudo chzdev -e qeth 0.0.c006 zdev:early=1
     and check the content of the initramfs again:
     $ lsinitramfs /boot/initrd.img-$(uname -r) | grep usr/lib/udev/rules.d/41-qeth-0.0.c00
     usr/lib/udev/rules.d/41-qeth-0.0.c000.rules
     usr/lib/udev/rules.d/41-qeth-0.0.c006.rules
     Now both are included.
   * Mainly for regression testing, disable/remove the devices again (the 'zdev:early' parameter is irrelevant in this case):
     $ sudo chzdev -d qeth c003
     $ sudo chzdev -d qeth c006
     check:
     $ lsinitramfs /boot/initrd.img-$(uname -r) | grep -i usr/lib/udev/rules.d/41-qeth-0.0.c00
     usr/lib/udev/rules.d/41-qeth-0.0.c000.rules

   * Add a device without the parameter 'zdev:early' specified at all, which needs to default to 'zdev:early=1':
     $ sudo chzdev -e qeth c003
     and check:
     $ lsinitramfs /boot/initrd.img-$(uname -r) | grep -i usr/lib/udev/rules.d/41-qeth-0.0.c00
     usr/lib/udev/rules.d/41-qeth-0.0.c000.rules
     usr/lib/udev/rules.d/41-qeth-0.0.c003.rules

-  * The primary use case for the new chzdev option '--is-owner'
-    was for the scripted udev rule handling, however,
-    the option can also be more directly tested, but the result
-    needs to be checked based on the the given return code:
-
-    - this is a standard udev rule of a ccw qeth network device
-      (zdev) and is with that always created by chzdev:
-      $ ls /etc/udev/rules.d/41-qeth-0.0.0600.rules
-      /etc/udev/rules.d/41-qeth-0.0.0600.rules
-    - hence 'chzdev --is-owner' succeeds and returns exit code '0'
-      (EXIT_OK in exit_code.h):
-      $ chzdev --is-owner /etc/udev/rules.d/41-qeth-0.0.0600.rules
-      $ echo $?
-      0
-    - however, the udev rule for snapd was obviously not added by chzdev:,
-      $ ls /etc/udev/rules.d/70-snap.snapd.rules
-      /etc/udev/rules.d/70-snap.snapd.rules
-    - hence the return code here is the expected '33'
-      (the newly introduced exit code 'EXIT_UNKNOWN_FILE' in exit_code.h)
-      $ chzdev --is-owner /etc/udev/rules.d/70-snap.snapd.rules
-      $ echo $?
-      33
+  * The primary use case for the new chzdev option '--is-owner'
+    is the scripted udev rule handling; however,
+    the option can also be tested more directly, with the result
+    checked based on the given return code:
+
+    - this is a standard udev rule of a ccw qeth network device
+      (zdev) and is therefore always created by chzdev:
+      $ ls /etc/udev/rules.d/41-qeth-0.0.0600.rules
+      /etc/udev/rules.d/41-qeth-0.0.0600.rules
+    - hence 'chzdev --is-owner' succeeds and returns exit code '0'
+      (EXIT_OK in exit_code.h):
+      $ chzdev --is-owner /etc/udev/rules.d/41-qeth-0.0.0600.rules
+      $ echo $?
+      0
+    - however, the udev rule for snapd was obviously not added by chzdev:
+      $ ls /etc/udev/rules.d/70-snap.snapd.rules
+      /etc/udev/rules.d/70-snap.snapd.rules
+    - hence the return code here is the expected '33'
+      (the newly introduced exit code 'EXIT_UNKNOWN_FILE' in exit_code.h):
+      $ chzdev --is-owner /etc/udev/rules.d/70-snap.snapd.rules
+      $ echo $?
+      33
+
+  * chzdev is especially used at install time, hence another test
+    would be to start an installation, and at the initial subiquity
+    screen immediately navigate to the installer shell and update
+    s390-tools to the updated version, leave the installer shell
+    and proceed with the installation.
+    The installation will then run through the usual zDev activation
+    screen (using the updated s390-tools), which makes use of chzdev.

  [ Where problems could occur ]

   * The modifications in s390-tools expand the chzdev command with the option '--is-owner <rule>', which allows identifying a zdev rule.

   * Since it is purely an addition (no existing code line was removed or modified), the impact is moderate, because the option is not yet in use by anyone on noble.

   * However, the code for this option got inserted into existing code, hence if the new lines are not properly closed/terminated, problems can occur that may even affect other chzdev arguments and parameters (e.g.
     the ones that are in the case statement before and after 'is-owner').

   * The exit code EXIT_UNKNOWN_FILE of 'is-owner' is 33; the defined number could be wrong, accidentally used multiple times, or differ from the exit code that callers expect, which may lead to wrong states.

   * The upstream commit needed to be modified in one aspect (to backport it to 2.31). Between version 2.33, where 'is-owner' was added, and the version in noble (2.31), a refactoring happened (commit 4c2bfb1d47e7) that led (amongst other, not relevant changes) to the renaming of the file zdev/include/site.h to zdev/include/zdev.h. Fortunately the content of the file stayed the same, so no additional commit needed to be applied; only the file name in the quilt patch was adjusted.

   * Some modifications are for the man page and the usage.txt file only.

   * For systemd / extra/initramfs-tools/hooks/udev the modifications of this LP#2044104 are not sufficient, since after all this was introduced into oracular, two more cases occurred that needed to be handled on top:
     - ensure rules file exists before invoking chzdev (LP: #2079993)
     - udev rules are copied in case zdev_early is not specified (LP: #2102236)

   * To simplify the systemd modifications (and with that reduce risk), the version check from the initial modification, which checks for the s390-tools version (2.33, to ensure that chzdev '--is-owner' is only used if the right s390-tools package is available), got removed, since the option is now backported to the previous version 2.31 (hence the check would fail). And because noble will never get a new s390-tools version anymore, the check is obsolete.

   * All this affects the s390x architecture only.

   * (We may think of removing the version check from the current development release as well, which today comes with v2.38.0, since we will never go back to an older s390-tools version.)
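For illustration only, the rule filtering that the systemd hook needs could look roughly like the sketch below. This is not the actual content of extra/initramfs-tools/hooks/udev: the function name rules_for_generic_hook and the CHZDEV variable are made up, and treating any non-zero exit code (or a missing chzdev binary) as "not a zdev rule" is an assumption. The sketch skips rules that chzdev reports as its own, so that their handling can follow zdev:early, and it checks that a rule file exists before calling chzdev (cf. LP: #2079993).

```shell
#!/bin/sh
# Sketch: print only those udev rule files that a generic initramfs hook
# would copy, leaving zdev-owned rules to zdev-specific handling.
# CHZDEV and rules_for_generic_hook are hypothetical names.
CHZDEV="${CHZDEV:-chzdev}"

rules_for_generic_hook() {
    for rule in "$@"; do
        # Never hand a non-existent file to chzdev (cf. LP: #2079993).
        [ -e "$rule" ] || continue
        # Exit code 0 from '--is-owner' means the rule was created by chzdev.
        if command -v "$CHZDEV" >/dev/null 2>&1 &&
           "$CHZDEV" --is-owner "$rule" >/dev/null 2>&1; then
            continue
        fi
        # Assumption: every other outcome is treated as a generic rule.
        echo "$rule"
    done
}
```

A caller would pass the candidate rule files, e.g. rules_for_generic_hook /etc/udev/rules.d/*.rules, and copy only the printed paths.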
  [ Other Info ]

   * The systemd / udev changes will be piggybacked on a bigger systemd update (to avoid too many updates, since this affects s390x only but would trigger updates for other architectures too).

   * The s390-tools and systemd modifications can be done separately, as long as s390-tools lands in the archive before the systemd modifications, since systemd will be the first exploiter of the s390-tools modification. Hence a grouped upload is not needed if s390-tools is handled first.

   * A test build in a PPA is available here:
     https://launchpad.net/~fheimes/+archive/ubuntu/lp2103414+lp2078347+lp2044104
     and the test packages were tested:
     https://pastebin.canonical.com/p/nfGDnHVYWd/

  __________

  Versions:
  Ubuntu 20.04.5 s390-tools version 2.12.0-0ubuntu3.7.s390x
  Ubuntu 22.04.2 s390-tools version 2.20.0-0ubuntu3.2.s390x

  When I configure a zfcp LUN persistently via chzdev, the initrd is being rebuilt even with parameter zdev:early=0

  root@a8315003:~# chzdev -e zfcp-lun 0.0.1803:0x500507630910d430:0x4019409200000000 zdev:early=0
  zFCP LUN 0.0.1803:0x500507630910d430:0x4019409200000000 configured
  Note: The initial RAM-disk must be updated for these changes to take effect:
         - zFCP LUN 0.0.1803:0x500507630910d430:0x4019409200000000
  update-initramfs: Generating /boot/initrd.img-5.15.0-60-generic
  I: The initramfs will attempt to resume from /dev/dasdb1
  I: (UUID=e70ecb80-4d1e-4074-9cda-ce231ad6e698)
  I: Set the RESUME variable to override this.
  Using config file '/etc/zipl.conf'
  Building bootmap in '/boot'
  Adding IPL section 'ubuntu' (default)
  Preparing boot device: dasda (c00a).
  Done.
  root@a8315003:~#

  == Comment: - Thorsten Diehl <thorsten.di...@de.ibm.com> - 2023-03-01 06:55:47 ==
  @BOE-dev
  This behaviour is unexpected.
  https://www.ibm.com/docs/en/linux-on-systems?topic=commands-chzdev says:
  Activating a device early during the boot process

  Use the zdev:early device attribute to activate a device early during the boot process and to override any existing auto-configuration with a persistent device configuration.

  zdev:early=1
      The device is activated during the initial RAM disc phase according to the persistent configuration.

  zdev:early=0
      The device is activated as usual during the boot process. This is the default. If auto-configuration data is present, the device is activated during the initial RAM disc phase according to the auto-configuration.

  I can't interpret a SCSI LUN as a device with auto-configuration data. (At least, if the zfcp device hasn't NPIV enabled)

  == Comment: #5 - Peter Oberparleiter <peter.oberparlei...@de.ibm.com> - 2023-03-01 11:18:28 ==
  (In reply to comment #2)
  > @BOE-dev
  > This behaviour is unexpected.
  > https://www.ibm.com/docs/en/linux-on-systems?topic=commands-chzdev says:
  > Activating a device early during the boot process
  >
  > Use the zdev:early device attribute to activate a device early during the
  > boot process and to override any existing auto-configuration with a
  > persistent device configuration.
  >
  > zdev:early=1
  > The device is activated during the initial RAM disc phase according to
  > the persistent configuration.
  >
  > zdev:early=0
  > The device is activated as usual during the boot process. This is the
  > default. If auto-configuration data is present, the device is activated
  > during the initial RAM disc phase according to the auto-configuration.

  The documentation is incorrect for Ubuntu. Canonical specifically builds zdev in a way that every change to persistent device configuration causes an update to the initial RAM-disk.
  See also:
  https://bugzilla.linux.ibm.com/show_bug.cgi?id=187578#c35
  https://github.com/ibm-s390-linux/s390-tools/commit/7dd03eaeecdd0e2674f145aca34be1275d291bd8

  > I can't interpret a SCSI LUN as a device with auto-configuration data. (At
  > least, if the zfcp device hasn't NPIV enabled)

  This is related to auto-configuration as implemented for DPM.

  == Comment: #6 - Thorsten Diehl <thorsten.di...@de.ibm.com> - 2023-03-03 12:41:44 ==
  So, IIUC, chzdev is built for Ubuntu with ZDEV_ALWAYS_UPDATE_INITRD=1, which makes the parameter zdev:early=0 ineffective. Correct?
  If you confirm, you may also close this bug.

  Not nice - then we have to find an alternate solution.

  == Comment: #7 - Peter Oberparleiter <peter.oberparlei...@de.ibm.com> - 2023-03-07 06:48:07 ==
  (In reply to comment #6)
  > So, IIUC, chzdev is built for Ubuntu with ZDEV_ALWAYS_UPDATE_INITRD=1, which
  > makes the parameter zdev:early=0 ineffective. Correct?
  > If you confirm, you may also close this bug.
  >
  > Not nice - then we have to find an alternate solution.

  chzdev -p on Ubuntu will by default rebuild the initrd. This is intended behavior by Canonical and controlled by the ZDEV_ALWAYS_UPDATE_INITRD build-time switch. You can suppress it by adding the option --no-root-update to the command line.

  Specifying zdev:early=0 to chzdev has exactly the effect that it is supposed to have: it tells zdev not to enable that device during initrd processing, resulting in the corresponding udev rule not being copied to the initrd [1].

  Unfortunately there is another Ubuntu initrd script [2] that simply copies ALL udev rules, including those created by zdev, into the initrd. As a result, zdev's early-attribute handling is rendered useless and all devices are enabled, even if a user specified zdev:early=0.
  Since this bug report indicates that there is a use case for this function in Ubuntu, it might be worth asking Canonical if the current processing could be changed to provide a way for users to specify that a device should specifically NOT be enabled within initrd processing. Technically this could easily be done:

  1) Have the generic udev initramfs script not copy zdev-generated udev rules, OR have the zdev initramfs script remove those rules (somewhat of a hack)

  2) Change the zdev initramfs script logic from the current:

     - enable devices required for the root file system, AND
     - enable devices for which zdev:early=1 was specified

     to

     - enable all persistently configured devices EXCEPT those for which zdev:early=0 was specified

     This change would be needed to maintain Canonical's policy of enabling all devices in the initrd by default.

  I'm open to adding the change in 2) to our s390-tools package, but someone at Canonical would need to work out a way to implement 1).

  [1] https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/initramfs/hooks/s390-tools-zdev#L47
  [2] https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/tree/debian/extra/initramfs-tools/hooks/udev#n42

-- 
https://bugs.launchpad.net/bugs/2044104

Title:
  [UBUNTU 20.04] chzdev -e is rebuilding initramfs even with zdev:early=0 set