There is a scenario where the real rootfs is located on a bcache device.
For that, the bcache device has to be registered at the initrd stage,
which already happens now. We would then locate a file system on it, do
pivot_root, and so on.

The bcache<i> naming, I believe, is not guaranteed at this point unless
we have a rule that says so.

Side-tracking to our field use-cases, we need persistence in
/dev/bcache<i> names based on superblock UUIDs. So I expect
/dev/bcache<i> names to be persisted by UUID on first discovery (which
corresponds to the MAAS deploy stage, not commissioning as in the case
of disk serial numbers).

However, we also expect bcache<i> names to match the names in MAAS,
which may not happen in this scenario because the
<backing-dev-name> : bcache<i> mapping is not enforced.

Going back to https://bugs.launchpad.net/curtin/+bug/1728742, I think we
can break it down into two problems:

1. bcache device numbers are not static across reboots, so we need a
static mapping of superblock UUID to bcache<i> for a given device. This
requires CACHED_UUID to be present in the uevent environment, which is
only possible during a successful registration, when this code path is
triggered. Given the rootfs-on-bcache requirement, it makes sense to do
this at the initrd stage, before we pivot_root to the real rootfs.

Doing something like that once systemd is running, after pivot_root and
the transfer of the /dev devtmpfs to the real rootfs, doesn't sound right
to me because of the double registration problem. In summary, I think
/dev/bcache/by-uuid/ symlinks for bcache devices that exist on initial
boot should be created via udev rules in the initrd.

This is what this bug is about.

2. bcache device names may not match the ones in MAAS. This has
implications for our use of Juju Storage functionality when we need
device special files with static names without file systems or partition
tables present. After commissioning in MAAS there's already metadata
present about a given machine - disk serial numbers are gathered (if
present, this is not guaranteed and block driver-specific AFAIK but a
sane assumption to make) and device names that were assigned during
ephemeral image boot are presented and stored in a database with
associated serial numbers available for querying to set up dname
symlinks on deployment.

In order to make <backing-dev-name> : bcache<i> mapping static we need
to essentially have a mapping of disk serial numbers to bcache
superblock UUIDs which are in turn mapped to bcache<i> names.
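
To illustrate the chained mapping, here is a rough sketch in Python. The
function name and the sample data are hypothetical; this is neither MAAS
nor curtin code, only the composition of the two lookups described
above.

```python
def serial_to_bcache(serial_to_uuid, uuid_to_bcache):
    """Compose serial -> superblock UUID with UUID -> bcache<i> name.

    Both arguments are plain dicts with hypothetical contents. Serials
    whose UUID has no registered bcache device are left out.
    """
    result = {}
    for serial, uuid in serial_to_uuid.items():
        name = uuid_to_bcache.get(uuid)
        if name is not None:
            result[serial] = name
    return result


# Hypothetical sample data: two backing disks, only one registered.
serials = {"SERIAL-A": "uuid-1", "SERIAL-B": "uuid-2"}
registered = {"uuid-1": "bcache0"}
mapping = serial_to_bcache(serials, registered)
```

A serial that is missing from the result is exactly the case where the
name expected by MAAS cannot be guaranteed.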

I would say that https://bugs.launchpad.net/curtin/+bug/1728742 is about
p.2.

====

The rationale for p. 1 is that the init script sets up devtmpfs
initially, and it then gets moved over to the real rootfs (init-bottom
script) before pivot_root is performed. systemd then runs its mount
point setup code, which checks whether a given entry in its hard-coded
table of mount points is already a mount point and skips setting it up
if so. So anything set up during the initrd stage will stay there after
systemd runs, as devtmpfs is moved and reused.

https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/src/core/mount-setup.c?h=applied/ubuntu/xenial-updates#n77
  { "devtmpfs", "/dev", "devtmpfs", "mode=755", MS_NOSUID|MS_STRICTATIME,

path_is_mount_point -> fd_is_mount_point
https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/src/core/mount-setup.c?h=applied/ubuntu/xenial-updates#n161

static int mount_one(const MountPoint *p, bool relabel) {
...
        r = path_is_mount_point(p->where, AT_SYMLINK_FOLLOW);
        if (r < 0 && r != -ENOENT) {
                log_full_errno((p->mode & MNT_FATAL) ? LOG_ERR : LOG_DEBUG, r, "Failed to determine whether %s is a mount point: %m", p->where);
                return (p->mode & MNT_FATAL) ? r : 0;
        }
        if (r > 0)
                return 0;
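
The early return above can be modelled roughly as follows. This is a
simplified Python sketch, not systemd's implementation: the function
name mirrors mount_one() only for readability, the return strings are
made up, and os.path.ismount() is used as a stand-in for
path_is_mount_point() (it does not follow symlinks the way
AT_SYMLINK_FOLLOW does).

```python
import os


def mount_one(where, is_mount_point=os.path.ismount):
    """Report what the setup step would do for the path `where`.

    If `where` is already a mount point (for example /dev carried over
    from the initrd), the step is skipped and the existing contents are
    kept. Nothing is actually mounted by this sketch.
    """
    if is_mount_point(where):
        return "skipped"
    return "would mount"


# The predicate is injectable so both branches can be exercised without
# depending on how the current machine is set up.
already_mounted = mount_one("/dev", is_mount_point=lambda path: True)
not_mounted = mount_one("/dev", is_mount_point=lambda path: False)
```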


init script:
https://git.launchpad.net/~usd-import-team/ubuntu/+source/initramfs-tools/tree/init?h=applied/ubuntu/xenial-updates
[ -d /dev ] || mkdir -m 0755 /dev
...

# Note that this only becomes /dev on the real filesystem if udev's scripts
# are used; which they will be, but it's worth pointing out
if ! mount -t devtmpfs -o nosuid,mode=0755 udev /dev; then
     echo "W: devtmpfs not available, falling back to tmpfs for /dev"
     mount -t tmpfs -o nosuid,mode=0755 udev /dev
     [ -e /dev/console ] || mknod -m 0600 /dev/console c 5 1
     [ -e /dev/null ] || mknod /dev/null c 1 3
fi
...


init-bottom:
https://git.launchpad.net/~usd-import-team/ubuntu/+source/systemd/tree/debian/extra/initramfs-tools/scripts/init-bottom/udev?h=applied/ubuntu/xenial-updates

...
# move the /dev tmpfs to the rootfs
mount -n -o move /dev ${rootmnt}/dev

# create a temporary symlink to the final /dev for other initramfs scripts
if command -v nuke >/dev/null; then
  nuke /dev
else
  rm -rf /dev
fi
ln -s ${rootmnt}/dev /dev

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1729145

Title:
  /dev/bcache/by-uuid links not created after reboot

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged
Status in linux source package in Zesty:
  Triaged
Status in linux source package in Artful:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  1. $ lsb_release -rd
  Description:  Ubuntu 17.10
  Release:      17.10

  2. $ apt-cache policy linux-image-`uname -r`
  linux-image-4.13.0-16-generic:
    Installed: 4.13.0-16.19
    Candidate: 4.13.0-16.19
    Version table:
   *** 4.13.0-16.19 500
          500 http://nova.clouds.archive.ubuntu.com/ubuntu artful/main amd64 Packages
          100 /var/lib/dpkg/status

  3. After creating some bcache devices and rebooting, the
  /dev/bcache/by-uuid/<UUID> -> ../../bcacheN symlinks point to the
  current bcache device which is caching the dev.uuid found after
  creating a backing device.

  4. /dev/bcache/by-uuid does not exist and there are no symlinks
  underneath

  
  It appears that the initramfs loads the bcache module, which probes and
  finds all of the cache devices and backing devices. Once the rootfs is
  mounted and udev gets to run, the bcache kernel module does not emit
  the CACHED_UUID value into the environment if the underlying devices
  are already registered.

  In dmesg, one can see bcache register events prior to mounting the
  rootfs:

  [    5.333973] bcache: register_bdev() registered backing device vdb2
  [    5.354138] bcache: register_bdev() registered backing device vdb4
  [    5.365665] bcache: register_bdev() registered backing device vdb3
  [    5.397720] bcache: bch_journal_replay() journal replay done, 0 keys in 1 entries, seq 1
  [    5.428683] bcache: register_cache() registered cache device vdb1

  Then the rootfs is mounted and systemd starts systemd-udev:

  [    9.350889] systemd[1]: Listening on udev Kernel Socket.

  And then the coldplug replay of kernel events triggers
  /lib/udev/rules.d/69-bcache.rules, which invokes
  /lib/udev/bcache-register, which writes the device name (/dev/vdb1 or
  /dev/bcache0) into /sys/fs/bcache/register and results in the bcache
  kernel driver attempting to register the block device. However, a
  bcache device is already associated, and registration fails:

  [   11.173141] bcache: register_bcache() error opening /dev/vdb2: device already registered
  [   11.184617] bcache: register_bcache() error opening /dev/vdb3: device already registered
  [   11.199130] bcache: register_bcache() error opening /dev/vdb1: device already registered
  [   11.271694] bcache: register_bcache() error opening /dev/vdb4: device already registered

  The problem then is that CACHED_UUID is only emitted by a kernel call
  to bch_cached_dev_run(), which happens like this:

  register_bcache()
    register_bdev()
      bch_cached_dev_run()
        kobject_uevent_env(&disk_to_dev(d->disk)->kobj, KOBJ_CHANGE, env);
        
  where env is defined as:
      char *env[] = {
          "DRIVER=bcache",
          kasprintf(GFP_KERNEL, "CACHED_UUID=%pU", dc->sb.uuid),
          NULL,
          NULL,
      };

  Since that event is not emitted for any previously registered device,
  the symlink will not be created.
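
  The dependency on CACHED_UUID can be sketched as follows. This is an
  illustrative Python model, not the actual udev rule: the function name
  and the sample environments are hypothetical, and only the
  /dev/bcache/by-uuid/<UUID> -> ../../bcacheN layout is taken from this
  report.

```python
def by_uuid_link(env, devname):
    """Return (link_path, link_target) for a bcache uevent, or None.

    `env` is the uevent environment as a dict and `devname` is the
    kernel name of the bcache device, such as "bcache0". Without
    CACHED_UUID in the environment there is nothing to name the link
    after, so no link is produced.
    """
    uuid = env.get("CACHED_UUID")
    if env.get("DRIVER") != "bcache" or not uuid:
        return None
    return ("/dev/bcache/by-uuid/" + uuid, "../../" + devname)


# First registration: the change event carries CACHED_UUID.
first = by_uuid_link({"DRIVER": "bcache", "CACHED_UUID": "abc-123"}, "bcache0")
# Replay after a failed re-registration: no CACHED_UUID in the event.
replay = by_uuid_link({"DRIVER": "bcache"}, "bcache0")
```

  The second call models the coldplug case described above, where the
  device is already registered and the event is never emitted again.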

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: linux-image-4.13.0-16-generic 4.13.0-16.19
  ProcVersionSignature: User Name 4.13.0-16.19-generic 4.13.4
  Uname: Linux 4.13.0-16-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Oct 31 22:09 seq
   crw-rw---- 1 root audio 116, 33 Oct 31 22:09 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.7-0ubuntu3.1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Wed Nov  1 01:39:01 2017
  Ec2AMI: ami-0000030b
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: nova
  Ec2InstanceType: m1.small
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd 
   Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  MachineType: OpenStack Foundation OpenStack Nova
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-16-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
  RelatedPackageVersions:
   linux-restricted-modules-4.13.0-16-generic N/A
   linux-backports-modules-4.13.0-16-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/01/2014
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: 1.10.1-1ubuntu1~cloud0
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-zesty
  dmi.modalias: dmi:bvnSeaBIOS:bvr1.10.1-1ubuntu1~cloud0:bd04/01/2014:svnOpenStackFoundation:pnOpenStackNova:pvr15.0.7:cvnQEMU:ct1:cvrpc-i440fx-zesty:
  dmi.product.family: Virtual Machine
  dmi.product.name: OpenStack Nova
  dmi.product.version: 15.0.7
  dmi.sys.vendor: OpenStack Foundation
