Control: retitle 588675 SCSI subsystem loses name of root device on boot
Control: severity 588675 normal
Control: found 588675 3.2.78-1
Control: found 588675 3.16.7-ckt20-1+deb8u3
Control: found 588675 3.16.7-ckt25-2
Control: found 588675 2.6.18
According to the advanced information on the BTS, under severity levels:

    wishlist
        for any feature request, and also for any bugs that are very
        difficult to fix due to major design considerations.

The first condition is untrue; this is definitely a bug. While the damage may not be that major, it is pretty widespread. If the Debian kernel maintainers were to claim this wasn't a problem, I would be forced to report another bug against src:linux, since the kernel build scripts themselves are confused by this behavior!

The second condition requires a judgement call, but looking at things I'm pretty sure it is also untrue. My guess is that there is simply one crucial field that the SCSI subsystem needs to copy but does not. Since many other subsystems manage to copy the value, the change is almost certainly small. I'd be surprised if it took more than 4 lines to fix (two of them blank and one a comment). I will concede this may need expertise in how /proc/mounts works and in the interface between it and the driver subsystems (alternatively, simply finding the one field that is ignored may be enough), but with that it should be a simple fix.

As for the extent of the damage: I know of 4 reports where this is the root cause, and I imagine there are others I do not know of. There may also be many utilities that already work around this bug, and hundreds of scripts similarly forced to do so. This bug has also cost a great deal of time in working out where to attribute the issue. My earliest observations were close to a decade ago, but I didn't feel confident placing blame anywhere. More recently I had to spend time building several kernels to confirm the conditions under which the problem occurs.

Unaffected systems:

This group consists of all systems where the root filesystem is NOT on a device that plugs directly into the SCSI subsystem.
It does not matter whether an initial ramdisk is used or not. This includes systems such as:

root on Linux software RAID:

    $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
    /dev/md0 / ext3 ro 0 0
    $

I recall this system being in service from around 2.6.5(?) to 2.6.18 or so. Even though the immediate driver was the MD subsystem, it sat on top of SCSI devices. This is long in the past, but I was already observing the bug by then (and wondering where to point the finger).

root on olde IDE devices, on the olde IDE subsystem:

    $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
    /dev/hda1 / ext3 ro 0 0
    $

I think this system remained in service into the 2.6.29 timeframe, but it too is no longer in service. It does give an example of the root filesystem being on a different subsystem, though. Crucially, this is from before the olde IDE subsystem was retired and the PATA drivers, which plug into the SCSI subsystem, came into service.

root on MTD devices:

    $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
    /dev/mtdblock4 / jffs2 rw,relatime 0 0
    $

A very different system here: different filesystem and rather different device. This one hasn't been tried with kernels earlier than 3.2, but it seems to echo the other observations. It is in active service, and thanks to its unusual setup it allows testing some interesting scenarios.

root on BLK_DEV_IDE_PMAC (olde Mac IDE subsystem?):

This is Christian Kujau's report in bug #588675. I believe BLK_DEV_IDE_PMAC is the PowerMac analog of the x86 IDE driver, which had its own subsystem and didn't plug into the SCSI subsystem.

Affected systems:

This group consists of all systems where the root filesystem is on a device that plugs directly into the SCSI subsystem and the system mounts that device directly at boot. On such systems:

    $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
    /dev/root / <somefs> ro,relatime 0 0
    $

Most of my systems run ext3, but Christian Kujau confirmed this with ext4 and jfs.
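Until this is fixed, a script that needs the real root device has to detect the /dev/root placeholder and look the device up some other way. Below is a minimal POSIX sh sketch of one such lookup, not an existing tool: the function names are hypothetical, and it assumes /proc/self/mountinfo (Linux 2.6.26 and later) plus a sysfs at /sys whose /sys/dev/block/MAJOR:MINOR/uevent files carry a DEVNAME= line. It reuses the same filter as the one-liners above, then falls back to the mount's device number.

```shell
# Hypothetical helpers (not an existing tool) for finding the root
# device when /proc/mounts only reports the /dev/root placeholder.
# Assumes Linux >= 2.6.26 (/proc/self/mountinfo) and sysfs providing
# /sys/dev/block/MAJOR:MINOR/uevent with a DEVNAME= line.

# Print the source of "/" from a mounts-format file (default
# /proc/mounts), skipping the initramfs "rootfs" entry exactly like the
# awk one-liners; if "/" is listed more than once, the last entry wins.
root_source() {
    awk '$2 == "/" && $1 != "rootfs" { src = $1 } END { print src }' \
        "${1:-/proc/mounts}"
}

# Print "MAJOR:MINOR" for the mount on "/" from a mountinfo-format file
# (default /proc/self/mountinfo): field 3 is the device number, field 5
# the mount point.
root_devno() {
    awk '$5 == "/" { dev = $3 } END { print dev }' \
        "${1:-/proc/self/mountinfo}"
}

# Map "MAJOR:MINOR" to /dev/NAME using the block device's uevent file
# under the given sysfs root (default /sys). Runs in a subshell so its
# variables do not leak; exits 1 if no name can be found.
devname_from_devno() (
    uevent="${2:-/sys}/dev/block/$1/uevent"
    [ -r "$uevent" ] || exit 1
    name=$(sed -n 's/^DEVNAME=//p' "$uevent")
    [ -n "$name" ] || exit 1
    printf '/dev/%s\n' "$name"
)

# Print the root device: the mounts entry as-is, unless it is the
# /dev/root placeholder, in which case try the device-number lookup.
# Only meaningful for filesystems whose st_dev is the block device
# itself (ext3, ext4, jfs and the like).
real_root_device() (
    src=$(root_source "${1:-/proc/mounts}")
    if [ "$src" = /dev/root ]; then
        devno=$(root_devno "${2:-/proc/self/mountinfo}")
        devname_from_devno "$devno" "${3:-/sys}" && exit 0
    fi
    printf '%s\n' "$src"
)
```

On an affected system, running real_root_device with no arguments should print the partition name (for example /dev/sda1) where the one-liner prints /dev/root; if the sysfs lookup fails, it falls back to printing /dev/root unchanged. That every such script has to carry a fallback like this is part of the cost described above.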
Christian Kujau also observed this with the PATA_MACIO driver, which I believe is the Macintosh equivalent of the x86 PATA drivers that plug into the SCSI subsystem. I've observed this on many different systems with devices that plug into the SCSI subsystem, including a 3ware card, SATA disks, USB flash drives and genuine SCSI disks.

Workaround:

The workaround that bypasses the problem is to initially mount some other device as root, then pivot_root (or similar) onto the real root. Using an initial ramdisk is one example of this. From the DebWRT project I'm also aware that booting onto a root on MTD and then doing a pivot_root onto a USB flash key works around the issue:

    $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
    /dev/sda1 / ext3 ro,noatime,nodiratime,acl,barrier=1 0 0
    $

The problem is that this only works around the underlying cause. On a system with limited memory and little non-SCSI storage (think embedded systems), it could be impossible to avoid directly mounting the real SCSI root filesystem at boot. Also, anyone who needs to build a custom kernel for whatever reason will likely know the root device and want to build a kernel that mounts it directly.

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sig...@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445