Peter Nagel <peter.na...@kit.edu> writes:

> On 11.07.2015 18:40, Philip Hands wrote:
>>
>> ... which is what suggests to me that it's been broken by other
>> means -- the fact that one can apparently start it by hand tells you
>> that it's basically working, so I'd think the described symptoms point
>> strongly towards a duff mdadm.conf in the initramfs.
>>
>> N.B. I've not had very much to do with systemd, so am in no sense an
>> expert about that, but I've been using software raid and initrd's since
>> almost as soon as they were available, and the idea that this would be
>> down to systemd does not ring true.
>
> Thanks for pointing this out.
> Hopefully, someone is able to solve this problem.

Well, yes -- _you_ can hopefully.

0) (just in case you've not already done so, check all the bits
   suggested in the warning that you quoted initially, about the
   contents of /proc/... etc.)

1) on the system when booted up, check the current state of your
   /etc/mdadm/mdadm.conf

   Compare it with the output of:

     mdadm --examine --scan

   If there are significant differences (other than the missing disk),
   then fix them.

2) have a look at your initrd, thus:

     mkdir /tmp/initrd ; cd /tmp/initrd ; zcat /boot/initrd.img-* | cpio -iv --no-absolute-filenames

   (of course, being an ARM thing, you probably have some sort of
   uInitrd thing as well, so I guess it's possible to break things
   between the initrd.img and that, but someone who knows about such
   things would need to tell you about that).

   Anyway, you should have something like this:

     /tmp/initrd$ find . -name mdadm\*
     ./scripts/local-top/mdadm
     ./etc/mdadm
     ./etc/mdadm/mdadm.conf
     ./etc/modprobe.d/mdadm.conf
     ./conf/mdadm
     ./sbin/mdadm

   so, take a look at that lot to see if you can spot what's up.

   As an example, this is what I see on a little amd64 RAID box with
   Jessie, which I have to hand:

     root@linhost-th:/tmp/initrd# cat conf/mdadm
     MD_HOMEHOST='linhost-th'
     MD_DEVS=all
     root@linhost-th:/tmp/initrd# cat etc/mdadm/mdadm.conf
     HOMEHOST <system>
     ARRAY /dev/md/2 metadata=1.2 UUID=00e84ce1:d96de981:375caa64:dac234f9 name=grml:2
     ARRAY /dev/md/3 metadata=1.2 UUID=c9871cb8:46a3dd98:d9505965:5bd7dfe2 name=grml:3

   (I tend to number my md's to match the partitions they sit on, hence
   the 2 & 3)

3) save a copy of your old initrd.img somewhere, then run:

     update-initramfs -u

   and try a reboot -- if it works, unpack both initrd's in adjacent
   directories, and use diff -ur to spot what changed, and report back
   here.

4) If it didn't work, once in the emergency shell, try running:

     sh -x /scripts/local-top/mdadm

   and see if you can see why it's not working when starting things by
   hand does.

5) If that fails to be diagnostic, is there anything hiding in your
   u-boot configuration that might be causing this? (assuming this box
   has u-boot)

HTH

Cheers, Phil.

P.S. While you have the initrd unpacked, you might want to note that:

  root@linhost-th:/tmp/initrd# grep -r systemd .
  ./init:# Mount /usr only if init is systemd (after reading symlink)
  ./init:if [ "${checktarget##*/}" = systemd ] && read_fstab_entry /usr; then
  ./scripts/init-top/udev:/lib/systemd/systemd-udevd --daemon --resolve-names=never
  ./etc/lvm/lvm.conf: # systemd's socket-based service activation or run as an initscripts service
  ./lib/udev/rules.d/63-md-raid-arrays.rules:# Tell systemd to run mdmon for our container, if we need it.
  Binary file ./lib/systemd/systemd-udevd matches
  Binary file ./lib/x86_64-linux-gnu/libselinux.so.1 matches
  Binary file ./bin/kmod matches
  Binary file ./bin/udevadm matches

while the scripts on the initrd image are systemd-aware, its init is
actually a shell script -- so you're running busybox as your init at
this point.

Also:

  root@linhost-th:/tmp/initrd# grep -r 'Gave up waiting for' .
  ./scripts/local: echo "Gave up waiting for $2 device. Common problems:"

this is the script that's dropping you into the emergency shell.

The thing that starts the shell is the panic() function from
scripts/functions -- I can see that that will do a timed reboot if
you've got panic=... on the kernel command line, but otherwise not.

Would you have something like that on your command line? (as mentioned
in the warning you quoted, /proc/cmdline tells you)

If not, do you perhaps have a hardware watchdog, or some such?

-- 
|)|  Philip Hands [+44 (0)20 8530 9560]    HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY
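
Steps 1 and 3 above both come down to asking whether two mdadm.conf-style files describe the same set of arrays: the live /etc/mdadm/mdadm.conf against the output of mdadm --examine --scan, or against the copy unpacked from the initrd. The following is only a rough sketch of that comparison, assuming each array is declared on a single line starting with ARRAY and carrying a UUID= field, as in the example output shown earlier. The function names array_uuids and compare_arrays are made up for illustration and are not part of mdadm or initramfs-tools.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm): compare the array UUIDs
# listed in two mdadm.conf-style files.

# Print the sorted, unique UUID=... fields found on the ARRAY lines of $1.
array_uuids() {
    grep '^ARRAY' "$1" | tr ' \t' '\n\n' | grep '^UUID=' | sort -u
}

# Return 0 if both files list the same set of array UUIDs, 1 otherwise.
# Device names, metadata versions and name= fields are ignored.
compare_arrays() {
    cmp_left=$(array_uuids "$1")
    cmp_right=$(array_uuids "$2")
    [ "$cmp_left" = "$cmp_right" ]
}

# Possible usage, as root on the booted system, with the initrd unpacked
# in /tmp/initrd as in step 2:
#
#   mdadm --examine --scan > /tmp/scan.conf
#   compare_arrays /etc/mdadm/mdadm.conf /tmp/scan.conf \
#       || echo "mdadm.conf does not match what is on the disks"
#   compare_arrays /etc/mdadm/mdadm.conf /tmp/initrd/etc/mdadm/mdadm.conf \
#       || echo "the initrd copy of mdadm.conf is stale"
```

This only compares UUIDs, so it will not notice a changed HOMEHOST or device name; for anything it flags, a plain diff of the two files is still the next thing to look at.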