On 4/01/19 2:48 PM, Richard Hector wrote:
> Hi all,
>
> This is one of those annoying cases where I claim "It was working, and I
> didn't do anything, and now it doesn't" - suspicious, I know ...
>
> In this case, I can see from my emails that this machine booted (via
> wake-on-lan from a cronjob) this morning, and then shut itself down (via
> a local cronjob), having done its job. Then later, I booted it manually,
> and it didn't - when I plugged in a screen and keyboard, I found it at
> the 'root password for maintenance or Ctrl-D to continue' prompt.
>
> On logging in, I found it had had problems mounting filesystems.
>
> All further attempts to boot it have gone straight from grub to a
> blinking underline cursor in the top left.
>
> If I boot with one of the 'recovery mode' options, I can get back to the
> maintenance option. Having dug around a bit, I find that 'vgchange -ay'
> followed by 'systemctl default' brings it up, in an apparently normal state.
>
> It reports warnings about being unable to connect to lvmetad, but I
> gather that's not normally something to worry about. On the other hand,
> the searches I've done have only found suggestions for disabling it, not
> making it work.
>
> General info about the system:
>
> It's a generic tower system, with (currently) 6 disks.
>
> I use RAID 1 for everything - but in an effort to keep the arrays small
> (an attempt to reduce the risk of failing to rebuild before another
> error happens), there are many partitions, grouped into vgs with lvm.
> However, the filesystems I'm having problems with are on a vg that has
> only one md in it, because those disks are relatively small anyway.
>
> It runs a single kvm guest, which does my backups using dirvish (pulling
> from various machines both locally and over the net) - hence the
> automated boot in the middle of the night.
>
> The weather is kind of hot (New Zealand summer), which is one of the
> reasons I went to the cronjob solution to not (normally) run it during
> the day - I started that last summer, but have had it running 24/7 for a
> while, and then shut it down to rely on this system a couple of days
> ago, so it hasn't had many boot cycles recently.
>
> Any tips/questions very welcome.

It turns out the later failures to boot probably weren't failures at all;
it's just that I had 'quiet' enabled on the kernel command line. Removing
that let me see where it was hanging - which I've now asked about in my
'Slow boot?' thread.

So the initial failure must have been a one-off, which if anything is
more worrying - it sounds more like hardware.

Richard