Interesting. I'm going to try it! The new kernel was installed during an upgrade from Debian 7 Wheezy to Debian 8 Jessie. The upgrade went OK on the 8 nodes of the cluster, but not on the master. By the way, kernel 3.16 is working fine on the nodes.
Stupid question: is it worth trying to make the new kernel work, in your opinion? If, in the worst case, I have to keep the 3.2 kernel on the master, is that so bad?

Elisabetta

2018-01-09 13:27 GMT+01:00 John Hearns <hear...@googlemail.com>:

> Elisabetta, I am not an expert on Debian systems.
> I think to solve your problem with the kernels, you need to recreate the
> initial ramdisk and make sure it has the modules you need.
>
> So boot the system in kernel 3.2 and then run:
> mkinitrd 3.16.0-4-amd64
>
> How was the kernel version 3.16.0-4-amd64 installed?
>
> On 9 January 2018 at 13:16, Elisabetta Falivene <e.faliv...@ilabroma.com>
> wrote:
>
>> Root file system is on the master. I am able to boot the machine by
>> changing kernel. Grub allows booting from two kernels:
>>
>> kernel 3.2.0-4-amd64
>> kernel 3.16.0-4-amd64
>>
>> The problem is with kernel 3.16, but it boots correctly with 3.2.
>>
>> Anyway, rebooting with kernel 3.2, slurm (now updated to 14.03.9, was
>> 2.3.4) doesn't work anymore and gives this error:
>>
>> First time launching sinfo after reboot:
>>
>> *sinfo: error: If munged is up, restart with --num-threads=10*
>> *sinfo: error: Munge encode failed: Failed to access
>> "/var/run/munge/munge.socket2": No such file or directory*
>> *sinfo: error: Authentication: Socket communication error*
>> *slurm_load_partition: Protocol authentication error*
>>
>> Re-launching sinfo:
>>
>> *slurm_load_jobs error: Unable to contact slurm controller (connect
>> failure)*
>>
>> What does it mean?
>>
>> betta
>>
>> PS: In the kernel 3.16 case, it gives the "gave up waiting" error and
>> *before* the error is thrown there is another error:
>>
>> "Running scripts/local-block
>> Unable to find lvm volume"
>>
>> It keeps trying this several times and then falls back to
>> initramfs (even if booted in recovery!).
>>
>> Moreover, in this situation it seems not to load the USB keyboard, so I'm
>> truly unable to do anything.
>>
>> 2018-01-08 12:26 GMT+01:00 Markus Köberl <markus.koeb...@tugraz.at>:
>>
>>> On Monday, 8 January 2018 11:39:32 CET Elisabetta Falivene wrote:
>>> > Here I am again.
>>> > In the end, I did the upgrade from Debian 7 Wheezy to Debian 8 Jessie in
>>> > order to update Slurm and solve some issues with it. It seemed it all went
>>> > well. Even the Slurm problem seemed solved. Then I rebooted the machine and
>>> > the problems began. I can't boot the master anymore; it returns an error:
>>> >
>>> > *gave up waiting for root device. Common problems:*
>>> > *- Boot args (cat /proc/cmdline)*
>>> > *- check rootdelay= (did the system wait long enough?)*
>>> > *- check root= (did the system wait for the right device?)*
>>> > *- missing modules (cat /proc/modules; ls /dev)*
>>> > *ALERT! /dev/mapper/system-root does not exist. Dropping to a shell!*
>>> > *modprobe: module ehci-pci not found in modules.dep*
>>> > *modprobe: module ehci-orion not found in modules.dep*
>>> > *modprobe: module ehci-hcd not found in modules.dep*
>>> > *modprobe: module ohci-hcd not found in modules.dep*
>>> >
>>> > *Busybox v1.22.1 (Debian 1:1.22.0-9+deb8u1) built-in shell (ash)*
>>> > *Enter help for a list of built-in commands*
>>> >
>>> > */bin/sh: can't access tty; job control turned off*
>>> > *(initramfs)*
>>> >
>>> > Have you ever had this type of problem?
>>>
>>> Where is your root file system located?
>>> If it is on a local disk check your /etc/fstab
>>> Maybe the device location has changed with the newer kernel?
>>>
>>> regards
>>> Markus Köberl
>>> --
>>> Markus Koeberl
>>> Graz University of Technology
>>> Signal Processing and Speech Communication Laboratory
>>> E-mail: markus.koeb...@tugraz.at