Thanks for the feedback.

After copying /boot and /bin from another machine and mucking about with grub for far too long, the system is back online. I had to edit grub.conf to change the virtual disk names: the CentOS rescue disk saw the boot disk as hd1, but when grub actually started, it saw it as hd0.

The logs don't show a root command line that specifically took out those directories. They do show a bunch of scripts being run. My best guess is that one of them did something like this:

  AVAR=`command that failed and returned an empty string`
  rm -rf ${AVAR}/b*
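
For what it's worth, here is a minimal sketch of how a script could guard against that failure mode. The function name safe_clean is hypothetical and not taken from any of the scripts in the logs; it only illustrates refusing to run rm when the prefix is empty or "/".

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): remove entries
# matching "$1"/b*, but refuse to run when the prefix is empty or "/".
safe_clean() {
    prefix=$1
    case "$prefix" in
        ""|/)
            # An empty prefix would turn the pattern into /b*, which
            # matches /bin and /boot on a typical system.
            echo "refusing to clean prefix '$prefix'" >&2
            return 1
            ;;
    esac
    # Quote the variable but leave the glob unquoted so b* still expands;
    # "--" stops option parsing in case the prefix starts with a dash.
    rm -rf -- "$prefix"/b*
}

# Usage sketch (some_command is a placeholder):
#   AVAR=$(some_command) || exit 1
#   safe_clean "$AVAR" || exit 1
```

Checking the exit status of the command substitution catches the failure itself, and the check inside the function catches the case where the command "succeeds" but prints nothing.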

It seems unlikely that a low-level controller failure would have snipped out those files and directories without leaving a file system that fsck saw as corrupt.

That said, there is something hardware related going on, since /var/log/messages has a lot of these (sorry about the wrap):

Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
Mar 16 12:37:27 mandolin kernel: Descriptor sense data with sense descriptors (in hex):
Mar 16 12:37:27 mandolin kernel:        72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
Mar 16 12:37:27 mandolin kernel:        00 4f 00 c2 40 50
Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  ASC=0x4 ASCQ=0x1d
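
As a rough cross-check of how the hex dump maps onto the fields the kernel prints, here is a small sketch that decodes the first eight bytes. It assumes the usual descriptor-format sense layout (byte 0 response code, byte 1 sense key, byte 2 ASC, byte 3 ASCQ, byte 7 additional length); the function name decode_sense_header is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decode the 8-byte header of descriptor-format
# SCSI sense data. Input is one string of space-separated hex bytes,
# as printed in the kernel log.
decode_sense_header() {
    # Deliberately unquoted: split the string into $1..$N on whitespace.
    set -- $1
    # Byte 0: response code (low 7 bits); byte 1: sense key (low nibble);
    # byte 2: ASC; byte 3: ASCQ; byte 7: additional sense length.
    printf 'response_code=0x%02x sense_key=0x%x asc=0x%x ascq=0x%x additional_length=%d\n' \
        "$((0x$1 & 0x7f))" "$((0x$2 & 0x0f))" "$((0x$3))" "$((0x$4))" "$((0x$8))"
}

# Usage sketch:
#   decode_sense_header "72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00"
```

For the dump above this gives sense key 0x1 (Recovered Error), ASC 0x4 and ASCQ 0x1d, matching the kernel's own lines, and an additional length of 14, which matches the 14 bytes that follow the header.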

That group has several other similar Dell servers, and this is the only one logging these. sdb1 holds /boot and sdb2 is where LVM keeps its information.

Regards,

David Mathog
[email protected]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
