I'm running an FC4 system. I was copying some files onto the server this
weekend when the server locked up hard, and I had to power off. I rebooted the
server and the array came up fine, but when I tried to fsck the filesystem,
fsck just locked up at about 40%. I left it sitting there for 12 hours, hoping
it would come back, but in the end I had to power off the server again. Now,
when I reboot the server, it fails to mount my RAID5 array:
mdadm: /dev/md0 assembled from 3 drives and 1 spare - not enough to start
the array.
I've added the output from the various files/commands at the bottom...
I am a little confused by the output. According to the superblocks on
/dev/hd[cgh], there is only 1 failed disk in the array, so why does the kernel
think there are 3 failed disks? It looks like there really is only 1 failed
disk: I got an error from SMARTD about it when I got the server back into
multiuser mode (Device: /dev/hde, 8 Offline uncorrectable sectors), so I know
there is an issue with that disk. But that should still leave enough disks to
bring up the array, and for the spare disk to start rebuilding.
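I guess the thing to compare next is the Events counter in each disk's
superblock, since that's what decides which disks are "fresh". Here's the
quick awk I've been using to pull it out of saved mdadm -E output (shown on an
inline sample copied from my hde dump; on the real system I'd loop over
/dev/hd[b-h] instead):

```shell
# Sketch: extract the Events counter from saved `mdadm -E` output.
# "sample" is just the relevant lines from my /dev/hde dump pasted in;
# for the live disks: for d in /dev/hd[b-h]; do mdadm -E "$d" | ...; done
sample='   Update Time : Sun Feb  4 17:29:53 2007
         State : active
        Events : 0.840944'
events=$(printf '%s\n' "$sample" | awk -F' : ' '/Events/ {print $2}')
echo "$events"
```

If the disks the kernel kicked really are stale, their counters should be
lower than on the disks it kept.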
I've spent the last couple of days googling around, and I can't seem to find
much on how to recover a failed md array. Is there any way to get the array
back and working? Unfortunately I don't have a backup of this array, and I'd
really like to try to get the data back (there are 3 LVM logical volumes on
it).
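For what it's worth, the thing I'm tempted to try (but haven't, for fear of
making things worse) is a forced assembly, leaving out the known-bad hde and
the spare. My understanding is that --force tells mdadm to ignore the stale
event counts on the kicked disks. I've only built the command as a string and
printed it so far, so nothing happens by accident:

```shell
# Sketch only -- I have NOT run this yet. Assemble degraded (5 of 6) from
# the disks whose data should still be consistent, skipping the bad hde
# and the spare hdb; --force overrides the "non-fresh" event counts.
cmd="mdadm --assemble --force /dev/md0 /dev/hdc /dev/hdd /dev/hdf /dev/hdg /dev/hdh"
echo "$cmd"
# If the array then starts degraded, I'd re-add the spare with:
#   mdadm /dev/md0 --add /dev/hdb
```

Is that a safe thing to attempt here, or is there a better first step?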
Thanks very much for any help.
Graham
My /etc/mdadm.conf looks like this:
# cat /etc/mdadm.conf
DEVICE /dev/hd*[a-z]
ARRAY /dev/md0 level=raid5 num-devices=6
UUID=96c7d78a:2113ea58:9dc237f1:79a60ddf
devices=/dev/hdh,/dev/hdg,/dev/hdf,/dev/hde,/dev/hdd,/dev/hdc,/dev/hdb
Looking at /proc/mdstat, I get this output:
# cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : inactive hdc[0] hdb[6] hdh[5] hdg[4] hdf[3] hde[2] hdd[1]
1378888832 blocks super non-persistent
Here's the output of mdadm -E when run on the disk that seems to have failed:
# mdadm -E /dev/hde
/dev/hde:
Magic : a92b4efc
Version : 00.90.02
UUID : 96c7d78a:2113ea58:9dc237f1:79a60ddf
Creation Time : Wed Feb 1 17:10:39 2006
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 0
Update Time : Sun Feb 4 17:29:53 2007
State : active
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : dcab70d - correct
Events : 0.840944
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 33 0 2 active sync /dev/hde
0 0 22 0 0 active sync /dev/hdc
1 1 22 64 1 active sync /dev/hdd
2 2 33 0 2 active sync /dev/hde
3 3 33 64 3 active sync /dev/hdf
4 4 34 0 4 active sync /dev/hdg
5 5 34 64 5 active sync /dev/hdh
6 6 3 64 6 spare /dev/hdb
Running mdadm -E on /dev/hd[bcgh] gives this:
Number Major Minor RaidDevice State
this 6 3 64 6 spare /dev/hdb
0 0 22 0 0 active sync /dev/hdc
1 1 22 64 1 active sync /dev/hdd
2 2 0 0 2 faulty removed
3 3 33 64 3 active sync /dev/hdf
4 4 34 0 4 active sync /dev/hdg
5 5 34 64 5 active sync /dev/hdh
6 6 3 64 6 spare /dev/hdb
And running mdadm -E on /dev/hd[def] gives this:
Number Major Minor RaidDevice State
this 3 33 64 3 active sync /dev/hdf
0 0 22 0 0 active sync /dev/hdc
1 1 22 64 1 active sync /dev/hdd
2 2 33 0 2 active sync /dev/hde
3 3 33 64 3 active sync /dev/hdf
4 4 34 0 4 active sync /dev/hdg
5 5 34 64 5 active sync /dev/hdh
6 6 3 64 6 spare /dev/hdb
Looking at /var/log/messages shows the following:
Feb 6 12:36:42 file01bert kernel: md: bind<hdd>
Feb 6 12:36:42 file01bert kernel: md: bind<hde>
Feb 6 12:36:42 file01bert kernel: md: bind<hdf>
Feb 6 12:36:42 file01bert kernel: md: bind<hdg>
Feb 6 12:36:42 file01bert kernel: md: bind<hdh>
Feb 6 12:36:42 file01bert kernel: md: bind<hdb>
Feb 6 12:36:42 file01bert kernel: md: bind<hdc>
Feb 6 12:36:42 file01bert kernel: md: kicking non-fresh hdf from array!
Feb 6 12:36:42 file01bert kernel: md: unbind<hdf>
Feb 6 12:36:42 file01bert kernel: md: export_rdev(hdf)
Feb 6 12:36:42 file01bert kernel: md: kicking non-fresh hde from array!
Feb 6 12:36:42 file01bert kernel: md: unbind<hde>
Feb 6 12:36:42 file01bert kernel: md: export_rdev(hde)
Feb 6 12:36:42 file01bert kernel: md: kicking non-fresh hdd from array!
Feb 6 12:36:42 file01bert kernel: md: unbind<hdd>
Feb 6 12:36:42 file01bert kernel: md: export_rdev(hdd)
Feb 6 12:36:42 file01bert kernel: md: md0: raid array is not clean -- starting
background reconstruction
Feb 6 12:36:42 file01bert kernel: raid5: device hdc operational as raid disk 0
Feb 6 12:36:42 file01bert kernel: raid5: device hdh operational as raid disk 5
Feb 6 12:36:42 file01bert kernel: raid5: device hdg operational as raid disk 4
Feb 6 12:36:42 file01bert kernel: raid5: not enough operational devices for
md0 (3/6 failed)
Feb 6 12:36:42 file01bert kernel: RAID5 conf printout:
Feb 6 12:36:42 file01bert kernel: --- rd:6 wd:3 fd:3
Feb 6 12:36:42 file01bert kernel: disk 0, o:1, dev:hdc
Feb 6 12:36:42 file01bert kernel: disk 4, o:1, dev:hdg
Feb 6 12:36:42 file01bert kernel: disk 5, o:1, dev:hdh
Feb 6 12:36:42 file01bert kernel: raid5: failed to run raid set md0
Feb 6 12:36:42 file01bert kernel: md: pers->run() failed ...