Package: mdadm
Version: Lenny (I think)

Hello Wonderful Debian Sages -

I have a RAID1 mdadm array using two identical SSDs. We had lost our VPN to the box, but I could still SSH in, and wow, what did I find. For a while I was able to browse around in the file system, but files kept slowly turning into ??????'s like the below.

-rwxr-xr-x 1 root root  4872 Jan  1  2011 runlevel
-????????? ? ?    ?        ?            ? sfdisk
-rwxr-xr-x 1 root root   879 Feb 15  2011 shadowconfig
-????????? ? ?    ?        ?            ? shorewall
-rwxr-xr-x 1 root root 15976 Jan 13 16:07 showmount
-rwxr-xr-x 1 root root 23696 Jan  1  2011 shutdown
-rwxr-xr-x 1 root root 31728 Mar 16  2009 slattach
-rwxr-xr-x 1 root root 44464 Jan 13 16:07 sm-notify

Eventually every command that touches the storage devices reported an "Input/output error", which presumably means it can't read the drive.

It appears that one of the SSDs failed and one stayed up, but the one that stayed up was either slowly getting corrupted itself or slowly copying bad sectors off the failed SSD. This is just a guess, as I was progressively losing access to files right in front of my eyes. I did check mdadm in the middle of this and it had marked one of the drives as faulty, but the degradation of the system continued anyway.

I was able to copy a couple of critical config files, but if anyone knows a trick that would let me grab the /etc/ directory from this box, that would be a huge help and would likely save me many hours. I haven't done anything yet, but I will likely be yanking both SSDs and reverting to old technology. One thing to be said for platter drives: when they fail, RAID actually works right.

Does anyone have any hope or a prayer here that might save the day (at least to be able to read the /etc files)? Are there any tricks to do this:

- Grab the firewall configuration out of memory? I'm using shorewall but I can't access the directory.
- Get a list of the interfaces, IP addresses and tunnel configurations.

Or, best of all, any tricks that might get me access to the filesystem again on EITHER volume?

The processes in memory are running some critical functions and, amusingly, they seem fine (they don't use disk), so things are still running, but obviously if I reboot ...

Don't ask for logs (unless I can get them out of memory or /proc etc.), as all of the log files are inaccessible. It is depressing, because I thought using RAID1 here would protect me from these issues. At least the system didn't halt.

Thanks for any tips/help at all. This machine will likely be in this state for another 24 hours before I rebuild it with non-SSDs. Is there anything else I can retrieve from this machine that may help someone else avoid a repeat of this?

-------------------------------------------------------------------------
XXXXXX:/sbin# uname -r
2.6.32-5-amd64
-------------------------------------------------------------------------
XXXXXX:/# ls
ls: reading directory .: Input/output error
-------------------------------------------------------------------------
XXXXXX:/sbin# mdadm --examine /dev/md0
mdadm: No md superblock detected on /dev/md0.
-------------------------------------------------------------------------
XXXXXX:/dev# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Oct 18 19:04:03 2011
     Raid Level : raid1
     Array Size : 59939768 (57.16 GiB 61.38 GB)
  Used Dev Size : 59939768 (57.16 GiB 61.38 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu May 24 23:50:02 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

       0       8        1        -      faulty spare   /dev/sda1
-------------------------------------------------------------------------
XXXXXX:/dev# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda5[0](F) sdb5[1]
      2578420 blocks super 1.2 [2/1] [_U]

md0 : active raid1 sda1[0](F) sdb1[1]
      59939768 blocks super 1.2 [2/1] [_U]

unused devices: <none>
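
For anyone in a similar spot, here is a minimal sketch of reading kernel state when the binaries on disk may no longer load. It is not a tested recovery procedure. It assumes the login shell is bash and is still resident in memory, and it uses only shell builtins, so nothing has to be read from the failing filesystem. The function names builtin_cat and degraded_arrays are hypothetical helpers made up for this illustration. The second one relies on the "[2/1]" style field in /proc/mdstat, where the two numbers are the configured and the active member counts.

```shell
#!/bin/bash
# Hypothetical helper: print a file using only bash builtins, for the
# case where /bin/cat itself can no longer be read from the disk.
builtin_cat() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Hypothetical helper: print the name of every md array whose
# [configured/active] member counts differ, e.g. "[2/1]".
# Reads /proc/mdstat unless another file is given as $1.
degraded_arrays() {
    local line name=""
    local name_re='^(md[0-9]+)[[:space:]]*:'
    local count_re='\[([0-9]+)/([0-9]+)\]'
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $name_re ]]; then
            # Remember the array name; the counts are on the next line.
            name=${BASH_REMATCH[1]}
        elif [[ -n $name && $line =~ $count_re ]]; then
            if (( BASH_REMATCH[1] != BASH_REMATCH[2] )); then
                printf '%s\n' "$name"
            fi
            name=""
        fi
    done < "${1:-/proc/mdstat}"
}
```

With these defined in the running shell, something like `builtin_cat /proc/net/dev` or `builtin_cat /proc/net/route` should list interface names and the kernel routing table (addresses in /proc/net/route are hexadecimal), and `degraded_arrays` should name the arrays running on fewer members than configured. Whether any of this still works depends on how much of the shell and the kernel state is intact.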