Hendrik Boom wrote:
> I ran
> mdadm /dev/md1 --add /dev/sdd2
> and got a segmentation fault.
Ouch. Scary!

> april:/farhome/hendrik# cat /proc/mdstat
> Personalities : [raid1]
> md1 : active raid1 sdb2[1]
>       2391295864 blocks super 1.2 [2/1] [_U]
>
> md0 : active raid1 sda4[0] sdc4[1]
>       706337792 blocks [2/2] [UU]
>
> unused devices: <none>
> april:/farhome/hendrik# mdadm /dev/md1 --add /dev/sdd2
> Segmentation fault
> april:/farhome/hendrik#

I read the subsequent email responses but I think they went in the
wrong direction. The segfault was in mdadm, not the disk. It isn't
the disk that has the problem. The problem is with mdadm. The
solution is therefore to find and fix mdadm, not the disk. Or it is
in a library loaded by mdadm. But there are only three of those and
they are used by every program on the system.

  $ ldd -d -r /sbin/mdadm
          linux-vdso.so.1 (0x00007fffe6bb4000)
          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f78df260000)
          /lib64/ld-linux-x86-64.so.2 (0x00007f78df632000)

I suspect one of several possibilities. One is some type of file
system corruption leading to a corrupted mdadm binary. I would
checksum the package files and see if that points to something. If
you are lucky it will, and then you will know what it is.

  # debsums mdadm

It could also be some type of API mismatch between the program, its
libraries, and the kernel system calls. I don't know. I am reaching
on this one.

First I would make sure your system is up to date all around. You
said you are using Wheezy. I have seen people think they were up to
date but they had forgotten to run 'update' first and so they
actually were not. I have seen people have a failure during the
upgrade, not notice it, and end up with broken packages without
knowing it.

  apt-get update
  apt-get upgrade
  apt-get dist-upgrade

You might try re-installing just the mdadm package.

  apt-get install --reinstall mdadm

> What now?

I strongly suspect a broken system, because mdadm on Wheezy is
working fine for zillions of other people. If it were a bug in mdadm
then I suspect it would have been hit by many others. It hasn't
been. So I suspect something uniquely wrong with your system. That
is why I think you should start by figuring out what specifically is
going on with your system and fixing it there.

If I could not fix the problem by any other means then two more
difficult options would be available. I would shut down, remove the
disks from the faulty system, attach them to a different known-good
system, and then use that machine's working mdadm to fix the array.
This would actually be a good test of something else too. If the
problem followed the disks to the known-good system then it is
clearly a data-dependent bug in mdadm. If not then your original
system is broken in some way. Afterward you could move the disks
back to the original machine. Since the raid had been sync'd, the
raid back on the original machine should also come up sync'd. That
would not really address the mdadm segfault problem, but you might
not care at that point. Not unless some other problem pops up.

The next thing would be to get the source to mdadm and compile it
locally on the system. Then step through the program in the
debugger. While running in the debugger the segfault will be trapped
and you should be able to see what part of the code is triggering the
problem.

  # apt-get build-dep mdadm
  $ apt-get source mdadm
  $ cd mdadm-3.2.5
  $ ./debian/rules build
  $ ./mdadm --version
  mdadm - v3.2.5 - 18th May 2012

Do the build-dep as root. Do the rest as yourself, non-root. But how
to run the debugger is a very long howto that will vary depending
upon many things.
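Still, a bare-bones session outside of any editor would look roughly
like the following. This is only a sketch: the directory name comes
from the source package above, you need the gdb package installed,
and the backtrace you get will of course be your own.

  # cd mdadm-3.2.5
  # gdb --args ./mdadm /dev/md1 --add /dev/sdd2
  (gdb) run
  ...gdb stops here when the SIGSEGV is trapped...
  (gdb) bt full
  (gdb) quit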
I run gdb within emacs. And for mdadm it all needs to be run as root
to have the right access to the devices. Beyond that I must leave it
there. But debugging the program should allow you to figure out what
is actually segfaulting. If it is a program bug then it could be
fixed and reported.

Bob
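P.S. If stepping through interactively is more than you want to take
on, a rough alternative is to let the locally built mdadm dump core
and inspect the core file afterward. A sketch, assuming the default
core_pattern so the core file lands as 'core' in the current
directory:

  # ulimit -c unlimited
  # ./mdadm /dev/md1 --add /dev/sdd2
  # gdb ./mdadm core
  (gdb) bt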