On Fri, 22 Mar 2013 09:26:00 +0400 Michael Tokarev <m...@tls.msk.ru> wrote:
> [Replying back to the bugreport AND to NeilB.
>  Hope you're okay with that.
>  NeilB: This is just the (excellent!) analysis of the problem,
>  I can send a patch if you like.
> ]
>
> 20.03.2013 22:19, Francesco Potortì wrote:
> >> And also (as root) apt-get install build-essential to
> >> install build dependencies
> >
> > It misses ansidecl.h.  I installed gcc-4.7-plugin-dev and copied it from
> > the standard place to the build directory to satisfy it.
>
> Interesting.  Thank you for letting us know!
>
> >> Use this executable to manage your arrays and to get a
> >> coredump, and run gdb on it.  Or run this executable under
> >> gdb and get a sigsegv.
> >
> > By mistake, I first built version 3.1.4 which does not crash but rather
> > gives the correct error message:
> >
> > mdadm: failed to write superblock to /dev/sdb1
> >
> >> When you hit the issue, run `bt' in gdb to see where it
> >> is failing.  This will show you a stack trace, and the
> >> current line of code where it fails.  We may want to
> >> examine variables around there, using `p' command.
> >
> > Okay, that was easy, but I do not understand it.  The problem is in
> > write_init_super1 in super1.c:
> >
> >     for (di = st->info; di && ! rv ; di = di->next) {
> >         if (di->disk.state == 1)
> >             continue;
> >         if (di->fd < 0)
> >             continue;
> >
> > This reads the structure, the second test passes, so the loop continues,
> > but next is null and the loop ends.  After this, di is null.  But in
> > this case:
> >
> >     }
> > error_out:
> >     if (rv)
> >         fprintf(stderr, Name ": Failed to write metadata to %s\n",
> >                 di->devname);
> >
> > Which segfaults because rv is 4.  The fact is, I cannot imagine why ever
> > it is 4.  It should be 0.
> >
> > Today I have not had the time to change the disk, so I could do some
> > other test.  Maybe this evening.  If you write to me, I'll try something
> > else.
>
> Well.  This should be all that's needed, actually even more than that!
> Your analysis is excellent, you did very good work!
> Thank you very much for helping Francesco!
>
> Obviously these are places difficult to hit in real life...
>
> Only the error reporting is broken; mdadm will not eat
> your data with this bug.  So there's nothing to worry about on your
> system anymore, except, of course, the bad disk which needs to be
> replaced to restore redundancy, as you already know.  Hopefully
> the next upload of the mdadm package will fix this issue, but it is
> not very urgent - SIGSEGV'ing isn't nice but it isn't harmful either.
>
> Thank you for the good work!
>
> /mjt
>
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5# gdb --args ./mdadm /dev/md3 --add /dev/sdb1
> > GNU gdb (GDB) 7.4.1-debian
> > Copyright (C) 2012 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /usr/local/src/mdadm/mdadm-3.2.5/mdadm...done.
> > (gdb) run
> > Starting program: /usr/local/src/mdadm/mdadm-3.2.5/mdadm /dev/md3 --add /dev/sdb1
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > write_init_super1 (st=0x69b630) at super1.c:1248
> > 1248            fprintf(stderr, Name ": Failed to write metadata to %s\n",
> > (gdb) bt
> > #0  write_init_super1 (st=0x69b630) at super1.c:1248
> > #1  0x0000000000414600 in Manage_subdevs (devname=0x7fffffffebf3 "/dev/md3", fd=8, devlist=0x698030, verbose=0,
> >     test=0, update=0x0, force=0) at Manage.c:952
> > #2  0x00000000004060fb in main (argc=4, argv=0x7fffffffe8f8) at mdadm.c:1245
> > (gdb) p di
> > $1 = (struct devinfo *) 0x0
> > (gdb) p rv
> > $2 = 4
> > (gdb) p st
> > $3 = (struct supertype *) 0x69b630
> > (gdb) p *st
> > $4 = {ss = 0x683cc0, minor_version = 2, max_devs = 1920, container_dev = 8388608, sb = 0x6ae000, info = 0x6ad7f0,
> >   ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnum = 3, devname = 0x0,
> >   devcnt = 0, retry_soon = 0, devs = 0x0}
> > (gdb) p st->info
> > $5 = (void *) 0x6ad7f0
> > (gdb) p *(struct devinfo *)(st->info)
> > $6 = {fd = -1, devname = 0x7fffffffec02 "/dev/sdb1", disk = {number = 2, major = 8, minor = 17, raid_disk = -1,
> >     state = 0}, next = 0x0}
> > (gdb) quit
> > A debugging session is active.
> >
> >         Inferior 1 [process 6824] will be killed.
> >
> > Quit anyway? (y or n) y
> > tucano:/usr/local/src/mdadm/mdadm-3.2.5#

Thanks for the report.

This is fixed by commit 4687f160276a8f7815675ca758c598d881f04fd7 in mainline
and by commit 0d478e243a90a48fe4da581c7302771f0d66fb3b in the mdadm-3.2.x
branch and thus in mdadm-3.2.6.

NeilBrown
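
For readers following the analysis quoted above, here is a minimal sketch of the pattern involved. It is not the content of either commit named in this message, and it is not mdadm code: struct devinfo is cut down to three fields, and write_one() and write_all() are hypothetical names invented for the illustration. The quoted loop can run off the end of the list, leaving di NULL, and the error path then dereferences di. One way to avoid that, assuming nothing about how upstream actually fixed it, is to initialise rv and to remember the failing device in a separate pointer:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for mdadm's struct devinfo. */
struct devinfo {
    int fd;
    const char *devname;
    struct devinfo *next;
};

/* Hypothetical stand-in for the real per-device superblock write.
 * For the illustration, fd 99 plays the role of a failing disk. */
static int write_one(const struct devinfo *di)
{
    return di->fd == 99 ? 4 : 0;
}

/* Walk the list and write each usable device.
 * Returns 0 on success or the first non-zero write result.
 * *failed_name is set to the failing device's name, or NULL. */
static int write_all(struct devinfo *list, const char **failed_name)
{
    struct devinfo *di;
    struct devinfo *failed = NULL;  /* survives the end of the loop */
    int rv = 0;                     /* explicitly initialised */

    for (di = list; di; di = di->next) {
        if (di->fd < 0)
            continue;               /* skipped, as in the quoted loop */
        rv = write_one(di);
        if (rv) {
            failed = di;            /* remember it before leaving */
            break;
        }
    }

    /* Report through 'failed', never through the loop cursor,
     * which is NULL whenever the loop ran to the end of the list. */
    if (rv && failed)
        fprintf(stderr, "Failed to write metadata to %s\n",
                failed->devname);

    if (failed_name)
        *failed_name = failed ? failed->devname : NULL;
    return rv;
}
```

With a list holding only a device whose fd is -1, as in the gdb session above, this sketch skips the device, returns 0 and prints nothing, because the error message can only be reached with a non-NULL device pointer.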