Hi, here is a little more information to record in the bug report.

On Wed, Jun 13, 2012 at 11:26:18PM -0500, Jonathan Nieder wrote:
> Hi,
>
> Jose Manuel dos Santos Calhariz wrote:
>
> > Hi, did you had time time to look into this?
>
> Thanks for a reminder. Let's see.
>
> > - just before the BUG there is a "read error NOT corrected", "Disk
> >   failure on cciss/c1d3p1, disabling device." and "Operation continuing
> >   on 5 devices."
>
> It might be possible to simulate this by hot-unplugging a disk (for
> example in a VM).

The test system allows hot-unplugging the disks, but the error messages
are a little different. I got neither a "read error NOT corrected!!"
message nor a BUG. The system just carries on with one disk fewer and
without problems.

Jun 18 11:55:08 afs06 kernel: cciss: cmd ffff880037700000 has CHECK CONDITION sense key = 0x3
Jun 18 11:55:08 afs06 kernel: end_request: I/O error, dev cciss/c1d0, sector 9159296
Jun 18 11:55:08 afs06 kernel: cciss: cmd ffff880037700000 has CHECK CONDITION sense key = 0x4
Jun 18 11:55:08 afs06 kernel: end_request: I/O error, dev cciss/c1d0, sector 9159296
Jun 18 11:55:08 afs06 kernel: raid5: Disk failure on cciss/c1d0p1, disabling device.
Jun 18 11:55:08 afs06 kernel: raid5: Operation continuing on 5 devices.
...

> > [...]
> > end_request: I/O error, dev cciss/c1d3, sector 73343280
> > raid5:md2: read error NOT corrected!! (sector 73343248 on cciss/c1d3p1).
> > raid5: Disk failure on cciss/c1d3p1, disabling device.
> > raid5: Operation continuing on 5 devices.
> > raid5:md2: read error NOT corrected!! (sector 73343256 on cciss/c1d3p1).
> > raid5:md2: read error NOT corrected!! (sector 73343264 on cciss/c1d3p1).
> > raid5:md2: read error NOT corrected!! (sector 73343272 on cciss/c1d3p1).
> > raid5:md2: read error NOT corrected!! (sector 73343280 on cciss/c1d3p1).
> > raid5:md2: read error NOT corrected!! (sector 73343288 on cciss/c1d3p1).
> > ------------[ cut here ]------------
> > kernel BUG at
> > /tmp/buildd/linux-2.6-2.6.32/debian/build/source_i386_none/drivers/md/raid5.c:2764!
> [...]
> > Code: e9 9b 01 00 00 83 7c 24 7c 02 74 04 0f 0b eb fe f6 46 28 10 c7 46 3c
> > 00 00 00 00 0f 85 7f 01 00 00 8b 44 24 38 39 44 24 70 7d 04 <0f> 0b eb fe
> > 83 7c 24 7c 02 75 20 6b 84 24 a8 00 00 00 78 ff 44
>
>     /* now write out any block on a failed drive,
>      * or P or Q if they were recomputed
>      */
>     BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
>
>   21:   8b 44 24 38     mov    0x38(%esp),%eax
>   25:   39 44 24 70     cmp    %eax,0x70(%esp)
>   29:   7d 04           jge    0x2f
>   2b:*  0f 0b           ud2        <-- trapping instruction
>
> [...]
> > EIP: 0060:[<f818c811>] EFLAGS: 00010297 CPU: 3
> > EIP is at handle_stripe+0x89d/0x173e [raid456]
> > EAX: 00000005 EBX: 00000002 ECX: 00000003 EDX: 00000001
>
> s->uptodate is 0x70(%esp), so presumably disks - 1 is %eax (= 5).
> The assertion tripped, meaning that s->uptodate is lower. Stack
> doesn't go far enough to let us examine s->uptodate.
>
> [...]
> > ESI: f6394000 EDI: 00000003 EBP: f6394028 ESP: f58d5e6c
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > Process md2_raid6 (pid: 743, ti=f58d4000 task=f6569980 task.ti=f58d4000)
> > Stack:
> > e6fde3e6 c2988138 00000006 f61c8e00 00000006 0002d995 00020003 00000000
> > <0> c2988138 f4cbc86c f65699ac 000f0e67 00000000 f639431c 00000005 fffffffc
> > <0> f4cbc86c c1025461 00000000 00000000 00000002 00000005 00988100 c127a45c
> > Call Trace:
> > [<c1025461>] ? check_preempt_wakeup+0x196/0x202
> > [<f818d9fb>] ? raid5d+0x349/0x389 [raid456]
> > [<c103b623>] ? del_timer_sync+0xa/0x14
> > [<c103b6cb>] ? process_timeout+0x0/0x5
> > [<f816206e>] ? md_thread+0xe1/0xf8 [md_mod]
> > [<c104433a>] ? autoremove_wake_function+0x0/0x2d
> > [<f8161f8d>] ? md_thread+0x0/0xf8 [md_mod]
> > [<c1044108>] ? kthread+0x61/0x66
> > [<c10440a7>] ? kthread+0x0/0x66
> > [<c1003d47>] ? kernel_thread_helper+0x7/0x10
> > Code: e9 9b 01 00 00 83 7c 24 7c 02 74 04 0f 0b eb fe f6 46 28 10 c7 46 3c
> > 00 00 00 00 0f 85 7f 01 00 00 8b 44 24 38 39 44 24 70 7d 04 <0f> 0b eb fe
> > 83 7c 24 7c 02 75 20 6b 84 24 a8 00 00 00 78 ff 44
>
> I'd suggest contacting NeilBrown <ne...@suse.de> and
> linux-r...@vger.kernel.org to let them know what happened and ask if
> it rings a bell. If doing so, please cc either me or this bug log so
> we can track it.

I will do that, thank you for your help.

>
> Hope that helps,
> Jonathan
>

Jose Calhariz

-- 
--
"Both in my footballing life and with my human-being life..."
-- Nunes, former Flamengo striker, in an interview before Zico's farewell match