I have a home-use fileserver running Etch with distro-supplied kernels, software etc. It contains two 4-drive RAID5 arrays managed with mdadm. What I initially thought was a Samba issue led to a few kernel panics and some "kernel BUG" log messages. At first I had the BUG messages when running 2.6.18-4, so I updated to 2.6.18-5, where I got different BUG messages and the machine degraded to the point where it would kernel panic during the boot process. I reverted to an old 2.6.17 kernel, where everything appeared to boot and work fine, and both of my arrays began to resync. After a couple of hours the first array (4 x 320 GB) had completed its resync, and I noticed the other array (4 x 500 GB) was "stuck" midway through its resync after one of the "kernel BUG" messages had been printed to the console. Any subsequent process trying to query the mounted filesystem went into an uninterruptible sleep, and mdadm would not respond to commands, citing an I/O error.
I booted into Knoppix (live CD) and assembled the array, at which point dmesg named a specific drive it was having issues with. I booted into the manufacturer's hardware testing utility (Seagate) and ran the short and extended tests, both of which passed fine. I zeroed the drive, booted back into Debian, created a partition on the device and (after some mdadm hiccups) managed to re-add the old drive to the array. During the resync I got a kernel panic, then another two after rebooting. Finally, I removed the suspect device from the array before it had spent too long resyncing, and all is well. The degraded array is mounted and working fine. I erased the suspect drive and created an ext3 partition on it, and data is being copied to it as I type.

I don't think this is a hardware problem; issues only happen if I add this drive to the array and leave it to resync for a couple of minutes. In the course of all this I tried a backported kernel and a backported copy of mdadm too. I have run CPU/memory stability testing programs, and I should also note that the machine had been running with no issues for about 8 months.

What should I try from here, short of purchasing new hardware? I've included some of the different messages from my kernel log in case they are of any use.

Thanks,
Nathan

Aug 30 20:22:13 localhost kernel: ------------[ cut here ]------------
Aug 30 20:22:13 localhost kernel: kernel BUG at mm/slab.c:3434!
Aug 30 20:22:13 localhost kernel: invalid opcode: 0000 [#1]
Aug 30 20:22:13 localhost kernel: Modules linked in: ipv6 button ac battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop evdev snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd rtc parport_pc parport serio_raw floppy analog gameport pcspkr soundcore psmouse i2c_nforce2 i2c_core eth1394 ext3 jbd ide_cd cdrom ide_disk sd_mod amd74xx generic ide_core ohci1394 skge ieee1394 sata_sil sata_nv ehci_hcd ohci_hcd forcedeth libata scsi_mod usbcore thermal processor fan
Aug 30 20:22:13 localhost kernel: CPU: 0
Aug 30 20:22:13 localhost kernel: EIP: 0060:[<c0146014>] Not tainted VLI
Aug 30 20:22:13 localhost kernel: EFLAGS: 00010206 (2.6.18-4-486 #1)
Aug 30 20:22:13 localhost kernel: EIP is at kmem_cache_free+0x36/0x62
Aug 30 20:22:13 localhost kernel: eax: 80000080 ebx: d3b50dc0 ecx: dff5a0c0 edx: c1474fe0
Aug 30 20:22:13 localhost kernel: esi: d09d6f74 edi: e3a7f4e4 ebp: f6c6b8c0 esp: f7c85f2c
Aug 30 20:22:13 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 30 20:22:13 localhost kernel: Process kjournald (pid: 2269, ti=f7c84000 task=dfbab030 task.ti=f7c84000)
Aug 30 20:22:13 localhost kernel: Stack: d3b50dc0 d09d6f74 e3a7f4e4 f8966a9b 00000000 f6ab0800 00000000 00000000
Aug 30 20:22:13 localhost kernel: d048815c f7da83c0 dfbab030 c0360454 f6ab0800 00000000 00000000 00000046
Aug 30 20:22:13 localhost kernel: 00000000 0000000a f7aca030 25df3858 0002b97b 00007e83 dfbab140 f6c6b910
Aug 30 20:22:13 localhost kernel: Call Trace:
Aug 30 20:22:13 localhost kernel: [<f8966a9b>] journal_commit_transaction+0x30b/0xc08 [jbd]
Aug 30 20:22:13 localhost kernel: [<f8969ca1>] kjournald+0x92/0x184 [jbd]
Aug 30 20:22:13 localhost kernel: [<c0122cc3>] autoremove_wake_function+0x0/0x2d
Aug 30 20:22:13 localhost kernel: [<f8969c0f>] kjournald+0x0/0x184 [jbd]
Aug 30 20:22:13 localhost kernel: [<c0122b64>] kthread+0xaf/0xdb
Aug 30 20:22:13 localhost kernel: [<c0122ab5>] kthread+0x0/0xdb
Aug 30 20:22:13 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 30 20:22:13 localhost kernel: Code: 00 40 c1 ea 0c c1 e2 05 03 15 5c b4 36 c0 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 0f 0b 53 02 f9 06 29 c0 39 4a 18 74 08 <0f> 0b 6a 0d f9 06 29 c0 9c 5e fa 8b 19 8b 03 3b 43 04 72 0b 89
Aug 30 20:22:13 localhost kernel: EIP: [<c0146014>] kmem_cache_free+0x36/0x62 SS:ESP 0068:f7c85f2c

Aug 30 22:27:51 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014
Aug 30 22:27:51 localhost kernel: printing eip:
Aug 30 22:27:51 localhost kernel: f8ba79e0
Aug 30 22:27:51 localhost kernel: *pde = 00000000
Aug 30 22:27:51 localhost kernel: Oops: 0000 [#1]
Aug 30 22:27:51 localhost kernel: SMP
Aug 30 22:27:51 localhost kernel: Modules linked in: ipv6 button ac battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop analog snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd floppy parport_pc parport rtc gameport soundcore serio_raw psmouse pcspkr i2c_nforce2 i2c_core eth1394 evdev ext3 jbd mbcache ide_cd cdrom ide_disk sd_mod generic amd74xx ide_core ohci1394 skge ieee1394 sata_sil ohci_hcd forcedeth sata_nv ehci_hcd libata scsi_mod usbcore thermal processor fan
Aug 30 22:27:51 localhost kernel: CPU: 0
Aug 30 22:27:51 localhost kernel: EIP: 0060:[<f8ba79e0>] Not tainted VLI
Aug 30 22:27:51 localhost kernel: EFLAGS: 00010202 (2.6.18-5-686 #1)
Aug 30 22:27:51 localhost kernel: EIP is at handle_stripe+0x114d/0x2075 [raid456]
Aug 30 22:27:51 localhost kernel: eax: 25e61e40 ebx: f71b4e80 ecx: 0000000c edx: 00000000
Aug 30 22:27:51 localhost kernel: esi: f71b4e84 edi: 00000010 ebp: f71b4d78 esp: f756be90
Aug 30 22:27:51 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 30 22:27:51 localhost kernel: Process md0_raid5 (pid: 2273, ti=f756a000 task=dff17aa0 task.ti=f756a000)
Aug 30 22:27:51 localhost kernel: Stack: f756beb0 00000040 ffffa138 f71b4e80 c18079a0 c1807980 ffffa138 00000000
Aug 30 22:27:51 localhost kernel: c1909980 00000001 c030cf58 0000000a 00000000 c0121838 00000046 f756bef0
Aug 30 22:27:51 localhost kernel: dffdb550 00000046 00000046 00000032 c01050ea 00803040 f7407900 c01036b6
Aug 30 22:27:51 localhost kernel: Call Trace:
Aug 30 22:27:51 localhost kernel: [<c0121838>] __do_softirq+0x5a/0xbb
Aug 30 22:27:51 localhost kernel: [<c01050ea>] do_IRQ+0x48/0x52
Aug 30 22:27:51 localhost kernel: [<c01036b6>] common_interrupt+0x1a/0x20
Aug 30 22:27:51 localhost kernel: [<c01af0ae>] generic_unplug_device+0x15/0x22
Aug 30 22:27:51 localhost kernel: [<f8ba8a15>] raid5d+0x10d/0x132 [raid456]
Aug 30 22:27:51 localhost kernel: [<f8b77769>] md_thread+0xd7/0xed [md_mod]
Aug 30 22:27:51 localhost kernel: [<c012d92d>] autoremove_wake_function+0x0/0x2d
Aug 30 22:27:51 localhost kernel: [<f8b77692>] md_thread+0x0/0xed [md_mod]
Aug 30 22:27:51 localhost kernel: [<c012d85f>] kthread+0xc2/0xef
Aug 30 22:27:51 localhost kernel: [<c012d79d>] kthread+0x0/0xef
Aug 30 22:27:51 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 30 22:27:51 localhost kernel: Code: 11 8b 8c 24 94 00 00 00 89 4f 08 89 bc 24 94 00 00 00 b0 01 8b 7c 24 20 86 87 d0 00 00 00 fb 89 df 85 ff 74 23 8b 46 60 8b 56 64 <8b> 5f 04 8b 0f 83 c0 08 83 d2 00 39 d3 0f 82 68 ff ff ff 77 08
Aug 30 22:27:51 localhost kernel: EIP: [<f8ba79e0>] handle_stripe+0x114d/0x2075 [raid456] SS:ESP 0068:f756be90

Aug 31 00:51:57 localhost kernel: BUG: unable to handle kernel paging request at virtual address 7a95c000
Aug 31 00:51:57 localhost kernel: printing eip:
Aug 31 00:51:57 localhost kernel: f8b347ce
Aug 31 00:51:57 localhost kernel: *pde = 00000000
Aug 31 00:51:57 localhost kernel: Oops: 0000 [#1]
Aug 31 00:51:57 localhost kernel: Modules linked in: ipv6 button ac battery raid456 xor md_mod dm_snapshot dm_mirror dm_mod sbp2 loop snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd rtc analog gameport soundcore parport_pc parport serio_raw psmouse floppy pcspkr i2c_nforce2 i2c_core eth1394 evdev ext3 jbd ide_cd cdrom ide_disk sd_mod generic amd74xx ide_core forcedeth ohci1394 skge ieee1394 sata_sil sata_nv ehci_hcd ohci_hcd libata scsi_mod usbcore thermal processor fan
Aug 31 00:51:57 localhost kernel: CPU: 0
Aug 31 00:51:57 localhost kernel: EIP: 0060:[<f8b347ce>] Not tainted VLI
Aug 31 00:51:57 localhost kernel: EFLAGS: 00010212 (2.6.18-4-486 #1)
Aug 31 00:51:57 localhost kernel: EIP is at xor_sse_5+0x5b/0x3b5 [xor]
Aug 31 00:51:57 localhost kernel: eax: 00000010 ebx: f6a36000 ecx: f6a39000 edx: 7a95c000
Aug 31 00:51:57 localhost kernel: esi: f6a37000 edi: f6a38000 ebp: f7c01dd8 esp: f7c01dd4
Aug 31 00:51:57 localhost kernel: ds: 007b es: 007b ss: 0068
Aug 31 00:51:57 localhost kernel: Process md1_raid5 (pid: 2223, ti=f7c00000 task=f7ad4ab0 task.ti=f7c00000)
Aug 31 00:51:57 localhost kernel: Stack: 8005003b 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 31 00:51:57 localhost kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 31 00:51:57 localhost kernel: 00000000 f8b3745c f6a37000 7a95c000 00001000 f8b3521a f6a38000 f6a37000
Aug 31 00:51:57 localhost kernel: Call Trace:
Aug 31 00:51:57 localhost kernel: [<f8b3521a>] xor_block+0x74/0x7d [xor]
Aug 31 00:51:57 localhost kernel: [<f8b922d0>] compute_parity5+0x311/0x3d6 [raid456]
Aug 31 00:51:57 localhost kernel: [<f8b94fc1>] handle_stripe+0x18ee/0x1ebe [raid456]
Aug 31 00:51:57 localhost kernel: [<f8834fb2>] scsi_io_completion+0x13f/0x2e9 [scsi_mod]
Aug 31 00:51:57 localhost kernel: [<f888bf25>] ata_hsm_move+0x63d/0x653 [libata]
Aug 31 00:51:57 localhost kernel: [<f89303c0>] sd_rw_intr+0x1f7/0x221 [sd_mod]
Aug 31 00:51:57 localhost kernel: [<c0275964>] schedule+0x46e/0x4d2
Aug 31 00:51:57 localhost kernel: [<f8b9566f>] raid5d+0xde/0xf8 [raid456]
Aug 31 00:51:57 localhost kernel: [<f8b6659d>] md_thread+0xd6/0xec [md_mod]
Aug 31 00:51:57 localhost kernel: [<c0122cc3>] autoremove_wake_function+0x0/0x2d
Aug 31 00:51:57 localhost kernel: [<f8b664c7>] md_thread+0x0/0xec [md_mod]
Aug 31 00:51:57 localhost kernel: [<c0122b64>] kthread+0xaf/0xdb
Aug 31 00:51:57 localhost kernel: [<c0122ab5>] kthread+0x0/0xdb
Aug 31 00:51:57 localhost kernel: [<c0101005>] kernel_thread_helper+0x5/0xb
Aug 31 00:51:57 localhost kernel: Code: 5d 30 0f 18 82 00 01 00 00 0f 18 82 20 01 00 00 8d b6 00 00 00 00 8d bc 27 00 00 00 00 0f 18 81 00 01 00 00 0f 18 81 20 01 00 00 <0f> 28 02 0f 28 4a 10 0f 28 52 20 0f 28 5a 30 0f 18 87 00 01 00
Aug 31 00:51:57 localhost kernel: EIP: [<f8b347ce>] xor_sse_5+0x5b/0x3b5 [xor] SS:ESP 0068:f7c01dd4
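
In case it saves anyone from wading through the full traces, here is a rough Python sketch that pulls out the kernel version, faulting function, module and process for each oops. The function name summarize_oopses and the regular expressions are my own assumptions based only on the format of the messages in this post; they are not part of any kernel tooling, and the sample string just repeats the relevant lines from my log.

```python
import re

# Assumed patterns, derived only from the oops format shown in this post.
VERSION_RE = re.compile(r"EFLAGS: [0-9a-f]+\s+\(([^\s)]+)")
FAULT_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
PROCESS_RE = re.compile(r"Process (\S+) \(pid: \d+")


def summarize_oopses(log_text):
    """Return (kernel version, function, module, process) for each oops.

    Assumes every oops contains exactly one EFLAGS line, one
    "EIP is at" line and one "Process" line, in that order.
    """
    versions = VERSION_RE.findall(log_text)
    faults = [
        # No "[module]" suffix means the function is built into the kernel.
        (m.group(1), m.group(2) or "core kernel")
        for m in FAULT_RE.finditer(log_text)
    ]
    processes = PROCESS_RE.findall(log_text)
    return [
        (version, func, module, proc)
        for version, (func, module), proc in zip(versions, faults, processes)
    ]


# Lines copied from the three oopses in this post.
SAMPLE = """\
Aug 30 20:22:13 localhost kernel: EFLAGS: 00010206 (2.6.18-4-486 #1)
Aug 30 20:22:13 localhost kernel: EIP is at kmem_cache_free+0x36/0x62
Aug 30 20:22:13 localhost kernel: Process kjournald (pid: 2269, ti=f7c84000 task=dfbab030 task.ti=f7c84000)
Aug 30 22:27:51 localhost kernel: EFLAGS: 00010202 (2.6.18-5-686 #1)
Aug 30 22:27:51 localhost kernel: EIP is at handle_stripe+0x114d/0x2075 [raid456]
Aug 30 22:27:51 localhost kernel: Process md0_raid5 (pid: 2273, ti=f756a000 task=dff17aa0 task.ti=f756a000)
Aug 31 00:51:57 localhost kernel: EFLAGS: 00010212 (2.6.18-4-486 #1)
Aug 31 00:51:57 localhost kernel: EIP is at xor_sse_5+0x5b/0x3b5 [xor]
Aug 31 00:51:57 localhost kernel: Process md1_raid5 (pid: 2223, ti=f7c00000 task=f7ad4ab0 task.ti=f7c00000)
"""

if __name__ == "__main__":
    for version, func, module, proc in summarize_oopses(SAMPLE):
        print(f"{version}: {func} ({module}) in process {proc}")
```

Run against the three traces, this shows the faults landing in three different places (the slab allocator under kjournald, then raid456 and xor under the md RAID5 threads), across both the 2.6.18-4 and 2.6.18-5 kernels.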