> If he hasn't rebooted his Win2k box in three years, he has some serious
> Security Issues!! There have been a huge number of security patches
> requiring reboot in the last three years....

Well, yeah. I thought of that after a few months of our little rivalry...and he replied that as it wasn't on the internet...it wasn't an issue. It's only running a small community radio station network with established staff anyway, so I think he'll be fine.

> I presume that you've tried using a different kernel to solve your problems?

I started out on the 2.4 i386 kernels, then moved on to the 2.4 i586s, and am now on the 2.6 i586s. Panics with all, which suggested a hardware problem to me, but the whole rough two week thing, coupled with the 9 month previous uptime, and the IRQ handler part of things, makes me think obscure kernel issue more.

The most significant difference between the old setup and the current one is the increased number of uses I've put the box to as my experience grew. Among other things, I now run OpenLDAP with Samba on it, and on a number of occasions (but far from the majority), performing a huge file operation (say, unzipping a 500 MB zip over the network) causes the panic, but only when I'm due one (> 2 weeks). Works just fine the rest of the time. Maybe just coincidences, but that's pretty much the most I/O strain it ever gets put under. Stuff that just pushes the CPU to ~100% doesn't seem to be a problem.

Also, seems my dates were off...our little compo started only 2 years ago, not 3 ;)

One of the previous attempts to deal with this was here:
http://forums.debian.net/viewtopic.php?t=614

I've also taken the liberty of attaching the last dump I bothered transcribing. Given the prompt for me posting these messages was my most recent panic, it'll be ~2 weeks before I manage to get a more recent one. That said, the magic letters 'IRQ' are always present (IIRC) somewhere in the dump (or at least the bit of it visible on the terminal when I go have a look why things have stopped working...why doesn't it allow you to scroll backwards to get the lot :( ).

...though it's just occurred to me: is it possible for the RAM to work fine for ~2 weeks, but then develop bit errors, thus passing the memtesting, but failing in extended normal use due to its age? Running memtest for 2 weeks to test that hypothesis seems a bit...excessive, as I really can't imagine that being the situation.

ksymoops 2.4.9 on i486 2.6.8-2-386. Options used
     -V (default)
     -k /proc/kallsyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.6.8-2-386/ (default)
     -m /boot/System.map-2.6.8-2-386 (default)

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU 0
EIP: 0060:[<C01340F9>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 000010002 (2.6.8-2-386)
eax: c18cf000 ebx: c3ffa000 ecx: c3ffaae0 edx: ccee5000
esi: c3fff160 edi: 00000028 ebp: 0000003c esp: c0d29f20
ds: 007b es: 007b ss: 0068
Stack: 0000003c c3fff160 c2914e20 c10a4000 c01341d4 c3fff160 c10a4010 0000003c
       c10a4000 c10a4010 c2914e20 00000282 c013444c c3fff160 c10a4000 c2c1b608
       c0347a48 0000000a c028b2f8 c0157788 c2914e20 c3700e70 c0125583 c2c1b65c
Call trace:
 [<c01341d4>] cache_flusharray+0x5e/0x98
 [<c013444c>] kfree+0x38/0x48
 [<c0157788>] d_callback+0x18/0x29
 [<c0125583>] rcv_do_batch+0xf/0x18
 [<c011c047>] tasklet_action+0x3a/0x59
 [<c011be60>] __do_softirq+0x34/0x73
 [<c011bec1>] do_softirq+0x22/0x26
 [<c0107ee5>] do_IRQ+0xe5/0xf9
 [<c010697c>] common_interrupt+0x18/0x20
Code: 89 02 2b 4b 0c c7 03 00 01 10 00 c7 43 04 00 02 20 00 89 c8

>>EIP; c01340f9 <free_block+3e/bb>   <=====

>>eax; c18cf000 <__crc_sysfs_create_file+2a20d2/300563>
>>ebx; c3ffa000 <__crc_elevator_init+200585/5cd96a>
>>ecx; c3ffaae0 <__crc_elevator_init+201065/5cd96a>
>>edx; ccee5000 <__crc_xfrm_ealg_get_byname+b10fa/52e343>
>>esi; c3fff160 <__crc_elevator_init+2056e5/5cd96a>
>>esp; c0d29f20 <__crc_sysfs_remove_file+2ca163/3f4f00>

Trace; c01341d4 <cache_flusharray+5e/98>
Trace; c013444c <kfree+38/48>
Trace; c0157788 <d_callback+18/29>
Trace; c0125583 <rcu_do_batch+f/18>
Trace; c011c047 <tasklet_action+3a/59>
Trace; c011be60 <__do_softirq+34/73>
Trace; c011bec1 <do_softirq+22/26>
Trace; c0107ee5 <do_IRQ+e5/f9>
Trace; c010697c <common_interrupt+18/20>

Code;  c01340f9 <free_block+3e/bb>
00000000 <_EIP>:
Code;  c01340f9 <free_block+3e/bb>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c01340fb <free_block+40/bb>
   2:   2b 4b 0c                  sub    0xc(%ebx),%ecx
Code;  c01340fe <free_block+43/bb>
   5:   c7 03 00 01 10 00         movl   $0x100100,(%ebx)
Code;  c0134104 <free_block+49/bb>
   b:   c7 43 04 00 02 20 00      movl   $0x200200,0x4(%ebx)
Code;  c013410b <free_block+50/bb>
  12:   89 c8                     mov    %ecx,%eax

Kernel Panic Fatal exception in interrupt

1 warning issued.  Results may not be reliable.
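
Since the dumps get transcribed by hand, one quick way to check the "IRQ is always present" impression across several of them would be to pull the symbol names out of the ksymoops output and compare. A rough sketch only, assuming each dump is saved as a text file in the same ksymoops format as the one above; the function names here are made up for illustration and are not part of ksymoops:

```python
import re
import sys

# Matches ksymoops lines such as:  Trace; c0107ee5 <do_IRQ+e5/f9>
# Works whether or not the dump still has its original line breaks.
TRACE_RE = re.compile(
    r"Trace;\s+[0-9a-fA-F]{8}\s+<([^+>\s]+)\+[0-9a-fA-F]+/[0-9a-fA-F]+>"
)
# Matches the faulting address line:  >>EIP; c01340f9 <free_block+3e/bb>
EIP_RE = re.compile(r">>EIP;\s+[0-9a-fA-F]{8}\s+<([^+>\s]+)\+")


def trace_symbols(text):
    """Return the function names from the 'Trace;' lines, in dump order."""
    return TRACE_RE.findall(text)


def faulting_symbol(text):
    """Return the function named on the '>>EIP;' line, or None if absent."""
    match = EIP_RE.search(text)
    return match.group(1) if match else None


def in_interrupt_path(text):
    """True if any traced function name contains 'irq' (case-insensitive)."""
    return any("irq" in name.lower() for name in trace_symbols(text))


if __name__ == "__main__":
    # Hypothetical usage: one transcribed dump per file on the command line.
    for path in sys.argv[1:]:
        with open(path) as handle:
            dump = handle.read()
        print(path)
        print("  faulting function:", faulting_symbol(dump))
        print("  IRQ in trace:     ", in_interrupt_path(dump))
        print("  call chain:       ", " <- ".join(trace_symbols(dump)))
```

Run over every dump collected so far, this would show whether the faulting function and the interrupt path really are the same each time, or whether only the IRQ part repeats.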