Hello!

I realized I had to go down to the server room to fix this. Been there all day. Think I'm catching a cold... ;-)

Just a reminder of what happened:

On Saturday 27 September 2003 15:22, Kjetil Kjernsmo wrote:
> Last night, the cronjob on my main server reported this:
> /etc/cron.daily/mailman:
> /etc/cron.daily/mailman: line 14: 15009 Segmentation fault su -s
> /bin/sh list -c "/usr/bin/python /var/lib/mailman/cron/checkdbs"

These segfaults kept appearing, and I found there was little I could do to fix them. Eventually, more and more programs seemed to have the same problem, and by Sunday evening I could no longer log in.

In the server room, I booted the machine from a Knoppix CD and fscked all partitions, but they all appeared to be clean. I then grabbed the logs to look through. This is the last thing I could find in kern.log from Sunday evening:

Sep 28 21:11:27 pooh kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
Sep 28 21:11:27 pooh kernel: printing eip:
Sep 28 21:11:27 pooh kernel: c013f765
Sep 28 21:11:27 pooh kernel: *pde = 00000000
Sep 28 21:11:27 pooh kernel: Oops: 0002
Sep 28 21:11:27 pooh kernel: CPU: 0
Sep 28 21:11:27 pooh kernel: EIP: 0010:[prune_dcache+21/296] Not tainted
Sep 28 21:11:27 pooh kernel: EFLAGS: 00010212
Sep 28 21:11:27 pooh kernel: eax: 00000000 ebx: c77fc558 ecx: 00000006 edx: 00000000
Sep 28 21:11:27 pooh kernel: esi: 000001d2 edi: 00000020 ebp: 0000311f esp: c1a79e04
Sep 28 21:11:27 pooh kernel: ds: 0018 es: 0018 ss: 0018
Sep 28 21:11:27 pooh kernel: Process postmaster (pid: 13609, stackpage=c1a79000)
Sep 28 21:11:27 pooh kernel: Stack: 00000010 000001d2 00000020 00000006 c013facb 0000311f c0129fd6 00000006
Sep 28 21:11:27 pooh kernel: 000001d2 00000006 000001d2 c01f2f88 c01f2f88 c01f2f88 c012a02f 00000020
Sep 28 21:11:27 pooh kernel: c1a78000 00000120 00000000 c012a842 c01f3104 00000120 00000010 00000000
Sep 28 21:11:27 pooh kernel: Call Trace: [shrink_dcache_memory+27/52] [shrink_caches+102/136] [try_to_free_pages+55/88] [balance_classzone+78/360] [__alloc_pages+262/356]
Sep 28 21:11:27 pooh kernel: [_alloc_pages+22/24] [do_anonymous_page+48/164] [do_no_page+51/284] [handle_mm_fault+82/180] [do_page_fault+352/1164] [do_page_fault+0/1164]
Sep 28 21:11:27 pooh kernel: [do_brk+280/508] [sys_ipc+144/632] [sys_brk+187/228] [error_code+52/60]
Sep 28 21:11:27 pooh kernel:
Sep 28 21:11:27 pooh kernel: Code: 89 50 04 89 02 89 1b 89 5b 04 8d 73 e8 8b 46 54 a8 08 74 27

The kernel is an unmodified Debian 2.4.18 kernel for i686. The oops above doesn't mean anything to me, but perhaps it does to others.

A quick grep through kern.log reveals that it was mostly python processes that died this way (Mailman runs on Python), but also inetd, sh, exim, modprobe, ps, w, netstat, nfsstat and lsof; I think that is all of them. The postmaster entry above seems to be the final one, and the only one for postmaster, but it nonetheless seems rather representative of what happened.

Now I've upgraded to a self-compiled 2.4.22 kernel from the kernel-source in unstable, but since I don't understand what caused this, I'm not confident it'll do the trick.

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/ OpenPGP KeyID: 6A6A0BBC