Hello!

I realized I had to go down to the server room to fix this. Been there 
all day. Think I'm catching a cold... ;-)

Just a reminder of what happened:

On Saturday 27 September 2003 15:22, Kjetil Kjernsmo wrote:
> Last night, the cronjob on my main server reported this:
> /etc/cron.daily/mailman:
> /etc/cron.daily/mailman: line 14: 15009 Segmentation fault      su -s
> /bin/sh list -c "/usr/bin/python /var/lib/mailman/cron/checkdbs"

These segfaults started to appear, and I found there was little I could 
do to fix them. Eventually, more and more programs were hit by the same 
problem, and by Sunday evening I couldn't log on anymore.

In the server room, I booted the machine from a Knoppix CD and fscked 
all partitions, but they all appeared to be clean. I then grabbed the 
logs to look through.

This is the last thing I could find in kern.log from Sunday evening:
Sep 28 21:11:27 pooh kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
Sep 28 21:11:27 pooh kernel:  printing eip:
Sep 28 21:11:27 pooh kernel: c013f765
Sep 28 21:11:27 pooh kernel: *pde = 00000000
Sep 28 21:11:27 pooh kernel: Oops: 0002
Sep 28 21:11:27 pooh kernel: CPU:    0
Sep 28 21:11:27 pooh kernel: EIP:    0010:[prune_dcache+21/296]    Not tainted
Sep 28 21:11:27 pooh kernel: EFLAGS: 00010212
Sep 28 21:11:27 pooh kernel: eax: 00000000   ebx: c77fc558   ecx: 00000006   edx: 00000000
Sep 28 21:11:27 pooh kernel: esi: 000001d2   edi: 00000020   ebp: 0000311f   esp: c1a79e04
Sep 28 21:11:27 pooh kernel: ds: 0018   es: 0018   ss: 0018
Sep 28 21:11:27 pooh kernel: Process postmaster (pid: 13609, stackpage=c1a79000)
Sep 28 21:11:27 pooh kernel: Stack: 00000010 000001d2 00000020 00000006 c013facb 0000311f c0129fd6 00000006
Sep 28 21:11:27 pooh kernel:        000001d2 00000006 000001d2 c01f2f88 c01f2f88 c01f2f88 c012a02f 00000020
Sep 28 21:11:27 pooh kernel:        c1a78000 00000120 00000000 c012a842 c01f3104 00000120 00000010 00000000
Sep 28 21:11:27 pooh kernel: Call Trace: [shrink_dcache_memory+27/52] [shrink_caches+102/136] [try_to_free_pages+55/88] [balance_classzone+78/360] [__alloc_pages+262/356]
Sep 28 21:11:27 pooh kernel:    [_alloc_pages+22/24] [do_anonymous_page+48/164] [do_no_page+51/284] [handle_mm_fault+82/180] [do_page_fault+352/1164] [do_page_fault+0/1164]
Sep 28 21:11:27 pooh kernel:    [do_brk+280/508] [sys_ipc+144/632] [sys_brk+187/228] [error_code+52/60]
Sep 28 21:11:27 pooh kernel:
Sep 28 21:11:27 pooh kernel: Code: 89 50 04 89 02 89 1b 89 5b 04 8d 73 e8 8b 46 54 a8 08 74 27

The kernel is an unmodified Debian 2.4.18 kernel for i686. The oops 
above doesn't mean anything to me, but perhaps it does to others.
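
One part of the oops that can at least be decoded mechanically is the 
"Oops: 0002" line. What follows is only a minimal sketch, assuming the 
usual i386 page-fault error-code layout (bit 0: protection fault vs. 
page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). 
The function name decode_oops_code is made up for illustration and is 
not part of any tool.

```python
def decode_oops_code(code_hex):
    """Decode the hex value from an i386 'Oops: NNNN' line into plain words.

    Assumes the conventional i386 page-fault error-code bits:
    bit 0 = protection fault (else page not present),
    bit 1 = write access (else read),
    bit 2 = user mode (else kernel mode).
    """
    code = int(code_hex, 16)
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


# The value reported in the log above.
print(decode_oops_code("0002"))
```

Under that assumption, 0002 would be a kernel-mode write to a page that 
is not present, which at least fits the "NULL pointer dereference at 
virtual address 00000004" message. It says nothing about the cause.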

A quick grep through kern.log reveals that it was mostly python 
processes that died this way (Mailman runs on Python), but also inetd, 
sh, exim, modprobe, ps, w, netstat, nfsstat and lsof; I think that is 
all of them. The postmaster entry above seems to be the final entry, 
and the only one for postmaster, but it nonetheless seems rather 
representative of what happened.
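
For anyone who wants to do the same kind of tally on their own logs, 
here is a rough sketch of the idea in Python. It assumes the oops 
entries contain a "Process NAME (pid: N" line like the one above. The 
names PROCESS_RE and count_oops_processes are my own, chosen for 
illustration, and the sample lines are copied from the log in this 
mail.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: Process postmaster (pid: 13609, "
PROCESS_RE = re.compile(r"kernel: Process (\S+) \(pid: (\d+)")


def count_oops_processes(lines):
    """Count how often each process name appears in oops 'Process' lines."""
    counts = Counter()
    for line in lines:
        match = PROCESS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines taken from the oops quoted in this mail.
sample = [
    "Sep 28 21:11:27 pooh kernel: CPU:    0",
    "Sep 28 21:11:27 pooh kernel: Process postmaster (pid: 13609, ",
]
print(count_oops_processes(sample).most_common())
```

To run it on a real log, pass an open file object instead of the sample 
list, for example open("/var/log/kern.log"), assuming that is where 
your syslog writes kernel messages.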

Now, I've upgraded to a self-compiled 2.4.22 kernel from the unstable 
kernel-source, but since I don't understand what caused this, I'm not 
confident it'll do the trick.

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/        OpenPGP KeyID: 6A6A0BBC

