Hi all,

This is hard to explain, but recently I have been seeing *lots* of crashes
on one machine running vserver-sources with about 3-5 VEs.

The physical server itself is a 10k IBM amd64 host with two dual-core CPUs,
16GB RAM, and RAID10 on SATA, in case that matters.

However, this is what I found in the logs:


Sep 11 20:05:11 jupjep ------------[ cut here ]------------
Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:05:11 jupjep invalid opcode: 0000 [1] SMP
Sep 11 20:05:11 jupjep CPU 2
Sep 11 20:05:11 jupjep Modules linked in: iptable_nat nf_nat iptable_filter 
ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 
nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 
2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:05:11 jupjep RIP: 0010:[<ffffffff8109a24b>]  [<ffffffff8109a24b>] 
free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP: 0018:ffff8101363a1dd8  EFLAGS: 00010246
Sep 11 20:05:11 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 
0000000000000000
Sep 11 20:05:11 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: 
ffff81040ac6a000
Sep 11 20:05:11 jupjep RBP: 0000000000000000 R08: ffff8103ff3bcc58 R09: 
ffffffff00000000
Sep 11 20:05:11 jupjep R10: 0000000000000000 R11: ffffffff8132a2e2 R12: 
0000000000000000
Sep 11 20:05:11 jupjep R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000001
Sep 11 20:05:11 jupjep FS:  00002b988b369e70(0000) GS:ffff8104118783c0(0000) 
knlGS:0000000056502b90
Sep 11 20:05:11 jupjep CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:05:11 jupjep CR2: 00002b988c005f80 CR3: 0000000001001000 CR4: 
00000000000006e0
Sep 11 20:05:11 jupjep Process sshd (pid: 26337[#242], threadinfo 
ffff8101363a0000, task ffff810302a70040)
Sep 11 20:05:11 jupjep Stack:  ffff8103ff3bcf40 ffffffff8132c56c 
0000000000000000 ffff8103ff3bcf40
Sep 11 20:05:11 jupjep ffff8103ff3bc9c0 ffffffff8104ff06 ffff8103ff3bc9c0 
0000000000000000
Sep 11 20:05:11 jupjep ffff8103ca88db80 ffff8103ca88dbd0 ffff8101051924a8 
ffff81041183b980
Sep 11 20:05:11 jupjep Call Trace:
Sep 11 20:05:11 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:05:11 jupjep [<ffffffff8104ff06>] unix_release_sock+0x172/0x202
Sep 11 20:05:11 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:05:11 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:05:11 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:05:11 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:05:11 jupjep [<ffffffff810381d6>] put_files_struct+0x66/0xe1
Sep 11 20:05:11 jupjep [<ffffffff81015452>] do_exit+0x264/0x8de
Sep 11 20:05:11 jupjep [<ffffffff81047aec>] cpuset_exit+0x0/0x6b
Sep 11 20:05:11 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 
00 74 04
Sep 11 20:05:11 jupjep RIP  [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP <ffff8101363a1dd8>
Sep 11 20:05:11 jupjep <1>Fixing recursive fault but reboot is needed!

About five minutes later:

Sep 11 20:10:01 jupjep ------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [2] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter 
ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 
nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted 
2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>]  [<ffffffff8109a24b>] 
free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff81021d825de8  EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: 
ffff81041e893290
Sep 11 20:10:01 jupjep RDX: ffff81041d582d48 RSI: 0000000000000000 RDI: 
ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff8103b4e9e000 R08: ffff81021d824000 R09: 
00000000019f865c
Sep 11 20:10:01 jupjep R10: 0000000000000080 R11: ffff810001760400 R12: 
ffff81040fc4bac0
Sep 11 20:10:01 jupjep R13: ffff81040fc4bac0 R14: ffff81000175f000 R15: 
0000000000000000
Sep 11 20:10:01 jupjep FS:  00002ac6eb29fda0(0000) GS:ffff8104118a7ac0(0000) 
knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002ae879267570 CR3: 000000032a8bf000 CR4: 
00000000000006e0
Sep 11 20:10:01 jupjep Process run-crons (pid: 9762[#242], threadinfo 
ffff81021d824000, task ffff810409954100)
Sep 11 20:10:01 jupjep Stack:  ffff81041e893290 ffffffff81086c39 
0000000000000100 ffff81021d825ed8
Sep 11 20:10:01 jupjep 0000000000000003 ffffffff810626dd 0000000000000000 
ffffffff81022504
Sep 11 20:10:01 jupjep ffff8103e4c38978 ffff81040981f880 0000000000000006 
ffff810409954100
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff81028613>] do_wait+0x978/0xa78
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep


Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 
00 74 04
Sep 11 20:10:01 jupjep RIP  [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff81021d825de8>
Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [3] SMP
Sep 11 20:10:01 jupjep CPU 2
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter 
ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 
nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted 
2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>]  [<ffffffff8109a24b>] 
free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103ef0a5a48  EFLAGS: 00210246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: 
ffff81041ca4e9c8
Sep 11 20:10:01 jupjep RDX: ffff81041d911838 RSI: 0000000000000000 RDI: 
ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff81032a8bf000 R08: ffff8103ef0a4000 R09: 
ffff81021d825e88
Sep 11 20:10:01 jupjep R10: 0000000000002623 R11: 00000000ffffffff R12: 
ffff81040981f880
Sep 11 20:10:01 jupjep R13: ffff81040981f880 R14: ffff810001755d00 R15: 
ffffffff815eaeb0
Sep 11 20:10:01 jupjep FS:  00002ac6eb29fda0(0000) GS:ffff8104118783c0(0063) 
knlGS:00000000558f9b90
Sep 11 20:10:01 jupjep CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Sep 11 20:10:01 jupjep CR2: 00000000005c5818 CR3: 00000003ef7d9000 CR4: 
00000000000006e0
Sep 11 20:10:01 jupjep Process server_linux (pid: 8581[#260], threadinfo 
ffff8103ef0a4000, task ffff810409171040)
Sep 11 20:10:01 jupjep Stack:  ffff81041ca4e9c8 ffffffff81086c39 
0000000000000000 ffff8103ef0a5b38
Sep 11 20:10:01 jupjep 0000000000000002 ffffffff810626dd 0000000000000000 
0000000000000000
Sep 11 20:10:01 jupjep 0000000000200246 ffff8103f3199280 000000000000000a 
ffff810409171040
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff8106301e>] schedule_timeout+0x8a/0xad
Sep 11 20:10:01 jupjep [<ffffffff8108d5c6>] process_timeout+0x0/0x5
Sep 11 20:10:01 jupjep [<ffffffff8102ed6f>] do_sys_poll+0x278/0x360
Sep 11 20:10:01 jupjep [<ffffffff8101e5f5>] __pollwait+0x0/0xe2
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81067b68>] __switch_to+0x26e/0x27d
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff81034d06>] find_extend_vma+0x16/0x59
Sep 11 20:10:01 jupjep [<ffffffff810a1354>] get_futex_key+0x47/0x10c
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff810a180c>] futex_wake+0xc6/0xd5
Sep 11 20:10:01 jupjep [<ffffffff8103dd0e>] do_futex+0x268/0xc16
Sep 11 20:10:01 jupjep [<ffffffff8100aec3>] do_page_fault+0x45e/0x7b9
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff810a27ff>] compat_sys_futex+0xfb/0x119
Sep 11 20:10:01 jupjep [<ffffffff8104aa34>] sys_poll+0x54/0x5a
Sep 11 20:10:01 jupjep [<ffffffff81060b44>] cstar_do_call+0x1b/0x65
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 
00 74 04
Sep 11 20:10:01 jupjep RIP  [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103ef0a5a48>


Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [4] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter 
ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 
nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9764:#242, comm: sendmail Not tainted 
2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>]  [<ffffffff8109a24b>] 
free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103b35dbe88  EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 
0000000000000000
Sep 11 20:10:01 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: 
ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: 0000000000000000 R08: ffffffff815a8718 R09: 
ffffffff00000000
Sep 11 20:10:01 jupjep R10: 0000000000000296 R11: 0000000000000202 R12: 
ffff8102e3414080
Sep 11 20:10:01 jupjep R13: ffff8101c52115f8 R14: ffff81041183b980 R15: 
0000000000002624
Sep 11 20:10:01 jupjep FS:  00002b8330167ae0(0000) GS:ffff8104118a7ac0(0000) 
knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002b832fc91db0 CR3: 000000036083c000 CR4: 
00000000000006e0
Sep 11 20:10:01 jupjep Process sendmail (pid: 9764[#242], threadinfo 
ffff8103b35da000, task ffff810339081790)
Sep 11 20:10:01 jupjep Stack:  ffff81040f8af800 ffffffff8132c56c 
ffff8102e3414080 ffff81040f8af960
Sep 11 20:10:01 jupjep ffff81040f8af800 ffffffff813458ba 0000000000002624 
ffffffff81053352
Sep 11 20:10:01 jupjep 0000000000000000 ffff8102e3414080 ffff8102e34140d0 
ffffffff810531d5
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:10:01 jupjep [<ffffffff813458ba>] netlink_release+0x255/0x25f
Sep 11 20:10:01 jupjep [<ffffffff81053352>] sock_fasync+0x124/0x133
Sep 11 20:10:01 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:10:01 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:10:01 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:10:01 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:10:01 jupjep [<ffffffff8101da52>] sys_close+0x8c/0xcf
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 
00 74 04
Sep 11 20:10:01 jupjep RIP  [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103b35dbe88>

A little later the server crashed.
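
To make the traces above easier to compare, they can be tallied by context
and command with a few lines of Python. This is only a sketch: the function
name summarize_oopses is made up for illustration, and the regex assumes the
"Pid: <pid>:#<n>, comm: <name>" layout seen in the log lines above, with the
number after '#' presumably being the context ID.

```python
import re
from collections import Counter

# Assumed layout, copied from the traces above:
#   "... Pid: 26337:#242, comm: sshd Not tainted"
PID_LINE = re.compile(r"Pid: (\d+):#(\d+), comm: (\S+)")


def summarize_oopses(log_text):
    """Return a Counter keyed by (context, comm) for every 'Pid:' line found."""
    hits = Counter()
    for match in PID_LINE.finditer(log_text):
        _pid, xid, comm = match.groups()
        hits[(int(xid), comm)] += 1
    return hits


if __name__ == "__main__":
    # The four 'Pid:' lines from the traces above (first half of each wrapped line).
    sample = """\
Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted
Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted
Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted
Sep 11 20:10:01 jupjep Pid: 9764:#242, comm: sendmail Not tainted
"""
    for (xid, comm), count in sorted(summarize_oopses(sample).items()):
        print(f"context #{xid}: {comm} x{count}")
```

For the four traces above this gives three hits in #242 (sshd, run-crons,
sendmail) and one in #260 (server_linux).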

Now that I've set sysctl kernel.panic=5, I no longer see any of these logs
or crashes, only dozens of reboots:

jupjep log # last | grep -E '(boot|crash)'
reboot   system boot  Tue Sep 18 11:27          (01:24)     
2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Mon Sep 17 22:23          (14:29)     
2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Mon Sep 17 19:52          (16:59)     
2.6.22-vs2.3.0.17-gentoo
reboot   system boot  Sun Sep 16 18:57         (1+17:55)    
2.6.22-vs2.3.0.17-gentoo
trapni   pts/0        Sat Sep 15 23:58 - crash  (18:58)     $MY_IP
reboot   system boot  Fri Sep 14 22:06         (3+14:46)    
2.6.22-vs2.3.0.17-gentoo
trapni   pts/4        Fri Sep 14 13:39 - crash  (08:27)     $MY_IP
reboot   system boot  Thu Sep 13 19:34         (4+17:18)    
2.6.20-vs2.3.0.11-gentoo
trapni   pts/6        Thu Sep 13 12:38 - crash  (06:56)     $MY_IP
trapni   pts/0        Thu Sep 13 08:17 - crash  (11:16)     $MY_IP
reboot   system boot  Thu Sep 13 08:02         (5+04:50)    
2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Tue Sep 11 19:32         (6+17:19)    
2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Mon Sep 10 17:52         (7+18:59)    
2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Thu Sep  6 16:50         (11+20:02)   
2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Fri Aug  3 08:36         (46+04:16)   
2.6.20-vs2.3.0.11-gentoo
reboot   system boot  Mon Jul 30 21:31         (3+11:01)    
2.6.20-vs2.2.0-gentoo
trapni   pts/13       Mon Jul 30 12:26 - crash  (09:05)     $MY_IP

I did not trigger any of these reboots myself, so they were all crashes.

So, is there a way to trace this bug and/or work around it?
We have 10+ machines on the very same hardware running the Gentoo
hardened profile with a hardened-sources kernel.
But this one host, running normal Gentoo with vserver-sources, really
refuses to get along with me.

Can anybody give me a hint regarding these traces I posted above?

Many thanks in advance,
Christian Parpart.
