Hi all, this is hard to explain, but recently I have been seeing *lots* of crashes on one machine running vserver-sources with about 3-5 VEs.
The physical server is a 10k IBM amd64 host with two dual-core CPUs, 16GB RAM, and RAID10 on SATA, in case that matters. This is what I found in the logs:

Sep 11 20:05:11 jupjep ------------[ cut here ]------------
Sep 11 20:05:11 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:05:11 jupjep invalid opcode: 0000 [1] SMP
Sep 11 20:05:11 jupjep CPU 2
Sep 11 20:05:11 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:05:11 jupjep Pid: 26337:#242, comm: sshd Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:05:11 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP: 0018:ffff8101363a1dd8 EFLAGS: 00010246
Sep 11 20:05:11 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
Sep 11 20:05:11 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
Sep 11 20:05:11 jupjep RBP: 0000000000000000 R08: ffff8103ff3bcc58 R09: ffffffff00000000
Sep 11 20:05:11 jupjep R10: 0000000000000000 R11: ffffffff8132a2e2 R12: 0000000000000000
Sep 11 20:05:11 jupjep R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
Sep 11 20:05:11 jupjep FS: 00002b988b369e70(0000) GS:ffff8104118783c0(0000) knlGS:0000000056502b90
Sep 11 20:05:11 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:05:11 jupjep CR2: 00002b988c005f80 CR3: 0000000001001000 CR4: 00000000000006e0
Sep 11 20:05:11 jupjep Process sshd (pid: 26337[#242], threadinfo ffff8101363a0000, task ffff810302a70040)
Sep 11 20:05:11 jupjep Stack: ffff8103ff3bcf40 ffffffff8132c56c 0000000000000000 ffff8103ff3bcf40
Sep 11 20:05:11 jupjep ffff8103ff3bc9c0 ffffffff8104ff06 ffff8103ff3bc9c0 0000000000000000
Sep 11 20:05:11 jupjep ffff8103ca88db80 ffff8103ca88dbd0 ffff8101051924a8 ffff81041183b980
Sep 11 20:05:11 jupjep Call Trace:
Sep 11 20:05:11 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:05:11 jupjep [<ffffffff8104ff06>] unix_release_sock+0x172/0x202
Sep 11 20:05:11 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:05:11 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:05:11 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:05:11 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:05:11 jupjep [<ffffffff810381d6>] put_files_struct+0x66/0xe1
Sep 11 20:05:11 jupjep [<ffffffff81015452>] do_exit+0x264/0x8de
Sep 11 20:05:11 jupjep [<ffffffff81047aec>] cpuset_exit+0x0/0x6b
Sep 11 20:05:11 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep
Sep 11 20:05:11 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:05:11 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:05:11 jupjep RSP <ffff8101363a1dd8>
Sep 11 20:05:11 jupjep <1>Fixing recursive fault but reboot is needed!

Just a moment later:

Sep 11 20:10:01 jupjep ------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [2] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9762:#242, comm: run-crons Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff81021d825de8 EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041e893290
Sep 11 20:10:01 jupjep RDX: ffff81041d582d48 RSI: 0000000000000000 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff8103b4e9e000 R08: ffff81021d824000 R09: 00000000019f865c
Sep 11 20:10:01 jupjep R10: 0000000000000080 R11: ffff810001760400 R12: ffff81040fc4bac0
Sep 11 20:10:01 jupjep R13: ffff81040fc4bac0 R14: ffff81000175f000 R15: 0000000000000000
Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002ae879267570 CR3: 000000032a8bf000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process run-crons (pid: 9762[#242], threadinfo ffff81021d824000, task ffff810409954100)
Sep 11 20:10:01 jupjep Stack: ffff81041e893290 ffffffff81086c39 0000000000000100 ffff81021d825ed8
Sep 11 20:10:01 jupjep 0000000000000003 ffffffff810626dd 0000000000000000 ffffffff81022504
Sep 11 20:10:01 jupjep ffff8103e4c38978 ffff81040981f880 0000000000000006 ffff810409954100
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff81028613>] do_wait+0x978/0xa78
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff81021d825de8>
Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [3] SMP
Sep 11 20:10:01 jupjep CPU 2
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 8581:#260, comm: server_linux Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103ef0a5a48 EFLAGS: 00210246
Sep 11 20:10:01 jupjep RAX: 0000000000000001 RBX: ffff81040ac6a000 RCX: ffff81041ca4e9c8
Sep 11 20:10:01 jupjep RDX: ffff81041d911838 RSI: 0000000000000000 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: ffff81032a8bf000 R08: ffff8103ef0a4000 R09: ffff81021d825e88
Sep 11 20:10:01 jupjep R10: 0000000000002623 R11: 00000000ffffffff R12: ffff81040981f880
Sep 11 20:10:01 jupjep R13: ffff81040981f880 R14: ffff810001755d00 R15: ffffffff815eaeb0
Sep 11 20:10:01 jupjep FS: 00002ac6eb29fda0(0000) GS:ffff8104118783c0(0063) knlGS:00000000558f9b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Sep 11 20:10:01 jupjep CR2: 00000000005c5818 CR3: 00000003ef7d9000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process server_linux (pid: 8581[#260], threadinfo ffff8103ef0a4000, task ffff810409171040)
Sep 11 20:10:01 jupjep Stack: ffff81041ca4e9c8 ffffffff81086c39 0000000000000000 ffff8103ef0a5b38
Sep 11 20:10:01 jupjep 0000000000000002 ffffffff810626dd 0000000000000000 0000000000000000
Sep 11 20:10:01 jupjep 0000000000200246 ffff8103f3199280 000000000000000a ffff810409171040
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff81086c39>] __mmdrop+0xb0/0xc3
Sep 11 20:10:01 jupjep [<ffffffff810626dd>] thread_return+0x68/0x118
Sep 11 20:10:01 jupjep [<ffffffff8106301e>] schedule_timeout+0x8a/0xad
Sep 11 20:10:01 jupjep [<ffffffff8108d5c6>] process_timeout+0x0/0x5
Sep 11 20:10:01 jupjep [<ffffffff8102ed6f>] do_sys_poll+0x278/0x360
Sep 11 20:10:01 jupjep [<ffffffff8101e5f5>] __pollwait+0x0/0xe2
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81067b68>] __switch_to+0x26e/0x27d
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff81034d06>] find_extend_vma+0x16/0x59
Sep 11 20:10:01 jupjep [<ffffffff810a1354>] get_futex_key+0x47/0x10c
Sep 11 20:10:01 jupjep [<ffffffff81022504>] __up_read+0x13/0x8a
Sep 11 20:10:01 jupjep [<ffffffff810a180c>] futex_wake+0xc6/0xd5
Sep 11 20:10:01 jupjep [<ffffffff8103dd0e>] do_futex+0x268/0xc16
Sep 11 20:10:01 jupjep [<ffffffff8100aec3>] do_page_fault+0x45e/0x7b9
Sep 11 20:10:01 jupjep [<ffffffff81084940>] default_wake_function+0x0/0xe
Sep 11 20:10:01 jupjep [<ffffffff81062675>] thread_return+0x0/0x118
Sep 11 20:10:01 jupjep [<ffffffff810a27ff>] compat_sys_futex+0xfb/0x119
Sep 11 20:10:01 jupjep [<ffffffff8104aa34>] sys_poll+0x54/0x5a
Sep 11 20:10:01 jupjep [<ffffffff81060b44>] cstar_do_call+0x1b/0x65
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103ef0a5a48>
Sep 11 20:10:01 jupjep <0>------------[ cut here ]------------
Sep 11 20:10:01 jupjep kernel BUG at kernel/vserver/context.c:193!
Sep 11 20:10:01 jupjep invalid opcode: 0000 [4] SMP
Sep 11 20:10:01 jupjep CPU 3
Sep 11 20:10:01 jupjep Modules linked in: iptable_nat nf_nat iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl sunrpc nf_conntrack_ipv4 nf_conntrack nfnetlink ohci_hcd ehci_hcd usbcore k8tem
Sep 11 20:10:01 jupjep Pid: 9764:#242, comm: sendmail Not tainted 2.6.20-vs2.3.0.11-gentoo #1
Sep 11 20:10:01 jupjep RIP: 0010:[<ffffffff8109a24b>] [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP: 0018:ffff8103b35dbe88 EFLAGS: 00010246
Sep 11 20:10:01 jupjep RAX: ffff81040ac6a001 RBX: ffff81040ac6a000 RCX: 0000000000000000
Sep 11 20:10:01 jupjep RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff81040ac6a000
Sep 11 20:10:01 jupjep RBP: 0000000000000000 R08: ffffffff815a8718 R09: ffffffff00000000
Sep 11 20:10:01 jupjep R10: 0000000000000296 R11: 0000000000000202 R12: ffff8102e3414080
Sep 11 20:10:01 jupjep R13: ffff8101c52115f8 R14: ffff81041183b980 R15: 0000000000002624
Sep 11 20:10:01 jupjep FS: 00002b8330167ae0(0000) GS:ffff8104118a7ac0(0000) knlGS:0000000056502b90
Sep 11 20:10:01 jupjep CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 11 20:10:01 jupjep CR2: 00002b832fc91db0 CR3: 000000036083c000 CR4: 00000000000006e0
Sep 11 20:10:01 jupjep Process sendmail (pid: 9764[#242], threadinfo ffff8103b35da000, task ffff810339081790)
Sep 11 20:10:01 jupjep Stack: ffff81040f8af800 ffffffff8132c56c ffff8102e3414080 ffff81040f8af960
Sep 11 20:10:01 jupjep ffff81040f8af800 ffffffff813458ba 0000000000002624 ffffffff81053352
Sep 11 20:10:01 jupjep 0000000000000000 ffff8102e3414080 ffff8102e34140d0 ffffffff810531d5
Sep 11 20:10:01 jupjep Call Trace:
Sep 11 20:10:01 jupjep [<ffffffff8132c56c>] sk_free+0xd9/0x133
Sep 11 20:10:01 jupjep [<ffffffff813458ba>] netlink_release+0x255/0x25f
Sep 11 20:10:01 jupjep [<ffffffff81053352>] sock_fasync+0x124/0x133
Sep 11 20:10:01 jupjep [<ffffffff810531d5>] sock_release+0x19/0x72
Sep 11 20:10:01 jupjep [<ffffffff8105338d>] sock_close+0x2c/0x30
Sep 11 20:10:01 jupjep [<ffffffff81012992>] __fput+0xa1/0x19a
Sep 11 20:10:01 jupjep [<ffffffff8102408d>] filp_close+0x5d/0x65
Sep 11 20:10:01 jupjep [<ffffffff8101da52>] sys_close+0x8c/0xcf
Sep 11 20:10:01 jupjep [<ffffffff8105d11e>] system_call+0x7e/0x83
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep
Sep 11 20:10:01 jupjep Code: 0f 0b eb fe 83 7f 14 00 74 04 0f 0b eb fe 83 7f 18 00 74 04
Sep 11 20:10:01 jupjep RIP [<ffffffff8109a24b>] free_vx_info+0xf/0x8d
Sep 11 20:10:01 jupjep RSP <ffff8103b35dbe88>

A little later the server crashed. Now that I've set sysctl kernel.panic=5, I no longer see any of these logs or crashes, only dozens of reboots:

jupjep log # last | grep -E '(boot|crash)'
reboot system boot Tue Sep 18 11:27 (01:24) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Mon Sep 17 22:23 (14:29) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Mon Sep 17 19:52 (16:59) 2.6.22-vs2.3.0.17-gentoo
reboot system boot Sun Sep 16 18:57 (1+17:55) 2.6.22-vs2.3.0.17-gentoo
trapni pts/0 Sat Sep 15 23:58 - crash (18:58) $MY_IP
reboot system boot Fri Sep 14 22:06 (3+14:46) 2.6.22-vs2.3.0.17-gentoo
trapni pts/4 Fri Sep 14 13:39 - crash (08:27) $MY_IP
reboot system boot Thu Sep 13 19:34 (4+17:18) 2.6.20-vs2.3.0.11-gentoo
trapni pts/6 Thu Sep 13 12:38 - crash (06:56) $MY_IP
trapni pts/0 Thu Sep 13 08:17 - crash (11:16) $MY_IP
reboot system boot Thu Sep 13 08:02 (5+04:50) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Tue Sep 11 19:32 (6+17:19) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Mon Sep 10 17:52 (7+18:59) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Thu Sep 6 16:50 (11+20:02) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Fri Aug 3 08:36 (46+04:16) 2.6.20-vs2.3.0.11-gentoo
reboot system boot Mon Jul 30 21:31 (3+11:01) 2.6.20-vs2.2.0-gentoo
trapni pts/13 Mon Jul 30 12:26 - crash (09:05) $MY_IP

None of these boots were initiated by me, so they were all crashes. Is there a way to trace this bug and/or work around it?
In fact, we have 10+ machines with the very same hardware running the Gentoo hardened profile and a hardened-sources kernel. But this one host, running normal Gentoo with vserver-sources, really refuses to play nice with me. Can anybody give me a hint regarding the traces I posted above?

Many thanks in advance,
Christian Parpart.
