On Tue, 26 Apr 2011 14:37:24 +1000, Steven wrote in message <1303792644.6192.14.camel@square>:
> Hi folks,
> I have a problem that's now beyond my expertise to fault-find properly. I
> get random, intermittent kernel errors, usually when the system is
> under stress.
>
> System specs:
> AMD X4 840 (badged Phenom II, but it's really an Athlon core)
> ASUS M4A88TD-M EVO/USB3
> 2x 2GB sticks of Corsair 1600 DDR3
> 500GB WD Caviar Blue
>
> Below are some examples of the errors.
>
> square kernel: [ 683.271626] Pid: 6593, comm: rsync Tainted: P D 2.6.32-5-amd64 #1
> Apr 24 14:51:38 square kernel: [ 683.271631] Call Trace:
> Apr 24 14:51:38 square kernel: [ 683.271648] [<ffffffff810cad37>] ? print_bad_pte+0x232/0x24a
> Apr 24 14:51:38 square kernel: [ 683.271660] [<ffffffff810cbde7>] ? unmap_vmas+0x62d/0x931
> Apr 24 14:51:38 square kernel: [ 683.271672] [<ffffffff8118e194>] ? cpumask_any_but+0x28/0x34
> Apr 24 14:51:38 square kernel: [ 683.271682] [<ffffffff810d04c4>] ? exit_mmap+0xc4/0x148
> Apr 24 14:51:38 square kernel: [ 683.271690] [<ffffffff8104bc6d>] ? mmput+0x3c/0xdf
> Apr 24 14:51:38 square kernel: [ 683.271698] [<ffffffff8104f866>] ? exit_mm+0x102/0x10d
> Apr 24 14:51:38 square kernel: [ 683.271705] [<ffffffff8105128b>] ? do_exit+0x1f8/0x6c6
> Apr 24 14:51:38 square kernel: [ 683.271712] [<ffffffff810517cf>] ? do_group_exit+0x76/0x9d
> Apr 24 14:51:38 square kernel: [ 683.271720] [<ffffffff81051808>] ? sys_exit_group+0x12/0x16
> Apr 24 14:51:38 square kernel: [ 683.271727] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> Apr 24 14:51:44 square kerneloops: Submitted 1 kernel oopses to www.kerneloops.org
>
> Another, from Minecraft:
>
> d: 6742, comm: java Tainted: P B D 2.6.32-5-amd64 #1
> Apr 24 15:12:02 square kernel: [ 1907.726033] Call Trace:
> Apr 24 15:12:02 square kernel: [ 1907.726039] [<ffffffff810b7a11>] ? bad_page+0x116/0x129
> Apr 24 15:12:02 square kernel: [ 1907.726042] [<ffffffff810b9b2e>] ? get_page_from_freelist+0x4fd/0x760
> Apr 24 15:12:02 square kernel: [ 1907.726098] [<ffffffffa0246f02>] ? firegl_trace+0x72/0x1e0 [fglrx]
> Apr 24 15:12:02 square kernel: [ 1907.726100] [<ffffffff810ba0f8>] ? __alloc_pages_nodemask+0x11c/0x5f4
> Apr 24 15:12:02 square kernel: [ 1907.726104] [<ffffffff81036605>] ? native_flush_tlb_others+0xb6/0xe3
> Apr 24 15:12:02 square kernel: [ 1907.726107] [<ffffffff810bc479>] ? ____pagevec_lru_add+0x160/0x176
> Apr 24 15:12:02 square kernel: [ 1907.726110] [<ffffffff810cc981>] ? handle_mm_fault+0x27a/0x80f
> Apr 24 15:12:02 square kernel: [ 1907.726113] [<ffffffff812fe6b6>] ? do_page_fault+0x2e0/0x2fc
> Apr 24 15:12:02 square kernel: [ 1907.726116] [<ffffffff812fc555>] ? page_fault+0x25/0x30
>
> Another one, from stress:
>
> stress D 0000000000000000 0 5972 5963 0x00000000
> Apr 25 21:16:11 square kernel: [ 360.740389] ffff88011b04dbd0 0000000000000082 ffff880114f40150 000000000000000e
> Apr 25 21:16:11 square kernel: [ 360.740392] 0007ffffffffffff 0000000000000000 000000000000f9e0 ffff880100329fd8
> Apr 25 21:16:11 square kernel: [ 360.740395] 0000000000015780 0000000000015780 ffff88011b04f100 ffff88011b04f3f8
> Apr 25 21:16:11 square kernel: [ 360.740397] Call Trace:
> Apr 25 21:16:11 square kernel: [ 360.740404] [<ffffffff8104001f>] ? check_preempt_wakeup+0x1dd/0x268
> Apr 25 21:16:11 square kernel: [ 360.740408] [<ffffffff812fb65b>] ? __mutex_lock_common+0x122/0x192
> Apr 25 21:16:11 square kernel: [ 360.740411] [<ffffffff810493e0>] ? update_rq_clock+0xf/0x28
> Apr 25 21:16:11 square kernel: [ 360.740413] [<ffffffff812fb783>] ? mutex_lock+0x1a/0x31
> Apr 25 21:16:11 square kernel: [ 360.740416] [<ffffffff8110be35>] ? sync_filesystems+0x13/0xe3
> Apr 25 21:16:11 square kernel: [ 360.740418] [<ffffffff8110bf4a>] ? sys_sync+0x1c/0x2e
> Apr 25 21:16:11 square kernel: [ 360.740420] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> Apr 25 21:18:11 square kernel: [ 480.740375] stress D ffff8800cf609c40 0 5965 5963 0x00000000
> Apr 25 21:18:11 square kernel: [ 480.740378] ffff8800cf609c40 0000000000000086 ffffffff810414d5 000000010000000e
> Apr 25 21:18:11 square kernel: [ 480.740381] 0000000000015780 ffff880100383e68 000000000000f9e0 ffff880100383fd8
> Apr 25 21:18:11 square kernel: [ 480.740383] 0000000000015780 0000000000015780 ffff8800cf60f100 ffff8800cf60f3f8
> Apr 25 21:18:11 square kernel: [ 480.740385] Call Trace:
> Apr 25 21:18:11 square kernel: [ 480.740392] [<ffffffff810414d5>] ? select_task_rq_fair+0x472/0x836
> Apr 25 21:18:11 square kernel: [ 480.740395] [<ffffffff8101650e>] ? native_sched_clock+0x2e/0x66
> Apr 25 21:18:11 square kernel: [ 480.740397] [<ffffffff8103fc8e>] ? update_curr+0xa6/0x147
> Apr 25 21:18:11 square kernel: [ 480.740399] [<ffffffff8101654b>] ? sched_clock+0x5/0x8
> Apr 25 21:18:11 square kernel: [ 480.740402] [<ffffffff812fb65b>] ? __mutex_lock_common+0x122/0x192
> Apr 25 21:18:11 square kernel: [ 480.740404] [<ffffffff812fb783>] ? mutex_lock+0x1a/0x31
> Apr 25 21:18:11 square kernel: [ 480.740407] [<ffffffff8110be35>] ? sync_filesystems+0x13/0xe3
> Apr 25 21:18:11 square kernel: [ 480.740409] [<ffffffff8110bf40>] ? sys_sync+0x12/0x2e
> Apr 25 21:18:11 square kernel: [ 480.740411] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
> My attempts at troubleshooting this have been as follows:
>
> 1) Compiled kernels and FlightGear.

..FlightGear, from git?

> Usually fails after 10 mins or so.
> 2) Removed one memory stick, swapped it with the other, tried different
> slots. It fails "less often" with one stick than with both.
> 3) Memtest86+ shows both sticks to be OK.

..memory running too hot? Fan air ducting around your memory sticks might help.

> 4) Ran "stress". It fails more often if I enable the HDD tests, but it
> still fails without them.
> 5) Installed Fedora to prove it's not just a Debian thing. The errors
> are exactly the same under Fedora.
>
> I'm at a loss as to what it could be and would like to pin down at
> least something before I start throwing money around. All I have left
> is that some incompatibility between mobo/mem/CPU/disk is causing
> this.
>
> Does anyone have any advice on what tools I can use to narrow it down
> further or eliminate certain components?

-- 
..med vennlig hilsen = with Kind Regards from Arnt Karlsen
...with a number of polar bear hunters in his ancestry...
Scenarios always come in sets of three:
best case, worst case, and just in case.

Archive: http://lists.debian.org/20110426141446.1236bf17@celsius.local
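On the "what tools" question, a rough Python sketch, assuming syslog lines shaped like the traces quoted above: it decodes the single letters after "Tainted:" and tallies the function names in the call traces, flagging bad_page and print_bad_pte, which the kernel reaches when it finds a page or page-table entry in an unexpected state. The flag meanings follow the kernel's Documentation/oops-tracing.txt for 2.6.32-era kernels; the helper names and the MEMORY_SUSPECTS watch list are hypothetical, made up for this illustration, and a hit is a hint, not a diagnosis.

```python
import re
from collections import Counter

# Taint letters as documented for 2.6.32-era kernels in
# Documentation/oops-tracing.txt; any other letter is reported as unknown.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary (non-GPL) module loaded",
    "F": "module force-loaded",
    "S": "SMP kernel on hardware not certified as safe",
    "R": "module force-unloaded",
    "M": "machine check exception reported",
    "B": "bad page reference or unexpected page flags",
    "U": "taint requested from user space",
    "D": "kernel died recently (OOPS or BUG)",
    "A": "ACPI table overridden",
    "W": "kernel warning issued earlier",
    "C": "staging driver loaded",
}

# Hypothetical watch list: functions that report a page or page-table
# entry in an unexpected state. Seeing them is a hint, not proof of bad RAM.
MEMORY_SUSPECTS = {"bad_page", "print_bad_pte"}

# "comm: rsync Tainted: P D 2.6.32-5-amd64": single capital letters,
# each followed by whitespace or end of line, stopping at the version.
TAINT_RE = re.compile(r"comm: (\S+) Tainted: ((?:[A-Z](?:\s+|$))+)")
# "[<ffffffff810cad37>] ? print_bad_pte+0x232/0x24a"; the "?" is optional.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(\w+)\+0x")


def decode_taint(line):
    """Return (command, [flag descriptions]), or None if there is no taint field."""
    m = TAINT_RE.search(line)
    if m is None:
        return None
    flags = m.group(2).split()
    return m.group(1), [f"{f}: {TAINT_FLAGS.get(f, 'unknown flag')}" for f in flags]


def summarize(lines):
    """Decode taint fields and count call-trace function names in syslog lines."""
    taints = []
    frames = Counter()
    for line in lines:
        decoded = decode_taint(line)
        if decoded is not None:
            taints.append(decoded)
        m = FRAME_RE.search(line)
        if m is not None:
            frames[m.group(1)] += 1
    suspects = sorted(name for name in frames if name in MEMORY_SUSPECTS)
    return taints, frames, suspects


if __name__ == "__main__":
    # A few lines taken from the traces quoted in this thread.
    sample = [
        "Pid: 6593, comm: rsync Tainted: P D 2.6.32-5-amd64 #1",
        "[ 683.271648] [<ffffffff810cad37>] ? print_bad_pte+0x232/0x24a",
        "[ 1907.726039] [<ffffffff810b7a11>] ? bad_page+0x116/0x129",
        "[ 1907.726098] [<ffffffffa0246f02>] ? firegl_trace+0x72/0x1e0 [fglrx]",
    ]
    taints, frames, suspects = summarize(sample)
    for comm, flags in taints:
        print(comm, "->", "; ".join(flags))
    for name, count in frames.most_common():
        print(f"{count:3d}  {name}")
    print("memory-related frames:", ", ".join(suspects))
```

Fed a whole log instead of the sample, e.g. summarize(open("/var/log/kern.log", errors="replace")), the tally shows whether the same page-handling frames keep turning up under unrelated programs, as they do here for rsync (print_bad_pte) and java (bad_page). That leans toward the memory path rather than any one application, though with fglrx loaded (the P flag) the driver stays a suspect too.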