Hi Ben,

Thanks very much for your reply.

> I looked for information on this hardware, and the first thing I found
> was that you previously reported several crashes to Debian on this same
> hardware:
> https://bugs.debian.org/834487
> https://bugs.debian.org/838658
> https://bugs.debian.org/847839

We don't see any hardware error messages or ECC errors (we use mcelog,
and the servers have a BMC). This panic has occurred on many machines.
If it were a hardware problem, I would expect the panics to show
different stacks. I also found the same panic on different hardware:

[308495.512050] PANIC: double fault, error_code: 0x0
[308495.512077] CPU: 4 PID: 161103 Comm: parameter_serve Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
[308495.512079] Hardware name: Inspur SA5248M4/X10DRT-PS, BIOS 2.01 11/21/2016
[308495.512080] task: ffff883b5dfb0a20 ti: ffff883f3a9b0000 task.ti: ffff883f3a9b0000
[308495.512082] RIP: 0010:[<ffffffff81518598>] [<ffffffff81518598>] sysret_check+0x1/0x4e
[308495.512088] RSP: 0018:ffffffffffffffd8 EFLAGS: 00010217
[308495.512090] RAX: 0000000000000000 RBX: 00000000816900f0 RCX: 0000000000000000
[308495.512091] RDX: ffff883f3a9b3fd8 RSI: ffff883f3a9b3d88 RDI: ffff883f3c885040
[308495.512092] RBP: 0000000000000000 R08: ffff883f3a9b0000 R09: 000000000000b629
[308495.512092] R10: 000000010498cc7d R11: 0000000000000000 R12: 00000000c9ffe0b8
[308495.512093] R13: 00000000c9ffe000 R14: 0000000000000001 R15: 00000000ffffffff
[308495.512095] FS: 00007fb567de3700(0000) GS:ffff88407f900000(0000) knlGS:0000000000000000
[308495.512096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308495.512097] CR2: ffffffffffffffc8 CR3: 0000003f44e30000 CR4: 00000000003407e0
[308495.512098] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[308495.512099] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[308495.512100] Stack:
[308495.512119] BUG: unable to handle kernel paging request at ffffffffffffffd8
[308495.512149] IP: [<ffffffff81016518>] show_stack_log_lvl+0x108/0x170
[308495.512181] PGD 1816067 PUD 1818067 PMD 0
[308495.512208] Oops: 0000 [#1] SMP
[308495.512224] Modules linked in: 8021q garp stp mrp llc tcp_westwood x86_pkg_temp_thermal coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support crc32_pclmul aesni_intel ast aes_x86_64 evdev lrw joydev gf128mul ttm glue_helper drm_kms_helper ablk_helper cryptd drm i2c_algo_bit pcspkr i2c_i801 lpc_ich mei_me i2c_core shpchp mei mfd_core wmi tpm_tis tpm ipmi_watchdog processor thermal_sys acpi_power_meter acpi_pad button ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci ehci_pci libata xhci_hcd ehci_hcd ixgbe dca ptp usbcore scsi_mod pps_core usb_common mdio
[308495.512564] CPU: 4 PID: 161103 Comm: parameter_serve Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
[308495.512600] Hardware name: Inspur SA5248M4/X10DRT-PS, BIOS 2.01 11/21/2016
[308495.512627] task: ffff883b5dfb0a20 ti: ffff883f3a9b0000 task.ti: ffff883f3a9b0000
[308495.512655] RIP: 0010:[<ffffffff81016518>] [<ffffffff81016518>] show_stack_log_lvl+0x108/0x170
[308495.512690] RSP: 0018:ffff88407f904e98 EFLAGS: 00010046
[308495.512711] RAX: ffffffffffffffe0 RBX: ffffffffffffffd8 RCX: ffff88407f8fffc0
[308495.512738] RDX: 0000000000000000 RSI: ffff88407f904f58 RDI: 0000000000000000
[308495.512765] RBP: ffff88407f903fc0 R08: ffffffff81706753 R09: 00000000000005b4
[308495.512792] R10: 0000000000000000 R11: ffff88407f904c2e R12: ffff88407f904f58
[308495.512819] R13: 0000000000000000 R14: ffffffff81706753 R15: 0000000000000000
[308495.512846] FS: 00007fb567de3700(0000) GS:ffff88407f900000(0000) knlGS:0000000000000000
[308495.512876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308495.512898] CR2: ffffffffffffffd8 CR3: 0000003f44e30000 CR4: 00000000003407e0
[308495.512925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[308495.512952] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[308495.512979] Stack:
[308495.512989] ffffffff00000008 ffff88407f904ef0 ffff88407f904eb0 ffffffffffffffd8
[308495.513021] ffff88407f904f58 ffffffffffffffd8 ffff883b5dfb0a20 0000000000000040
[308495.513052] 0000000000000001 00000000ffffffff ffffffff810165fe ffff88407f904f58
[308495.513084] Call Trace:
[308495.513096] <#DF>
[308495.513106]
[308495.513117] [<ffffffff810165fe>] ? show_regs+0x7e/0x1f0
[308495.513136] [<ffffffff810503af>] ? df_debug+0x1f/0x30
[308495.513158] [<ffffffff81014ee8>] ? do_double_fault+0x78/0xf0
[308495.513181] [<ffffffff8151a028>] ? double_fault+0x28/0x30
[308495.513204] [<ffffffff81518598>] ? sysret_check+0x1/0x4e
[308495.513225] <<EOE>>
[308495.513236] <UNK>
[308495.513246] Code: 67 70 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 5b 93 4f 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 4b 67 70 81 89 54 24 14 48 89 4c 24 08 48 89
[308495.513381] RIP [<ffffffff81016518>] show_stack_log_lvl+0x108/0x170
[308495.513408] RSP <ffff88407f904e98>
[308495.513423] CR2: ffffffffffffffd8

> Are you using KVM?

We don't use KVM; the application only does computation and transfers
data.

Best regards,
Yongsu

---------- Forwarded message ----------
From: Ben Hutchings <b...@decadent.org.uk>
Date: 2017-01-18 1:18 GMT+08:00
Subject: Re: Bug#851641: linux-image-3.16.0-4-amd64 panic:double fault
To: 张永肃 <zhangyon...@bytedance.com>, 851...@bugs.debian.org

Control: tag -1 moreinfo

On Tue, 2017-01-17 at 15:25 +0800, 张永肃 wrote:
> Package:linux-image-3.16.0-4-amd64
> Version:3.16.36-1+deb8u1
>
> 3.16.36-1+deb8u1 (debian stable package) kernel panic,double fault.
>
> [952650.981869] PANIC: double fault, error_code: 0x0
> [952650.981909] CPU: 4 PID: 14945 Comm: parameter_serve Not tainted
> 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u1
> [952650.981911] Hardware name: Powerleader PR2760TG/X10DRT-PT, BIOS
> 2.0 12/18/2015

I looked for information on this hardware, and the first thing I found
was that you previously reported several crashes to Debian on this same
hardware:
https://bugs.debian.org/834487
https://bugs.debian.org/838658
https://bugs.debian.org/847839

Are you sure the hardware is stable? Does it have ECC RAM? (I know
the processor supports ECC.)

[...]

> it similar to this issue which happened on 4.4.0,but the patch do not
> work on 3.16 :
> http://linux-kernel.2935.n7.nabble.com/PANIC-double-fault-error-code-0x0-in-4-0-0-rc3-2-kvm-related-td1064080.html

Are you using KVM?

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus