Package: linux-2.6
Version: 2.6.32-15~bpo50+1
Severity: important

Hi,

I'm using the latest lenny-backports kernel on a new Dell PowerEdge R710.
Every few days I get a kernel panic similar to:

Kernel panic - not syncing: CRED: put_cred_rcu() sees f640ad80 with usage -163535872

I automated a vmcore backup with kexec in order to collect useful information:

# uname -a
Linux hostname 2.6.32-bpo.5-686-bigmem #1 SMP Fri Jun 11 22:59:12 UTC 2010 i686 GNU/Linux

# dpkg -l linux-image-2.6.32-bpo.5-686-bigmem
ii  linux-image-2.6.32-bpo.5-686-bigmem  2.6.32-15~bpo50+1  Linux 2.6.32 for PCs with 4GB+ RAM

# dpkg -l firmware-bnx2
ii  firmware-bnx2  0.24~bpo50+1  Binary firmware for Broadcom NetXtremeII

# lsmod
Module                  Size  Used by
xt_multiport            1775  1
iptable_filter          1790  1
ip_tables               7694  1 iptable_filter
x_tables                8327  2 xt_multiport,ip_tables
ipmi_devintf            4029  2
ipmi_si                26744  1
ipmi_msghandler        22871  2 ipmi_devintf,ipmi_si
ext2                   46337  1
loop                    9765  0
snd_pcm                47346  0
snd_timer              12238  1 snd_pcm
snd                    34383  2 snd_pcm,snd_timer
evdev                   5609  0
soundcore               3450  1 snd
psmouse                44665  0
snd_page_alloc          5133  1 snd_pcm
dcdbas                  3948  0
tpm_tis                 5572  0
tpm                     8145  1 tpm_tis
tpm_bios                3573  1 tpm
serio_raw               2920  0
pcspkr                  1207  0
power_meter             6894  0
button                  3598  0
processor              26623  24
ext3                   94308  3
jbd                    32213  1 ext3
mbcache                 3766  2 ext2,ext3
dm_mirror               9683  0
dm_region_hash          5652  1 dm_mirror
dm_log                  6425  2 dm_mirror,dm_region_hash
dm_snapshot            18033  0
dm_mod                 46150  15 dm_mirror,dm_log,dm_snapshot
sg                     15980  0
sr_mod                 10770  0
cdrom                  26487  1 sr_mod
ata_generic             2019  0
sd_mod                 25889  7
crc_t10dif              1012  1 sd_mod
ses                     4516  0
enclosure               4027  1 ses
ata_piix               17672  0
ehci_hcd               28251  0
uhci_hcd               16153  0
libata                115989  2 ata_generic,ata_piix
usbcore                98810  3 ehci_hcd,uhci_hcd
nls_base                4541  1 usbcore
megaraid_sas           21953  6
scsi_mod              101457  6 sg,sr_mod,sd_mod,ses,libata,megaraid_sas
bnx2                   52121  0
thermal                 9198  0
fan                     2590  0
thermal_sys             9378  3 processor,thermal,fan

# lspci
00:00.0 Host bridge: Intel Corporation QuickPath Architecture I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 5 (rev 13)
00:06.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 6 (rev 13)
00:07.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation QuickPath Architecture I/O Hub PCI Express Root Port 9 (rev 13)
00:14.0 PIC: Intel Corporation QuickPath Architecture I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation QuickPath Architecture I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation QuickPath Architecture I/O Hub Control Status and RAS Registers (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller (rev 02)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 RAID bus controller: LSI Logic / Symbios Logic Device 0079 (rev 04)
08:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 [Hermon] - Winbond/Nuvoton (rev 0a)

Then I use crash from http://people.redhat.com/~anderson/

# ./crash /usr/lib/debug/boot/vmlinux-2.6.32-bpo.5-686-bigmem /root/vmcore_20100807_084759

crash 5.0.6
[...]
      KERNEL: /usr/lib/debug/boot/vmlinux-2.6.32-bpo.5-686-bigmem
    DUMPFILE: /root/vmcore_20100807_084759
        CPUS: 24
        DATE: Sat Aug 7 08:47:48 2010
      UPTIME: 1 days, 19:44:25
LOAD AVERAGE: 0.14, 0.43, 0.54
       TASKS: 583
    NODENAME: hostname
     RELEASE: 2.6.32-bpo.5-686-bigmem
     VERSION: #1 SMP Fri Jun 11 22:59:12 UTC 2010
     MACHINE: i686 (2925 Mhz)
      MEMORY: 24 GB
       PANIC: "[157461.063951] Kernel panic - not syncing: CRED: put_cred_rcu() sees f640ad80 with usage -163535872"
         PID: 0
     COMMAND: "swapper"
        TASK: f7151100 (1 of 24) [THREAD_INFO: f715c000]
         CPU: 15
       STATE: TASK_RUNNING (PANIC)

crash> log
[...]
[157461.063951] Kernel panic - not syncing: CRED: put_cred_rcu() sees f640ad80 with usage -163535872
[157461.063953]
[157461.092227] Pid: 0, comm: swapper Not tainted 2.6.32-bpo.5-686-bigmem #1
[157461.092228] Call Trace:
[157461.092234] [<c127d2b8>] ? panic+0x38/0xe4
[157461.092238] [<c104f409>] ? put_cred_rcu+0x1c/0x7b
[157461.092241] [<c1075bc5>] ? __rcu_process_callbacks+0x164/0x227
[157461.092243] [<c1075ca4>] ? rcu_process_callbacks+0x1c/0x39
[157461.092246] [<c103be20>] ? __do_softirq+0xaa/0x151
[157461.092254] [<c103bef8>] ? do_softirq+0x31/0x3c
[157461.092260] [<c103bfce>] ? irq_exit+0x26/0x58
[157461.092268] [<c1019b20>] ? smp_apic_timer_interrupt+0x6c/0x76
[157461.092277] [<c10089d5>] ? apic_timer_interrupt+0x31/0x38
[157461.092286] [<c114007b>] ? radix_tree_node_alloc+0x2c/0x53
[157461.092299] [<f89520e7>] ? acpi_idle_enter_bm+0x253/0x28e [processor]
[157461.092308] [<c11d6269>] ? cpuidle_idle_call+0x68/0xbb
[157461.092316] [<c1007229>] ? cpu_idle+0x89/0xa4

crash> bt
PID: 0  TASK: f7151100  CPU: 15  COMMAND: "swapper"
 #0 [f715dddc] crash_kexec at c1064078
 #1 [f715ddf8] machine_kexec at c101c7cd
 #2 [f715de48] crash_kexec at c106408d
 #3 [f715de9c] panic at c127d2ba
 #4 [f715deb4] put_cred_rcu at c104f404
 #5 [f715dee8] rcu_process_callbacks at c1075c9f
 #6 [f715deec] __do_softirq at c103be1a
 #7 [f715df14] do_softirq at c103bef3
 #8 [f715df20] irq_exit at c103bfc9
 #9 [f715df24] smp_apic_timer_interrupt at c1019b1b
#10 [f715df30] apic_timer_interrupt at c10089d0
#11 [f715df94] cpuidle_idle_call at c11d6265
#12 [f715dfa0] cpu_idle at c1007223

crash> struct cred f640ad80
struct cred {
  usage = {
    counter = -163535872
  },
  uid = 3238771162,
  gid = 0,
  suid = 0,
  sgid = 3240690332,
  euid = 0,
  egid = 0,
  fsuid = 0,
  fsgid = 13,
  securebits = 749,
  cap_inheritable = {
    cap = {0, 16777216}
  },
  cap_permitted = {
    cap = {0, 0}
  },
  cap_effective = {
    cap = {0, 0}
  },
  cap_bset = {
    cap = {0, 4083503872}
  },
  jit_keyring = 0 '\000',
  thread_keyring = 0x0,
  request_key_auth = 0x0,
  tgcred = 0x20,
  security = 0x0,
  user = 0xffffffff,
  group_info = 0xffffffff,
  rcu = {
    next = 0x0,
    func = 0
  }
}

crash> mod
MODULE    NAME               SIZE  OBJECT FILE
f804feb4  ipmi_msghandler   22871  (not loaded)  [CONFIG_KALLSYMS]
f8054e3c  thermal_sys        9378  (not loaded)  [CONFIG_KALLSYMS]
f8059964  x_tables           8327  (not loaded)  [CONFIG_KALLSYMS]
f805dd00  ipmi_devintf       4029  (not loaded)  [CONFIG_KALLSYMS]
f8060718  fan                2590  (not loaded)  [CONFIG_KALLSYMS]
f8068b4c  ipmi_si           26744  (not loaded)  [CONFIG_KALLSYMS]
f806fd30  thermal            9198  (not loaded)  [CONFIG_KALLSYMS]
f838aa44  ip_tables          7694  (not loaded)  [CONFIG_KALLSYMS]
f839149c  iptable_filter     1790  (not loaded)  [CONFIG_KALLSYMS]
f83a2a70  bnx2              52121  (not loaded)  [CONFIG_KALLSYMS]
f83bc2d0  scsi_mod         101457  (not loaded)  [CONFIG_KALLSYMS]
f83eb46c  xt_multiport       1775  (not loaded)  [CONFIG_KALLSYMS]
f8421afc  megaraid_sas      21953  (not loaded)  [CONFIG_KALLSYMS]
f8431ef0  nls_base           4541  (not loaded)  [CONFIG_KALLSYMS]
f8476acc  usbcore           98810  (not loaded)  [CONFIG_KALLSYMS]
f84d5720  libata           115989  (not loaded)  [CONFIG_KALLSYMS]
f84f9860  uhci_hcd          16153  (not loaded)  [CONFIG_KALLSYMS]
f85134ec  ehci_hcd          28251  (not loaded)  [CONFIG_KALLSYMS]
f852c16c  ata_piix          17672  (not loaded)  [CONFIG_KALLSYMS]
f855db1c  enclosure          4027  (not loaded)  [CONFIG_KALLSYMS]
f859cdcc  ses                4516  (not loaded)  [CONFIG_KALLSYMS]
f85d5268  crc_t10dif         1012  (not loaded)  [CONFIG_KALLSYMS]
f8622b4c  sd_mod            25889  (not loaded)  [CONFIG_KALLSYMS]
f862e5e8  ata_generic        2019  (not loaded)  [CONFIG_KALLSYMS]
f8645ba8  cdrom             26487  (not loaded)  [CONFIG_KALLSYMS]
f8657490  sr_mod            10770  (not loaded)  [CONFIG_KALLSYMS]
f866b95c  sg                15980  (not loaded)  [CONFIG_KALLSYMS]
f869b598  dm_mod            46150  (not loaded)  [CONFIG_KALLSYMS]
f86b4c7c  dm_snapshot       18033  (not loaded)  [CONFIG_KALLSYMS]
f86c33e0  dm_log             6425  (not loaded)  [CONFIG_KALLSYMS]
f86d01a4  dm_region_hash     5652  (not loaded)  [CONFIG_KALLSYMS]
f86e115c  dm_mirror          9683  (not loaded)  [CONFIG_KALLSYMS]
f88c6b20  mbcache            3766  (not loaded)  [CONFIG_KALLSYMS]
f88e3d64  jbd               32213  (not loaded)  [CONFIG_KALLSYMS]
f8928000  ext3              94308  (not loaded)  [CONFIG_KALLSYMS]
f893aa58  button             3598  (not loaded)  [CONFIG_KALLSYMS]
f8955b50  processor         26623  (not loaded)  [CONFIG_KALLSYMS]
f8985648  power_meter        6894  (not loaded)  [CONFIG_KALLSYMS]
f898f274  pcspkr             1207  (not loaded)  [CONFIG_KALLSYMS]
f899a82c  serio_raw          2920  (not loaded)  [CONFIG_KALLSYMS]
f89aaa94  tpm_bios           3573  (not loaded)  [CONFIG_KALLSYMS]
f89b9a38  tpm                8145  (not loaded)  [CONFIG_KALLSYMS]
f89c81b4  tpm_tis            5572  (not loaded)  [CONFIG_KALLSYMS]
f89f0acc  psmouse           44665  (not loaded)  [CONFIG_KALLSYMS]
f8a07acc  dcdbas             3948  (not loaded)  [CONFIG_KALLSYMS]
f8a1407c  snd_page_alloc     5133  (not loaded)  [CONFIG_KALLSYMS]
f8a211d4  evdev              5609  (not loaded)  [CONFIG_KALLSYMS]
f8a34980  soundcore          3450  (not loaded)  [CONFIG_KALLSYMS]
f8a54690  snd               34383  (not loaded)  [CONFIG_KALLSYMS]
f8a6c85c  snd_timer         12238  (not loaded)  [CONFIG_KALLSYMS]
f8a93b18  snd_pcm           47346  (not loaded)  [CONFIG_KALLSYMS]
f8ab8168  loop               9765  (not loaded)  [CONFIG_KALLSYMS]
f8cee24c  ext2              46337  (not loaded)  [CONFIG_KALLSYMS]

We noticed something: when the server froze after a panic (before I set it
up to switch automatically to a crash kernel), the other servers connected
to the same Ethernet switch became unreachable over the network. It looks as
if the Ethernet card goes crazy and starts sending random data, though I
can't say for sure. Restarting the faulty server gets everything back in
order.

I hope there is enough data here to identify the cause of this bug. I will
keep the vmcore dump for some time in case someone wants more information.

Regards,
Joseph.
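
A note on the "struct cred f640ad80" dump above: the decimal fields are easier to judge in hexadecimal. The following is a minimal Python sketch, not part of the crash session, that reinterprets a few of the dumped values as unsigned 32-bit words. The helper names are made up for illustration, and the 0xc0000000 kernel base is an assumption (the usual 3G/1G split on 32-bit x86, which matches the c1xxxxxx addresses in the call trace).

```python
# Values copied from the "struct cred f640ad80" output in this report.
CRED_ADDR = 0xF640AD80
fields = {
    "usage": -163535872,
    "uid": 3238771162,
    "sgid": 3240690332,
    "cap_bset.cap[1]": 4083503872,
}


def as_u32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit word."""
    return value & 0xFFFFFFFF


def looks_like_kernel_address(word, base=0xC0000000):
    # Assumption: kernel virtual addresses start at 0xc0000000 on this
    # 32-bit kernel. This is only a plausibility check, not a proof.
    return word >= base


for name, value in fields.items():
    word = as_u32(value)
    print(f"{name:16} {word:#010x} kernel-range={looks_like_kernel_address(word)}")

# Distance between the cred object and the word stored in its usage counter.
delta = CRED_ADDR - as_u32(fields["usage"])
print(f"cred address - usage word = {delta:#x}")
```

Read this way, the usage counter is the word 0xf640a400, which lies 0x980 bytes below the cred address itself, and uid, sgid and cap_bset.cap[1] also fall in the kernel address range. That would be consistent with the structure having been freed and reused, or overwritten, before put_cred_rcu() ran, but it is only a reading of the numbers and not a diagnosis.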