Public bug reported:

Jul 20 14:40:23 anonster kernel: [ 1716.692818] mlx5_core 0000:03:00.0: assert_var[0] 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.698541] mlx5_core 0000:03:00.0: assert_var[1] 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.704240] mlx5_core 0000:03:00.0: assert_var[2] 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.709945] mlx5_core 0000:03:00.0: assert_var[3] 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.715641] mlx5_core 0000:03:00.0: assert_var[4] 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.721343] mlx5_core 0000:03:00.0: assert_exit_ptr 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.727214] mlx5_core 0000:03:00.0: assert_callra 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.732917] mlx5_core 0000:03:00.0: fw_ver 65535.65535.65535
Jul 20 14:40:23 anonster kernel: [ 1716.738617] mlx5_core 0000:03:00.0: hw_id 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.743620] mlx5_core 0000:03:00.0: irisc_index 255
Jul 20 14:40:23 anonster kernel: [ 1716.748530] mlx5_core 0000:03:00.0: synd 0xff: unrecognized error
Jul 20 14:40:23 anonster kernel: [ 1716.754662] mlx5_core 0000:03:00.0: ext_synd 0xffff
Jul 20 14:40:23 anonster kernel: [ 1716.759578] mlx5_core 0000:03:00.0: raw fw_ver 0xffffffff
Jul 20 14:40:23 anonster kernel: [ 1716.765038] WARNING: CPU: 0 PID: 0 at /build/linux-hwe-EPHQQp/linux-hwe-4.15.0/kernel/time/timer.c:898 mod_timer+0x3e4/0x400
Jul 20 14:40:23 anonster kernel: [ 1716.765039] Modules linked in: binfmt_misc lkp_Ubuntu_4_15_0_142_146_generic_78(OEK) bonding nls_iso8859_1 xfs edac_mce_amd ipmi_ssif kvm_amd hpilo kvm i2c_piix4 irqbypass ipmi_si
Jul 20 14:40:23 anonster kernel: [ 1716.765051] mlx5_core 0000:03:00.0: health_care:194:(pid 29045): handling bad device here
Jul 20 14:40:23 anonster kernel: [ 1716.765052] ipmi_devintf ipmi_msghandler shpchp acpi_power_meter
Jul 20 14:40:23 anonster kernel: [ 1716.765057] mlx5_core 0000:03:00.0: mlx5_handle_bad_state:152:(pid 29045): Expected to see disabled NIC but it is has invalid value 3
Jul 20 14:40:23 anonster kernel: [ 1716.765058] k10temp mac_hid ib_iser
Jul 20 14:40:23 anonster kernel: [ 1716.765062] mlx5_core 0000:03:00.0: mlx5_pci_err_detected was called
Jul 20 14:40:23 anonster kernel: [ 1716.765063] rdma_cm iw_cm ib_cm
Jul 20 14:40:23 anonster kernel: [ 1716.765067] mlx5_core 0000:03:00.0: mlx5_enter_error_state:121:(pid 29045): start
Jul 20 14:40:23 anonster kernel: [ 1716.765067] ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear bcache ses enclosure crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel pcbc ttm drm_kms_helper aesni_intel mlx5_core syscopyarea sysfillrect igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd glue_helper mlxfw dca nvme cryptd drm devlink i2c_algo_bit smartpqi nvme_core ptp scsi_transport_sas pps_core wmi
Jul 20 14:40:23 anonster kernel: [ 1716.772598] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE K 4.15.0-142-generic #146~16.04.1-Ubuntu
Jul 20 14:40:23 anonster kernel: [ 1716.772598] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 05/11/2020
Jul 20 14:40:23 anonster kernel: [ 1716.772600] RIP: 0010:mod_timer+0x3e4/0x400
Jul 20 14:40:23 anonster kernel: [ 1716.772601] RSP: 0018:ffff91e55e603e30 EFLAGS: 00010093
Jul 20 14:40:23 anonster kernel: [ 1716.772603] RAX: 0000000100056792 RBX: 00000001000567c4 RCX: 000000010005678a
Jul 20 14:40:23 anonster kernel: [ 1716.772603] RDX: 000000010005678c RSI: ffff91e55e603e48 RDI: ffff91e55e61a700
Jul 20 14:40:23 anonster kernel: [ 1716.772604] RBP: ffff91e55e603e80 R08: ffff91e55e010800 R09: ffff91e55dc01ff0
Jul 20 14:40:23 anonster kernel: [ 1716.772605] R10: 0000000000000000 R11: 0000000000000040 R12: ffff91e54bb4d8d8
Jul 20 14:40:23 anonster kernel: [ 1716.772606] R13: ffff91e54bb4d8d8 R14: ffff91e55e61a700 R15: ffff91e54bb4d8d8
Jul 20 14:40:23 anonster kernel: [ 1716.772607] FS: 0000000000000000(0000) GS:ffff91e55e600000(0000) knlGS:0000000000000000
Jul 20 14:40:23 anonster kernel: [ 1716.772607] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 14:40:23 anonster kernel: [ 1716.772608] CR2: 00007fd20bd2e000 CR3: 0000000816294000 CR4: 0000000000340ef0
Jul 20 14:40:23 anonster kernel: [ 1716.772609] Call Trace:
Jul 20 14:40:23 anonster kernel: [ 1716.772611] <IRQ>
Jul 20 14:40:23 anonster kernel: [ 1716.772617] ? fbcon_add_cursor_timer+0xc0/0xc0
Jul 20 14:40:23 anonster kernel: [ 1716.772620] cursor_timer_handler+0x45/0x50
Jul 20 14:40:23 anonster kernel: [ 1716.772622] mlx5_core 0000:03:00.0: mlx5_enter_error_state:128:(pid 29045): end
Jul 20 14:40:23 anonster kernel: [ 1716.779975] call_timer_fn+0x32/0x140
Jul 20 14:40:23 anonster kernel: [ 1716.779976] run_timer_softirq+0x1e9/0x430
Jul 20 14:40:23 anonster kernel: [ 1716.779978] ? ktime_get+0x3e/0xb0
Jul 20 14:40:23 anonster kernel: [ 1716.779981] ? lapic_next_event+0x20/0x30
Jul 20 14:40:23 anonster kernel: [ 1716.779985] __do_softirq+0xf5/0x2a8
Jul 20 14:40:23 anonster kernel: [ 1716.779988] irq_exit+0xca/0xd0
Jul 20 14:40:23 anonster kernel: [ 1716.779989] smp_apic_timer_interrupt+0x79/0x150
Jul 20 14:40:23 anonster kernel: [ 1716.779990] apic_timer_interrupt+0x90/0xa0
Jul 20 14:40:23 anonster kernel: [ 1716.779991] </IRQ>
Jul 20 14:40:23 anonster kernel: [ 1716.779994] RIP: 0010:cpuidle_enter_state+0xa7/0x300
Jul 20 14:40:23 anonster kernel: [ 1716.779995] RSP: 0018:ffffffff9c803e08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
Jul 20 14:40:23 anonster kernel: [ 1716.779996] RAX: ffff91e55e621900 RBX: 0000000000000002 RCX: 000000000000001f
Jul 20 14:40:23 anonster kernel: [ 1716.779997] RDX: 0000000000000000 RSI: 0000000028133c6f RDI: 0000000000000000
Jul 20 14:40:23 anonster kernel: [ 1716.779997] RBP: ffffffff9c803e40 R08: ffffffe48aae298f R09: 0000000000000008
Jul 20 14:40:23 anonster kernel: [ 1716.779998] R10: ffffffff9c803dd8 R11: 0000000000002c8b R12: 0000000000000002
Jul 20 14:40:23 anonster kernel: [ 1716.779998] R13: ffff91e54d043800 R14: ffffffff9c981c98 R15: 0000018fb282ae03
Jul 20 14:40:23 anonster kernel: [ 1716.780000] ? cpuidle_enter_state+0x96/0x300
Jul 20 14:40:23 anonster kernel: [ 1716.780002] cpuidle_enter+0x17/0x20
Jul 20 14:40:23 anonster kernel: [ 1716.780004] call_cpuidle+0x23/0x40
Jul 20 14:40:23 anonster kernel: [ 1716.780006] do_idle+0x197/0x200
Jul 20 14:40:23 anonster kernel: [ 1716.780007] cpu_startup_entry+0x73/0x80
Jul 20 14:40:23 anonster kernel: [ 1716.780010] rest_init+0xaa/0xb0
Jul 20 14:40:23 anonster kernel: [ 1716.780013] start_kernel+0x4fa/0x51e
Jul 20 14:40:23 anonster kernel: [ 1716.780015] x86_64_start_reservations+0x24/0x26
Jul 20 14:40:23 anonster kernel: [ 1716.780016] x86_64_start_kernel+0x74/0x77
Jul 20 14:40:23 anonster kernel: [ 1716.780019] secondary_startup_64+0xa5/0xb0
Jul 20 14:40:23 anonster kernel: [ 1716.780020] Code: b1 fc ff ff 49 89 46 10 48 89 45 c0 e9 a4 fc ff ff 0f 0b 45 8b 7c 24 20 e9 5d fd ff ff 49 89 55 10 45 8b 7c 24 20 e9 4f fd ff ff <0f> 0b e9 a4 fc ff ff 49 89 46 10 e9 9b fc ff ff e8 97 f9 f7 ff
Jul 20 14:40:23 anonster kernel: [ 1716.780035] ---[ end trace 3e92c45954bacae0 ]---
Jul 20 14:40:24 anonster kernel: [ 1717.204835] mlx5_core 0000:03:00.1: assert_var[0] 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.210539] mlx5_core 0000:03:00.1: assert_var[1] 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.216242] mlx5_core 0000:03:00.1: assert_var[2] 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.221940] mlx5_core 0000:03:00.1: assert_var[3] 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.227645] mlx5_core 0000:03:00.1: assert_var[4] 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.233342] mlx5_core 0000:03:00.1: assert_exit_ptr 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.239218] mlx5_core 0000:03:00.1: assert_callra 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.244917] mlx5_core 0000:03:00.1: fw_ver 65535.65535.65535
Jul 20 14:40:24 anonster kernel: [ 1717.250617] mlx5_core 0000:03:00.1: hw_id 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.255615] mlx5_core 0000:03:00.1: irisc_index 255
Jul 20 14:40:24 anonster kernel: [ 1717.260533] mlx5_core 0000:03:00.1: synd 0xff: unrecognized error
Jul 20 14:40:24 anonster kernel: [ 1717.266666] mlx5_core 0000:03:00.1: ext_synd 0xffff
Jul 20 14:40:24 anonster kernel: [ 1717.271584] mlx5_core 0000:03:00.1: raw fw_ver 0xffffffff
Jul 20 14:40:24 anonster kernel: [ 1717.277053] mlx5_core 0000:03:00.1: health_care:194:(pid 16512): handling bad device here
Jul 20 14:40:24 anonster kernel: [ 1717.277057] mlx5_core 0000:03:00.1: mlx5_handle_bad_state:152:(pid 16512): Expected to see disabled NIC but it is has invalid value 3
Jul 20 14:40:24 anonster kernel: [ 1717.277060] mlx5_core 0000:03:00.1: mlx5_pci_err_detected was called
Jul 20 14:40:24 anonster kernel: [ 1717.277063] mlx5_core 0000:03:00.1: mlx5_enter_error_state:121:(pid 16512): start
Jul 20 14:40:24 anonster kernel: [ 1717.284625] mlx5_core 0000:03:00.1: mlx5_enter_error_state:128:(pid 16512): end
Jul 20 14:40:24 anonster kernel: [ 1717.300353] mlx5_core 0000:03:00.0: mlx5_wait_for_vf_pages:576:(pid 29045): Skipping wait for vf pages stage
Jul 20 14:40:24 anonster kernel: [ 1717.321544] mlx5_core 0000:03:00.0 ens2f0: mlx5e_get_link_ksettings: query port ptys failed: -5
Jul 20 14:40:24 anonster kernel: [ 1717.330315] mlx5_core 0000:03:00.0 ens2f0: speed changed to 0 for port ens2f0
Jul 20 14:40:24 anonster kernel: [ 1717.337814] mlx5_core 0000:03:00.1 ens2f1: mlx5e_get_link_ksettings: query port ptys failed: -5
Jul 20 14:40:24 anonster kernel: [ 1717.346576] mlx5_core 0000:03:00.1 ens2f1: speed changed to 0 for port ens2f1
Jul 20 14:40:24 anonster kernel: [ 1717.354089] mlx5_core 0000:03:00.1: mlx5_wait_for_vf_pages:576:(pid 16512): Skipping wait for vf pages stage
Jul 20 14:40:24 anonster kernel: [ 1717.360907] bond0: link status definitely down for interface ens2f0, disabling it
Jul 20 14:40:24 anonster kernel: [ 1717.360946] bond0: link status definitely down for interface ens2f1, disabling it
Jul 20 14:41:25 anonster kernel: [ 1778.646176] mlx5_core 0000:03:00.0: health recovery flow aborted since the nic state is invalid
Jul 20 14:41:25 anonster kernel: [ 1778.646180] mlx5_core 0000:03:00.1: health recovery flow aborted since the nic state is invalid

== ApportVersion =================================
2.20.1-0ubuntu2.30

== Architecture =================================
amd64

== Date =================================
Tue Jul 20 16:52:44 2021

== Dependencies =================================
adduser 3.113+nmu3ubuntu4
apt 1.2.35
apt-utils 1.2.35
busybox-initramfs 1:1.22.0-15ubuntu1.4
coreutils 8.25-2ubuntu3~16.04
cpio 2.11+dfsg-5ubuntu1.1
debconf 1.5.58ubuntu2
debconf-i18n 1.5.58ubuntu2
debianutils 4.7
dpkg 1.18.4ubuntu1.7+ppa1 [origin: LP-PPA-canonical-is-sa-launchpad]
e2fslibs 1.42.13-1ubuntu1.2
e2fsprogs 1.42.13-1ubuntu1.2
gcc-5-base 5.4.0-6ubuntu1~16.04.12
gcc-6-base 6.0.1-0ubuntu1
gnupg 1.4.20-1ubuntu3.3
gpgv 1.4.20-1ubuntu3.3
init-system-helpers 1.29ubuntu4
initramfs-tools 0.122ubuntu8.17
initramfs-tools-bin 0.122ubuntu8.17
initramfs-tools-core 0.122ubuntu8.17
initscripts 2.88dsf-59.3ubuntu2
insserv 1.14.0-5ubuntu3
klibc-utils 2.0.4-8ubuntu1.16.04.4
kmod 22-1ubuntu5.2
libacl1 2.2.52-3
libapt-inst2.0 1.2.35
libapt-pkg5.0 1.2.35
libattr1 1:2.4.47-2
libaudit-common 1:2.4.5-1ubuntu2.1
libaudit1 1:2.4.5-1ubuntu2.1
libblkid1 2.27.1-6ubuntu3.10
libbz2-1.0 1.0.6-8ubuntu0.2
libc6 2.23-0ubuntu11.3
libcomerr2 1.42.13-1ubuntu1.2
libdb5.3 5.3.28-11ubuntu0.2
libfdisk1 2.27.1-6ubuntu3.10
libgcc1 1:6.0.1-0ubuntu1
libgcrypt20 1.6.5-2ubuntu0.6
libgpg-error0 1.21-2ubuntu1
libgpm2 1.20.4-6.1
libklibc 2.0.4-8ubuntu1.16.04.4
libkmod2 22-1ubuntu5.2
liblocale-gettext-perl 1.07-1build1
liblz4-1 0.0~r131-2ubuntu2
liblzma5 5.1.1alpha+20120614-2ubuntu2
libmount1 2.27.1-6ubuntu3.10
libncurses5 6.0+20160213-1ubuntu1
libncursesw5 6.0+20160213-1ubuntu1
libpam-modules 1.1.8-3.2ubuntu2.3
libpam-modules-bin 1.1.8-3.2ubuntu2.3
libpam0g 1.1.8-3.2ubuntu2.3
libpcre3 2:8.38-3.1
libprocps4 2:3.3.10-4ubuntu2.5
libreadline6 6.3-8ubuntu2
libselinux1 2.4-3build2
libsemanage-common 2.3-1build3
libsemanage1 2.3-1build3
libsepol1 2.4-2
libsmartcols1 2.27.1-6ubuntu3.10
libss2 1.42.13-1ubuntu1.2
libstdc++6 5.4.0-6ubuntu1~16.04.12
libsystemd0 229-4ubuntu21.31
libtext-charwidth-perl 0.04-7build5
libtext-iconv-perl 1.7-5build4
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.0+20160213-1ubuntu1
libudev1 229-4ubuntu21.31
libusb-0.1-4 2:0.1.12-28
libustr-1.0-1 1.0.4-5
libuuid1 2.27.1-6ubuntu3.10
libzstd1 1.3.1+dfsg-1~ubuntu0.16.04.1
linux-base 4.5ubuntu1.2~16.04.1
linux-modules-4.15.0-142-generic 4.15.0-142.146~16.04.1
lsb-base 9.20160110ubuntu0.2
mount 2.27.1-6ubuntu3.10
multiarch-support 2.23-0ubuntu11.3
passwd 1:4.2-3.1ubuntu5.4
perl-base 5.22.1-9ubuntu0.9
procps 2:3.3.10-4ubuntu2.5
psmisc 22.21-2.1ubuntu0.1
readline-common 6.3-8ubuntu2
sensible-utils 0.0.9ubuntu0.16.04.1
sysv-rc 2.88dsf-59.3ubuntu2
sysvinit-utils 2.88dsf-59.3ubuntu2
tar 1.28-2.1ubuntu0.2
ubuntu-keyring 2012.05.19.1
udev 229-4ubuntu21.31
util-linux 2.27.1-6ubuntu3.10
uuid-runtime 2.27.1-6ubuntu3.10
zlib1g 1:1.2.8.dfsg-2ubuntu4.3

== DistroRelease =================================
Ubuntu 16.04

== NonfreeKernelModules =================================
lkp_Ubuntu_4_15_0_142_146_generic_78

== Package =================================
linux-image-4.15.0-142-generic 4.15.0-142.146~16.04.1

== PackageArchitecture =================================
amd64

== ProblemType =================================
Bug

== ProcCpuinfoMinimal =================================
processor : 15
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7262 8-Core Processor
stepping : 0
microcode : 0x8301038
cpu MHz : 1795.684
cache size : 512 KB
physical id : 0
siblings : 16
core id : 28
cpu cores : 8
apicid : 57
initial apicid : 57
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 6387.44
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

== ProcEnviron =================================
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash

== ProcVersionSignature =================================
Ubuntu 4.15.0-142.146~16.04.1-generic 4.15.18

== SourcePackage =================================
linux-signed-hwe

== Tags =================================
xenial third-party-packages

== Uname =================================
Linux 4.15.0-142-generic x86_64

== UpgradeStatus =================================
No upgrade log present (probably fresh install)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1936958

Title:
  mlx5_core crash, taking down a bond

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1936958/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp