This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1936958

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: bionic

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1936958

Title:
  mlx5_core crash, taking down a bond

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Jul 20 14:40:23 anonster kernel: [ 1716.692818] mlx5_core 0000:03:00.0: assert_var[0] 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.698541] mlx5_core 0000:03:00.0: assert_var[1] 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.704240] mlx5_core 0000:03:00.0: assert_var[2] 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.709945] mlx5_core 0000:03:00.0: assert_var[3] 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.715641] mlx5_core 0000:03:00.0: assert_var[4] 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.721343] mlx5_core 0000:03:00.0: assert_exit_ptr 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.727214] mlx5_core 0000:03:00.0: assert_callra 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.732917] mlx5_core 0000:03:00.0: fw_ver 65535.65535.65535
  Jul 20 14:40:23 anonster kernel: [ 1716.738617] mlx5_core 0000:03:00.0: hw_id 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.743620] mlx5_core 0000:03:00.0: irisc_index 255
  Jul 20 14:40:23 anonster kernel: [ 1716.748530] mlx5_core 0000:03:00.0: synd 0xff: unrecognized error
  Jul 20 14:40:23 anonster kernel: [ 1716.754662] mlx5_core 0000:03:00.0: ext_synd 0xffff
  Jul 20 14:40:23 anonster kernel: [ 1716.759578] mlx5_core 0000:03:00.0: raw fw_ver 0xffffffff
  Jul 20 14:40:23 anonster kernel: [ 1716.765038] WARNING: CPU: 0 PID: 0 at /build/linux-hwe-EPHQQp/linux-hwe-4.15.0/kernel/time/timer.c:898 mod_timer+0x3e4/0x400
  Jul 20 14:40:23 anonster kernel: [ 1716.765039] Modules linked in: binfmt_misc lkp_Ubuntu_4_15_0_142_146_generic_78(OEK) bonding nls_iso8859_1 xfs edac_mce_amd ipmi_ssif kvm_amd hpilo kvm i2c_piix4 irqbypass ipmi_si
  Jul 20 14:40:23 anonster kernel: [ 1716.765051] mlx5_core 0000:03:00.0: health_care:194:(pid 29045): handling bad device here
  Jul 20 14:40:23 anonster kernel: [ 1716.765052] ipmi_devintf ipmi_msghandler shpchp acpi_power_meter
  Jul 20 14:40:23 anonster kernel: [ 1716.765057] mlx5_core 0000:03:00.0: mlx5_handle_bad_state:152:(pid 29045): Expected to see disabled NIC but it is has invalid value 3
  Jul 20 14:40:23 anonster kernel: [ 1716.765058] k10temp mac_hid ib_iser
  Jul 20 14:40:23 anonster kernel: [ 1716.765062] mlx5_core 0000:03:00.0: mlx5_pci_err_detected was called
  Jul 20 14:40:23 anonster kernel: [ 1716.765063] rdma_cm iw_cm ib_cm
  Jul 20 14:40:23 anonster kernel: [ 1716.765067] mlx5_core 0000:03:00.0: mlx5_enter_error_state:121:(pid 29045): start
  Jul 20 14:40:23 anonster kernel: [ 1716.765067] ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear bcache ses enclosure crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel pcbc ttm drm_kms_helper aesni_intel mlx5_core syscopyarea sysfillrect igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd glue_helper mlxfw dca nvme cryptd drm devlink i2c_algo_bit smartpqi nvme_core ptp scsi_transport_sas pps_core wmi
  Jul 20 14:40:23 anonster kernel: [ 1716.772598] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE K 4.15.0-142-generic #146~16.04.1-Ubuntu
  Jul 20 14:40:23 anonster kernel: [ 1716.772598] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 05/11/2020
  Jul 20 14:40:23 anonster kernel: [ 1716.772600] RIP: 0010:mod_timer+0x3e4/0x400
  Jul 20 14:40:23 anonster kernel: [ 1716.772601] RSP: 0018:ffff91e55e603e30 EFLAGS: 00010093
  Jul 20 14:40:23 anonster kernel: [ 1716.772603] RAX: 0000000100056792 RBX: 00000001000567c4 RCX: 000000010005678a
  Jul 20 14:40:23 anonster kernel: [ 1716.772603] RDX: 000000010005678c RSI: ffff91e55e603e48 RDI: ffff91e55e61a700
  Jul 20 14:40:23 anonster kernel: [ 1716.772604] RBP: ffff91e55e603e80 R08: ffff91e55e010800 R09: ffff91e55dc01ff0
  Jul 20 14:40:23 anonster kernel: [ 1716.772605] R10: 0000000000000000 R11: 0000000000000040 R12: ffff91e54bb4d8d8
  Jul 20 14:40:23 anonster kernel: [ 1716.772606] R13: ffff91e54bb4d8d8 R14: ffff91e55e61a700 R15: ffff91e54bb4d8d8
  Jul 20 14:40:23 anonster kernel: [ 1716.772607] FS: 0000000000000000(0000) GS:ffff91e55e600000(0000) knlGS:0000000000000000
  Jul 20 14:40:23 anonster kernel: [ 1716.772607] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Jul 20 14:40:23 anonster kernel: [ 1716.772608] CR2: 00007fd20bd2e000 CR3: 0000000816294000 CR4: 0000000000340ef0
  Jul 20 14:40:23 anonster kernel: [ 1716.772609] Call Trace:
  Jul 20 14:40:23 anonster kernel: [ 1716.772611] <IRQ>
  Jul 20 14:40:23 anonster kernel: [ 1716.772617] ? fbcon_add_cursor_timer+0xc0/0xc0
  Jul 20 14:40:23 anonster kernel: [ 1716.772620] cursor_timer_handler+0x45/0x50
  Jul 20 14:40:23 anonster kernel: [ 1716.772622] mlx5_core 0000:03:00.0: mlx5_enter_error_state:128:(pid 29045): end
  Jul 20 14:40:23 anonster kernel: [ 1716.779975] call_timer_fn+0x32/0x140
  Jul 20 14:40:23 anonster kernel: [ 1716.779976] run_timer_softirq+0x1e9/0x430
  Jul 20 14:40:23 anonster kernel: [ 1716.779978] ? ktime_get+0x3e/0xb0
  Jul 20 14:40:23 anonster kernel: [ 1716.779981] ? lapic_next_event+0x20/0x30
  Jul 20 14:40:23 anonster kernel: [ 1716.779985] __do_softirq+0xf5/0x2a8
  Jul 20 14:40:23 anonster kernel: [ 1716.779988] irq_exit+0xca/0xd0
  Jul 20 14:40:23 anonster kernel: [ 1716.779989] smp_apic_timer_interrupt+0x79/0x150
  Jul 20 14:40:23 anonster kernel: [ 1716.779990] apic_timer_interrupt+0x90/0xa0
  Jul 20 14:40:23 anonster kernel: [ 1716.779991] </IRQ>
  Jul 20 14:40:23 anonster kernel: [ 1716.779994] RIP: 0010:cpuidle_enter_state+0xa7/0x300
  Jul 20 14:40:23 anonster kernel: [ 1716.779995] RSP: 0018:ffffffff9c803e08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
  Jul 20 14:40:23 anonster kernel: [ 1716.779996] RAX: ffff91e55e621900 RBX: 0000000000000002 RCX: 000000000000001f
  Jul 20 14:40:23 anonster kernel: [ 1716.779997] RDX: 0000000000000000 RSI: 0000000028133c6f RDI: 0000000000000000
  Jul 20 14:40:23 anonster kernel: [ 1716.779997] RBP: ffffffff9c803e40 R08: ffffffe48aae298f R09: 0000000000000008
  Jul 20 14:40:23 anonster kernel: [ 1716.779998] R10: ffffffff9c803dd8 R11: 0000000000002c8b R12: 0000000000000002
  Jul 20 14:40:23 anonster kernel: [ 1716.779998] R13: ffff91e54d043800 R14: ffffffff9c981c98 R15: 0000018fb282ae03
  Jul 20 14:40:23 anonster kernel: [ 1716.780000] ? cpuidle_enter_state+0x96/0x300
  Jul 20 14:40:23 anonster kernel: [ 1716.780002] cpuidle_enter+0x17/0x20
  Jul 20 14:40:23 anonster kernel: [ 1716.780004] call_cpuidle+0x23/0x40
  Jul 20 14:40:23 anonster kernel: [ 1716.780006] do_idle+0x197/0x200
  Jul 20 14:40:23 anonster kernel: [ 1716.780007] cpu_startup_entry+0x73/0x80
  Jul 20 14:40:23 anonster kernel: [ 1716.780010] rest_init+0xaa/0xb0
  Jul 20 14:40:23 anonster kernel: [ 1716.780013] start_kernel+0x4fa/0x51e
  Jul 20 14:40:23 anonster kernel: [ 1716.780015] x86_64_start_reservations+0x24/0x26
  Jul 20 14:40:23 anonster kernel: [ 1716.780016] x86_64_start_kernel+0x74/0x77
  Jul 20 14:40:23 anonster kernel: [ 1716.780019] secondary_startup_64+0xa5/0xb0
  Jul 20 14:40:23 anonster kernel: [ 1716.780020] Code: b1 fc ff ff 49 89 46 10 48 89 45 c0 e9 a4 fc ff ff 0f 0b 45 8b 7c 24 20 e9 5d fd ff ff 49 89 55 10 45 8b 7c 24 20 e9 4f fd ff ff <0f> 0b e9 a4 fc ff ff 49 89 46 10 e9 9b fc ff ff e8 97 f9 f7 ff
  Jul 20 14:40:23 anonster kernel: [ 1716.780035] ---[ end trace 3e92c45954bacae0 ]---
  Jul 20 14:40:24 anonster kernel: [ 1717.204835] mlx5_core 0000:03:00.1: assert_var[0] 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.210539] mlx5_core 0000:03:00.1: assert_var[1] 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.216242] mlx5_core 0000:03:00.1: assert_var[2] 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.221940] mlx5_core 0000:03:00.1: assert_var[3] 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.227645] mlx5_core 0000:03:00.1: assert_var[4] 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.233342] mlx5_core 0000:03:00.1: assert_exit_ptr 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.239218] mlx5_core 0000:03:00.1: assert_callra 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.244917] mlx5_core 0000:03:00.1: fw_ver 65535.65535.65535
  Jul 20 14:40:24 anonster kernel: [ 1717.250617] mlx5_core 0000:03:00.1: hw_id 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.255615] mlx5_core 0000:03:00.1: irisc_index 255
  Jul 20 14:40:24 anonster kernel: [ 1717.260533] mlx5_core 0000:03:00.1: synd 0xff: unrecognized error
  Jul 20 14:40:24 anonster kernel: [ 1717.266666] mlx5_core 0000:03:00.1: ext_synd 0xffff
  Jul 20 14:40:24 anonster kernel: [ 1717.271584] mlx5_core 0000:03:00.1: raw fw_ver 0xffffffff
  Jul 20 14:40:24 anonster kernel: [ 1717.277053] mlx5_core 0000:03:00.1: health_care:194:(pid 16512): handling bad device here
  Jul 20 14:40:24 anonster kernel: [ 1717.277057] mlx5_core 0000:03:00.1: mlx5_handle_bad_state:152:(pid 16512): Expected to see disabled NIC but it is has invalid value 3
  Jul 20 14:40:24 anonster kernel: [ 1717.277060] mlx5_core 0000:03:00.1: mlx5_pci_err_detected was called
  Jul 20 14:40:24 anonster kernel: [ 1717.277063] mlx5_core 0000:03:00.1: mlx5_enter_error_state:121:(pid 16512): start
  Jul 20 14:40:24 anonster kernel: [ 1717.284625] mlx5_core 0000:03:00.1: mlx5_enter_error_state:128:(pid 16512): end
  Jul 20 14:40:24 anonster kernel: [ 1717.300353] mlx5_core 0000:03:00.0: mlx5_wait_for_vf_pages:576:(pid 29045): Skipping wait for vf pages stage
  Jul 20 14:40:24 anonster kernel: [ 1717.321544] mlx5_core 0000:03:00.0 ens2f0: mlx5e_get_link_ksettings: query port ptys failed: -5
  Jul 20 14:40:24 anonster kernel: [ 1717.330315] mlx5_core 0000:03:00.0 ens2f0: speed changed to 0 for port ens2f0
  Jul 20 14:40:24 anonster kernel: [ 1717.337814] mlx5_core 0000:03:00.1 ens2f1: mlx5e_get_link_ksettings: query port ptys failed: -5
  Jul 20 14:40:24 anonster kernel: [ 1717.346576] mlx5_core 0000:03:00.1 ens2f1: speed changed to 0 for port ens2f1
  Jul 20 14:40:24 anonster kernel: [ 1717.354089] mlx5_core 0000:03:00.1: mlx5_wait_for_vf_pages:576:(pid 16512): Skipping wait for vf pages stage
  Jul 20 14:40:24 anonster kernel: [ 1717.360907] bond0: link status definitely down for interface ens2f0, disabling it
  Jul 20 14:40:24 anonster kernel: [ 1717.360946] bond0: link status definitely down for interface ens2f1, disabling it
  Jul 20 14:41:25 anonster kernel: [ 1778.646176] mlx5_core 0000:03:00.0: health recovery flow aborted since the nic state is invalid
  Jul 20 14:41:25 anonster kernel: [ 1778.646180] mlx5_core 0000:03:00.1: health recovery flow aborted since the nic state is invalid

  == ApportVersion =================================
  2.20.1-0ubuntu2.30

  == Architecture =================================
  amd64

  == Date =================================
  Tue Jul 20 16:52:44 2021

  == Dependencies =================================
  adduser 3.113+nmu3ubuntu4
  apt 1.2.35
  apt-utils 1.2.35
  busybox-initramfs 1:1.22.0-15ubuntu1.4
  coreutils 8.25-2ubuntu3~16.04
  cpio 2.11+dfsg-5ubuntu1.1
  debconf 1.5.58ubuntu2
  debconf-i18n 1.5.58ubuntu2
  debianutils 4.7
  dpkg 1.18.4ubuntu1.7+ppa1 [origin: LP-PPA-canonical-is-sa-launchpad]
  e2fslibs 1.42.13-1ubuntu1.2
  e2fsprogs 1.42.13-1ubuntu1.2
  gcc-5-base 5.4.0-6ubuntu1~16.04.12
  gcc-6-base 6.0.1-0ubuntu1
  gnupg 1.4.20-1ubuntu3.3
  gpgv 1.4.20-1ubuntu3.3
  init-system-helpers 1.29ubuntu4
  initramfs-tools 0.122ubuntu8.17
  initramfs-tools-bin 0.122ubuntu8.17
  initramfs-tools-core 0.122ubuntu8.17
  initscripts 2.88dsf-59.3ubuntu2
  insserv 1.14.0-5ubuntu3
  klibc-utils 2.0.4-8ubuntu1.16.04.4
  kmod 22-1ubuntu5.2
  libacl1 2.2.52-3
  libapt-inst2.0 1.2.35
  libapt-pkg5.0 1.2.35
  libattr1 1:2.4.47-2
  libaudit-common 1:2.4.5-1ubuntu2.1
  libaudit1 1:2.4.5-1ubuntu2.1
  libblkid1 2.27.1-6ubuntu3.10
  libbz2-1.0 1.0.6-8ubuntu0.2
  libc6 2.23-0ubuntu11.3
  libcomerr2 1.42.13-1ubuntu1.2
  libdb5.3 5.3.28-11ubuntu0.2
  libfdisk1 2.27.1-6ubuntu3.10
  libgcc1 1:6.0.1-0ubuntu1
  libgcrypt20 1.6.5-2ubuntu0.6
  libgpg-error0 1.21-2ubuntu1
  libgpm2 1.20.4-6.1
  libklibc 2.0.4-8ubuntu1.16.04.4
  libkmod2 22-1ubuntu5.2
  liblocale-gettext-perl 1.07-1build1
  liblz4-1 0.0~r131-2ubuntu2
  liblzma5 5.1.1alpha+20120614-2ubuntu2
  libmount1 2.27.1-6ubuntu3.10
  libncurses5 6.0+20160213-1ubuntu1
  libncursesw5 6.0+20160213-1ubuntu1
  libpam-modules 1.1.8-3.2ubuntu2.3
  libpam-modules-bin 1.1.8-3.2ubuntu2.3
  libpam0g 1.1.8-3.2ubuntu2.3
  libpcre3 2:8.38-3.1
  libprocps4 2:3.3.10-4ubuntu2.5
  libreadline6 6.3-8ubuntu2
  libselinux1 2.4-3build2
  libsemanage-common 2.3-1build3
  libsemanage1 2.3-1build3
  libsepol1 2.4-2
  libsmartcols1 2.27.1-6ubuntu3.10
  libss2 1.42.13-1ubuntu1.2
  libstdc++6 5.4.0-6ubuntu1~16.04.12
  libsystemd0 229-4ubuntu21.31
  libtext-charwidth-perl 0.04-7build5
  libtext-iconv-perl 1.7-5build4
  libtext-wrapi18n-perl 0.06-7.1
  libtinfo5 6.0+20160213-1ubuntu1
  libudev1 229-4ubuntu21.31
  libusb-0.1-4 2:0.1.12-28
  libustr-1.0-1 1.0.4-5
  libuuid1 2.27.1-6ubuntu3.10
  libzstd1 1.3.1+dfsg-1~ubuntu0.16.04.1
  linux-base 4.5ubuntu1.2~16.04.1
  linux-modules-4.15.0-142-generic 4.15.0-142.146~16.04.1
  lsb-base 9.20160110ubuntu0.2
  mount 2.27.1-6ubuntu3.10
  multiarch-support 2.23-0ubuntu11.3
  passwd 1:4.2-3.1ubuntu5.4
  perl-base 5.22.1-9ubuntu0.9
  procps 2:3.3.10-4ubuntu2.5
  psmisc 22.21-2.1ubuntu0.1
  readline-common 6.3-8ubuntu2
  sensible-utils 0.0.9ubuntu0.16.04.1
  sysv-rc 2.88dsf-59.3ubuntu2
  sysvinit-utils 2.88dsf-59.3ubuntu2
  tar 1.28-2.1ubuntu0.2
  ubuntu-keyring 2012.05.19.1
  udev 229-4ubuntu21.31
  util-linux 2.27.1-6ubuntu3.10
  uuid-runtime 2.27.1-6ubuntu3.10
  zlib1g 1:1.2.8.dfsg-2ubuntu4.3

  == DistroRelease =================================
  Ubuntu 16.04

  == NonfreeKernelModules =================================
  lkp_Ubuntu_4_15_0_142_146_generic_78

  == Package =================================
  linux-image-4.15.0-142-generic 4.15.0-142.146~16.04.1

  == PackageArchitecture =================================
  amd64

  == ProblemType =================================
  Bug

  == ProcCpuinfoMinimal =================================
  processor : 15
  vendor_id : AuthenticAMD
  cpu family : 23
  model : 49
  model name : AMD EPYC 7262 8-Core Processor
  stepping : 0
  microcode : 0x8301038
  cpu MHz : 1795.684
  cache size : 512 KB
  physical id : 0
  siblings : 16
  core id : 28
  cpu cores : 8
  apicid : 57
  initial apicid : 57
  fpu : yes
  fpu_exception : yes
  cpuid level : 16
  wp : yes
  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
  bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
  bogomips : 6387.44
  TLB size : 3072 4K pages
  clflush size : 64
  cache_alignment : 64
  address sizes : 48 bits physical, 48 bits virtual
  power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

  == ProcEnviron =================================
  TERM=xterm-256color
  PATH=(custom, no user)
  XDG_RUNTIME_DIR=<set>
  LANG=en_US.UTF-8
  SHELL=/bin/bash

  == ProcVersionSignature =================================
  Ubuntu 4.15.0-142.146~16.04.1-generic 4.15.18

  == SourcePackage =================================
  linux-signed-hwe

  == Tags =================================
  xenial third-party-packages

  == Uname =================================
  Linux 4.15.0-142-generic x86_64

  == UpgradeStatus =================================
  No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1936958/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp