Public bug reported:

Loading and unloading mlx4_core twice with SR-IOV mode enabled, on a host
with multiple Mellanox devices (some of which support SR-IOV and some of
which do not), leads to a kernel panic.
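
To see which adapters on a host fall into each group, one option is to
read sysfs. This is only a rough sketch: it assumes the kernel exposes a
sriov_totalvfs attribute for SR-IOV capable physical functions and omits
it otherwise, and it matches on the Mellanox PCI vendor ID 0x15b3. The
function name list_mlx_sriov and its optional directory argument are made
up for illustration.

```sh
#!/bin/sh
# List Mellanox PCI functions and whether each one advertises SR-IOV.
# $1 is an optional sysfs PCI device directory (defaults to the real one).
list_mlx_sriov() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable vendor file (also covers an
        # unmatched glob, which is left as a literal pattern).
        [ -r "$dev/vendor" ] || continue
        # 0x15b3 is the Mellanox PCI vendor ID.
        [ "$(cat "$dev/vendor")" = "0x15b3" ] || continue
        if [ -r "$dev/sriov_totalvfs" ]; then
            echo "$(basename "$dev") sriov_totalvfs=$(cat "$dev/sriov_totalvfs")"
        else
            echo "$(basename "$dev") no-sriov"
        fi
    done
}

list_mlx_sriov
```

On a host like the one described here, a mix of sriov_totalvfs and
no-sriov lines would be expected.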

The following two upstream commits fix this issue:

commit 32b4ca5af1cf1c558dfca0e3417e9b35402401a6
Author: Carol L Soto <cls...@linux.vnet.ibm.com>
Date:   Tue Jun 2 16:07:23 2015 -0500

    net/mlx4_core: double free of dev_vfs
    
    If user loads mlx4_core with num_vfs greater than
    supported then variable dev->dev_vfs is freed 2 times after unloading the
    driver.
    
    Acked-by: Or Gerlitz <ogerl...@mellanox.com>
    Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>


commit 7095b39f3189d2107045d769fdc32dfc0b704028
Author: Carol Soto <cls...@linux.vnet.ibm.com>
Date:   Tue Jun 2 16:07:24 2015 -0500

    net/mlx4_core: need to call close fw if alloc icm is called twice
    
    If mlx4_enable_sriov is called by adapter without this
    feature MLX4_DEV_CAP_FLAG2_SYS_EQS then during this path the function alloc
    icm is called twice without freeing the structures from the first time.
    
    Acked-by: Or Gerlitz <ogerl...@mellanox.com>
    Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>


Steps to reproduce:
1- Add the line "options mlx4_core num_vfs=60 port_type_array=2,2" to
/etc/modprobe.d/mlx4_core.conf.
2- Unload the mlx4_* kernel modules: modprobe -rv mlx4_en; modprobe -rv mlx4_ib;
modprobe -rv mlx4_core;
3- Load the mlx4_en kernel module: modprobe -v mlx4_en
4- Edit /etc/modprobe.d/mlx4_core.conf and comment out the "options mlx4_core
num_vfs=60 port_type_array=2,2" line.
5- Repeat steps 2 and 3.
6- The kernel then logs the call trace below.
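
The steps above could be scripted roughly as follows. This is a sketch
under stated assumptions, not a tested reproducer: it assumes
/etc/modprobe.d/mlx4_core.conf holds nothing but this one options line
(the file is overwritten), and the helper names run, write_conf,
reload_mlx4 and repro are invented here. It only prints the commands
unless RUN=1 is set, since the real sequence is expected to crash the
host.

```sh
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set RUN=1 (as root, on a disposable test host) to execute them.
CONF="${CONF:-/etc/modprobe.d/mlx4_core.conf}"
OPTS="options mlx4_core num_vfs=60 port_type_array=2,2"

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

write_conf() {
    # $1 is the full line to place in the config file.
    # Assumption: the file contains only this line, so it is overwritten.
    if [ "${RUN:-0}" = "1" ]; then
        printf '%s\n' "$1" > "$CONF"
    else
        echo "+ write '$1' to $CONF"
    fi
}

reload_mlx4() {
    # Steps 2 and 3: unload the mlx4 stack, then load mlx4_en,
    # which pulls in mlx4_core as a dependency.
    run modprobe -rv mlx4_en
    run modprobe -rv mlx4_ib
    run modprobe -rv mlx4_core
    run modprobe -v mlx4_en
}

repro() {
    write_conf "$OPTS"       # step 1: SR-IOV options active
    reload_mlx4              # steps 2 and 3
    write_conf "# $OPTS"     # step 4: options commented out
    reload_mlx4              # step 5: the trace is expected here
}

repro
```

Running it without RUN=1 is a cheap way to review the exact command
sequence before trying it on real hardware.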


Call Trace:
[ 1175.699487] mlx4_core 0000:24:00.0: Received reset from slave:7
[ 1175.767388] mlx4_core 0000:24:00.0: Received reset from slave:6 
[ 1175.830898] mlx4_core 0000:24:00.0: Received reset from slave:5 
[ 1175.898229] mlx4_core 0000:24:00.0: Received reset from slave:4 
[ 1175.963514] mlx4_core 0000:24:00.0: Received reset from slave:3 
[ 1176.035312] mlx4_core 0000:24:00.0: Received reset from slave:2 
[ 1176.105085] mlx4_core 0000:24:00.0: Received reset from slave:1 
[ 1177.253200] mlx4_core 0000:24:00.0: Disabling SR-IOV            
[ 1179.724864] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 1179.724885] mlx4_core: Initializing 0000:21:00.0                       
[ 1185.760555] mlx4_core 0000:21:00.0: Enabling SR-IOV with 60 VFs        
[ 1185.760575] mlx4_core 0000:21:00.0: Failed to enable SR-IOV, continuing without SR-IOV (err = -22)
[ 1185.770550] mlx4_core 0000:21:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 1185.770552] mlx4_core 0000:21:00.0: PCIe link width is x8, device supports x8
[ 1185.771870] ------------[ cut here ]------------
[ 1185.771878] WARNING: CPU: 6 PID: 5947 at /build/buildd/linux-3.19.0/fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()
[ 1185.771880] sysfs: cannot create duplicate filename '/devices/pci0000:20/0000:20:03.0/0000:21:00.0/msi_irqs/57'
[ 1185.771881] Modules linked in: mlx4_core(+) vxlan ip6_udp_tunnel udp_tunnel mst_pciconf(OE) mst_pci(OE) nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dm_multipath glue_helper scsi_dh ablk_helper cryptd joydev lpc_ich serio_raw ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter ioatdma dca hpilo mac_hid wmi sb_edac edac_core shpchp nfsd auth_rpcgss
[ 1185.771920]  nfs_acl lockd grace sunrpc autofs4 hid_generic usbhid tg3 pata_acpi ptp hid psmouse hpsa pps_core [last unloaded: ib_addr]
[ 1185.771931] CPU: 6 PID: 5947 Comm: modprobe Tainted: G           OE  3.19.0-16-generic #16-Ubuntu
[ 1185.771932] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 03/01/2013
[ 1185.771934]  ffffffff81abb6d8 ffff88086cdb37c8 ffffffff817c2235 0000000000000007
[ 1185.771936]  ffff88086cdb3818 ffff88086cdb3808 ffffffff8107595a 0000000000000292
[ 1185.771938]  ffff88084d1ea000 ffff88086d1c1648 ffff8807b3df62d0 ffff880867ab85a0
[ 1185.771941] Call Trace:
[ 1185.771949]  [<ffffffff817c2235>] dump_stack+0x45/0x57
[ 1185.771953]  [<ffffffff8107595a>] warn_slowpath_common+0x8a/0xc0
[ 1185.771955]  [<ffffffff810759d6>] warn_slowpath_fmt+0x46/0x50
[ 1185.771958]  [<ffffffff8126ab58>] ? kernfs_path+0x48/0x60
[ 1185.771961]  [<ffffffff8126e508>] sysfs_warn_dup+0x68/0x80
[ 1185.771963]  [<ffffffff8126e1ff>] sysfs_add_file_mode_ns+0x14f/0x1c0
[ 1185.771966]  [<ffffffff8126c050>] ? kernfs_create_dir_ns+0x50/0x80
[ 1185.771969]  [<ffffffff8126edf9>] internal_create_group+0xd9/0x280
[ 1185.771971]  [<ffffffff8126f0d9>] sysfs_create_groups+0x49/0xa0
[ 1185.771976]  [<ffffffff8141bfad>] populate_msi_sysfs+0x1bd/0x200
[ 1185.771978]  [<ffffffff8141c4c8>] pci_enable_msix+0x158/0x3c0
[ 1185.771980]  [<ffffffff8141c75d>] pci_enable_msix_range+0x2d/0x70
[ 1185.771991]  [<ffffffffc0900245>] mlx4_load_one+0xea5/0x1410 [mlx4_core]
[ 1185.771999]  [<ffffffffc0900c9b>] mlx4_init_one+0x4eb/0x600 [mlx4_core]
[ 1185.772003]  [<ffffffff81401155>] local_pci_probe+0x45/0xa0
[ 1185.772005]  [<ffffffff81402345>] ? pci_match_device+0xe5/0x110
[ 1185.772007]  [<ffffffff81402489>] pci_device_probe+0xd9/0x130
[ 1185.772012]  [<ffffffff81506523>] driver_probe_device+0xa3/0x410
[ 1185.772014]  [<ffffffff8150696b>] __driver_attach+0x9b/0xa0
[ 1185.772016]  [<ffffffff815068d0>] ? __device_attach+0x40/0x40
[ 1185.772020]  [<ffffffff815042eb>] bus_for_each_dev+0x6b/0xb0
[ 1185.772022]  [<ffffffff81505f8e>] driver_attach+0x1e/0x20
[ 1185.772024]  [<ffffffff81505b60>] bus_add_driver+0x180/0x250
[ 1185.772027]  [<ffffffffc0344000>] ? 0xffffffffc0344000
[ 1185.772030]  [<ffffffff81507164>] driver_register+0x64/0xf0
[ 1185.772034]  [<ffffffff8140098c>] __pci_register_driver+0x4c/0x50
[ 1185.772042]  [<ffffffffc0344126>] mlx4_init+0x126/0x1000 [mlx4_core]
[ 1185.772047]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 1185.772053]  [<ffffffff811d5b49>] ? kmem_cache_alloc_trace+0x189/0x200
[ 1185.772058]  [<ffffffff810f99c4>] ? load_module+0x15a4/0x1ce0
[ 1185.772061]  [<ffffffff810f99fe>] load_module+0x15de/0x1ce0
[ 1185.772063]  [<ffffffff810f51d0>] ? store_uevent+0x40/0x40
[ 1185.772067]  [<ffffffff810fa276>] SyS_finit_module+0x86/0xb0
[ 1185.772072]  [<ffffffff817c934d>] system_call_fastpath+0x16/0x1b
[ 1185.772074] ---[ end trace 9d9c0896e72e5312 ]---
[ 1185.873139] mlx4_core 0000:21:00.0: command 0x31 timed out (go bit not cleared)
[ 1185.873147] mlx4_core 0000:21:00.0: device is going to be reset
[ 1186.881239] mlx4_core 0000:21:00.0: device was reset successfully
[ 1186.888006] mlx4_core 0000:21:00.0: NOP command failed to generate interrupt (IRQ 53), aborting
[ 1186.897831] mlx4_core 0000:21:00.0: BIOS or ACPI interrupt routing problem?
[ 1186.907762] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 1186.916462] IP: [<ffffffff81181185>] __free_pages+0x5/0x30
[ 1186.922560] PGD 0
[ 1186.924814] Oops: 0002 [#1] SMP
[ 1186.928423] Modules linked in: mlx4_core(+) vxlan ip6_udp_tunnel udp_tunnel mst_pciconf(OE) mst_pci(OE) nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dm_multipath glue_helper scsi_dh ablk_helper cryptd joydev lpc_ich serio_raw ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter ioatdma dca hpilo mac_hid wmi sb_edac edac_core shpchp nfsd auth_rpcgss
[ 1187.008078]  nfs_acl lockd grace sunrpc autofs4 hid_generic usbhid tg3 pata_acpi ptp hid psmouse hpsa pps_core [last unloaded: ib_addr]
[ 1187.020643] CPU: 8 PID: 5947 Comm: modprobe Tainted: G        W  OE  3.19.0-16-generic #16-Ubuntu
[ 1187.030455] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 03/01/2013
[ 1187.037778] task: ffff88079d6cb110 ti: ffff88086cdb0000 task.ti: ffff88086cdb0000
[ 1187.046064] RIP: 0010:[<ffffffff81181185>]  [<ffffffff81181185>] __free_pages+0x5/0x30
[ 1187.054859] RSP: 0018:ffff88086cdb39a0  EFLAGS: 00010206
[ 1187.060730] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
[ 1187.068610] RDX: 00000000000ffff8 RSI: 0000000000000014 RDI: 0000000000000000
[ 1187.076492] RBP: ffff88086cdb39e8 R08: 0000000000000040 R09: 0000000000000000
[ 1187.084374] R10: 0000000000000040 R11: ffff88079bbf6000 R12: ffff8807b3e20000
[ 1187.092253] R13: ffff88086921a420 R14: ffff88086921a400 R15: 0000000000000001
[ 1187.100139] FS:  00007fadaa1b9700(0000) GS:ffff88087f840000(0000) knlGS:0000000000000000
[ 1187.109092] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1187.115445] CR2: 000000000000001c CR3: 0000000823f6f000 CR4: 00000000000407e0
[ 1187.123336] Stack:
[ 1187.125570]  ffffffffc08f9d9f 0000000000000099 ffff88086921a3e0 ffff88086cdb39e8
[ 1187.133802]  0000000000000099 ffff8807b3e20000 ffff8807b3e23268 0000000000000099
[ 1187.142030]  ffff8807b3e20000 ffff88086cdb3a18 ffffffffc08fab7c ffff8807b3e20000
[ 1187.150264] Call Trace:
[ 1187.153003]  [<ffffffffc08f9d9f>] ? mlx4_free_icm+0x17f/0x1d0 [mlx4_core]
[ 1187.160526]  [<ffffffffc08fab7c>] mlx4_cleanup_icm_table+0x5c/0x80 [mlx4_core]
[ 1187.168537]  [<ffffffffc08fb5bd>] mlx4_free_icms+0x1d/0x100 [mlx4_core]
[ 1187.175849]  [<ffffffffc08fba8b>] mlx4_close_hca+0x4b/0x70 [mlx4_core]
[ 1187.183072]  [<ffffffffc08ff943>] mlx4_load_one+0x5a3/0x1410 [mlx4_core]
[ 1187.190480]  [<ffffffffc0900c9b>] mlx4_init_one+0x4eb/0x600 [mlx4_core]
[ 1187.197786]  [<ffffffff81401155>] local_pci_probe+0x45/0xa0
[ 1187.203944]  [<ffffffff81402345>] ? pci_match_device+0xe5/0x110
[ 1187.210485]  [<ffffffff81402489>] pci_device_probe+0xd9/0x130
[ 1187.216842]  [<ffffffff81506523>] driver_probe_device+0xa3/0x410
[ 1187.223478]  [<ffffffff8150696b>] __driver_attach+0x9b/0xa0
[ 1187.229643]  [<ffffffff815068d0>] ? __device_attach+0x40/0x40
[ 1187.236002]  [<ffffffff815042eb>] bus_for_each_dev+0x6b/0xb0
[ 1187.242256]  [<ffffffff81505f8e>] driver_attach+0x1e/0x20
[ 1187.248222]  [<ffffffff81505b60>] bus_add_driver+0x180/0x250
[ 1187.254479]  [<ffffffffc0344000>] ? 0xffffffffc0344000
[ 1187.260158]  [<ffffffff81507164>] driver_register+0x64/0xf0
[ 1187.266334]  [<ffffffff8140098c>] __pci_register_driver+0x4c/0x50
[ 1187.273077]  [<ffffffffc0344126>] mlx4_init+0x126/0x1000 [mlx4_core]
[ 1187.280112]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[ 1187.286383]  [<ffffffff811d5b49>] ? kmem_cache_alloc_trace+0x189/0x200
[ 1187.293753]  [<ffffffff810f99c4>] ? load_module+0x15a4/0x1ce0
[ 1187.300109]  [<ffffffff810f99fe>] load_module+0x15de/0x1ce0
[ 1187.306271]  [<ffffffff810f51d0>] ? store_uevent+0x40/0x40
[ 1187.312333]  [<ffffffff810fa276>] SyS_finit_module+0x86/0xb0
[ 1187.318595]  [<ffffffff817c934d>] system_call_fastpath+0x16/0x1b
[ 1187.325233] Code: 74 1c 48 8b 03 90 48 8b 7b 08 48 83 c3 10 44 89 ea 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 e8 eb a6 66 0f 1f 44 00 00 66 66 66 66 90 <f0> ff 4f 1c 74 05 c3 0f 1f 40 00 55 85 f6 48 89 e5 74 08 e8 d3
[ 1187.346856] RIP  [<ffffffff81181185>] __free_pages+0x5/0x30
[ 1187.353034]  RSP <ffff88086cdb39a0>
[ 1187.356900] CR2: 000000000000001c
[ 1187.361080] ---[ end trace 9d9c0896e72e5313 ]---

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: vivid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1473883

Title:
  Kernel panics on mlx4_core (Mellanox Core driver) with SR-IOV mode

