------- Comment From niklas.schne...@ibm.com 2020-04-22 04:59 EDT-------
---Problem Description---
With the mlx5 device driver on Ubuntu 20.04 (beta), __alloc_pages_nodemask 
generates a warning with a stack trace when the driver initializes a device. 
The driver tries to allocate more contiguous memory than the platform-specific 
FORCE_MAX_ZONEORDER setting allows.

FORCE_MAX_ZONEORDER on s390x: 9
FORCE_MAX_ZONEORDER on other platforms: 11 or more
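
As a rough illustration of what this limit means: MAX_ORDER is taken from 
FORCE_MAX_ZONEORDER when an architecture sets it, and the buddy allocator serves 
orders 0 through MAX_ORDER - 1. The largest physically contiguous allocation is 
therefore 2^(MAX_ORDER - 1) pages. The sketch below assumes 4 KiB pages and uses 
the two values listed above. It is a back-of-the-envelope model, not kernel code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (used by s390x and x86_64)


def max_contiguous_bytes(max_order: int, page_size: int = PAGE_SIZE) -> int:
    """Largest single buddy allocation: orders run 0..MAX_ORDER-1."""
    return (1 << (max_order - 1)) * page_size


for platform, max_order in (("s390x", 9), ("other (default)", 11)):
    mib = max_contiguous_bytes(max_order) // (1024 * 1024)
    print(f"{platform}: MAX_ORDER={max_order} -> {mib} MiB")
# s390x: MAX_ORDER=9 -> 1 MiB
# other (default): MAX_ORDER=11 -> 4 MiB
```

Under these assumptions, any single kmalloc()/kzalloc() that needs more than 
1 MiB of contiguous memory cannot succeed on s390x, even though the same request 
can be satisfied on platforms with MAX_ORDER 11.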

This issue occurs only with ConnectX5 devices, because the mlx5_fw_tracer 
code is used only for physical functions.

---Additional Hardware Info---
Z15 partition with Mojave (ConnectX5) adapter


---uname output---
Linux pok1-qz1-sr1-rk011-s21 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:46:43 UTC 2020 s390x s390x s390x GNU/Linux

Machine Type = Z15

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Start a partition with a Mojave (ConnectX5) adapter. The warning is printed 
when mlx5_core probes the adapter.

Oops output:
[  331.244901] pci 0100:00:00.0: [15b3:1019] type 00 class 0x020000
[  331.245195] pci 0100:00:00.0: reg 0x10: [mem 0xffffc00000000000-0xffffc00001ffffff 64bit pref]
[  331.245479] pci 0100:00:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[  331.245518] pci 0100:00:00.0: enabling Extended Tags
[  331.246291] pci 0100:00:00.0: PME# supported from D3cold
[  331.246619] pci 0100:00:00.0: reg 0x1a4: [mem 0xffffc00002000000-0xffffc000021fffff 64bit pref]
[  331.246620] pci 0100:00:00.0: VF(n) BAR0 space: [mem 0xffffc00002000000-0xffffc00009ffffff 64bit pref] (contains BAR0 for 64 VFs)
[  331.250657] pci 0100:00:00.0: Adding to iommu group 0
[  331.280192] pps_core: LinuxPPS API ver. 1 registered
[  331.280193] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giome...@linux.it>
[  331.281955] PTP clock support registered
[  331.313648] mlx5_core 0100:00:00.0: enabling device (0000 -> 0002)
[  331.313848] mlx5_core 0100:00:00.0: firmware version: 16.25.1020
[  331.313879] mlx5_core 0100:00:00.0: 252.048 Gb/s available PCIe bandwidth (16 GT/s x16 link)
[  331.531813] ------------[ cut here ]------------
[  331.531819] WARNING: CPU: 7 PID: 2156 at mm/page_alloc.c:4727 __alloc_pages_nodemask+0x25c/0x320
[  331.531820] Modules linked in: mlx5_core(+) mlxfw tls ptp pps_core s390_trng chsc_sch vfio_ccw vfio_mdev mdev eadm_sch vfio_iommu_type1 vfio sch_fq_codel ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 qeth_l2 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp qeth scsi_transport_fc qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[  331.531833] CPU: 7 PID: 2156 Comm: systemd-udevd Not tainted 5.4.0-14-generic #17-Ubuntu
[  331.531833] Hardware name: IBM 8562 GT2 A00 (LPAR)
[  331.531834] Krnl PSW : 0704c00180000000 00000000735d720c (__alloc_pages_nodemask+0x25c/0x320)
[  331.531836]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  331.531837] Krnl GPRS: 000000007418d687 0000000000040dc0 0000000000040dc0 000000000000000a
[  331.531837]            0000000000000000 0000000000000000 000000000000000a 000003ff8042607e
[  331.531838]            0000000000000dc0 00000000002203b0 000000000000000a 00000001c9480120
[  331.531838]            00000001ecda4400 0000000000000055 000003e001943680 000003e001943600
[  331.531844] Krnl Code: 00000000735d7200: a7212000            tmll    %r2,8192
00000000735d7204: a774ff87              brc     7,00000000735d7112
#00000000735d7208: a7f40001             brc     15,00000000735d720a
>00000000735d720c: a7890000             lghi    %r8,0
00000000735d7210: a7f4ff83              brc     15,00000000735d7116
00000000735d7214: a7180000              lhi     %r1,0
00000000735d7218: a7f4ff1b              brc     15,00000000735d704e
00000000735d721c: e31003400004  lg      %r1,832
[  331.531851] Call Trace:
[  331.531852] ([<0000000000000201>] 0x201)
[  331.531856]  [<00000000735a20c4>] kmalloc_order+0x34/0xb0
[  331.531856]  [<00000000735a2172>] kmalloc_order_trace+0x32/0xe0
[  331.531880]  [<000003ff8042607e>] mlx5_fw_tracer_create+0x3e/0x500 [mlx5_core]
[  331.531899]  [<000003ff803ffa88>] mlx5_init_once+0x148/0x3c0 [mlx5_core]
[  331.531917]  [<000003ff8040152a>] mlx5_load_one+0x7a/0x240 [mlx5_core]
[  331.531935]  [<000003ff804018d8>] init_one+0x1e8/0x310 [mlx5_core]
[  331.531939]  [<0000000073916e16>] local_pci_probe+0x56/0xc0
[  331.531941]  [<0000000073917ef2>] pci_device_probe+0x132/0x1e0
[  331.531942]  [<00000000739a1374>] really_probe+0xf4/0x460
[  331.531943]  [<00000000739a1a60>] driver_probe_device+0x130/0x190
[  331.531944]  [<00000000739a1dae>] device_driver_attach+0x7e/0xa0
[  331.531945]  [<00000000739a1e86>] __driver_attach+0xb6/0x180
[  331.531947]  [<000000007399eae2>] bus_for_each_dev+0x82/0xc0
[  331.531948]  [<00000000739a030a>] bus_add_driver+0x16a/0x260
[  331.531949]  [<00000000739a2b38>] driver_register+0x88/0x150
[  331.531967]  [<000003ff80362080>] init+0x80/0xb0 [mlx5_core]
[  331.531968]  [<00000000733648bc>] do_one_initcall+0x3c/0x200
[  331.531970]  [<0000000073495fc0>] do_init_module+0x70/0x270
[  331.531970]  [<00000000734983b2>] load_module+0x1142/0x1440
[  331.531971]  [<00000000734988e4>] __do_sys_finit_module+0xa4/0xf0
[  331.531973]  [<0000000073c54ec2>] system_call+0x2a6/0x2c8
[  331.531974] Last Breaking-Event-Address:
[  331.531975]  [<00000000735d7208>] __alloc_pages_nodemask+0x258/0x320
[  331.531975] ---[ end trace 5985b580c6dbfd3e ]---
[  331.534155] port_module: 3 callbacks suppressed
[  331.534156] mlx5_core 0100:00:00.0: Port module event: module 0, Cable plugged
[  331.548450] mlx5_core 0100:00:00.0: MLX5E: StrdRq(1) RqSz(16) StrdSz(4096) RxCqeCmprss(0)
[  331.694225] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[  331.714227] mlx5_core 0100:00:00.0 enP256s401f0: renamed from eth0
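
The call trace shows kmalloc_order_trace and kmalloc_order passing the request 
from mlx5_fw_tracer_create straight to the page allocator. With the SLUB 
allocator, kmalloc sizes above KMALLOC_MAX_CACHE_SIZE (two pages with 4 KiB 
pages) take this path, and the size is rounded up to a page order with 
get_order(). The sketch below models that dispatch and the 5.4 check 
"order >= MAX_ORDER" in __alloc_pages_nodemask. It assumes SLUB and 4 KiB pages. 
The 2 MiB request size is hypothetical, because the log does not state how much 
mlx5_fw_tracer_create actually asked for.

```python
PAGE_SIZE = 4096                        # assumption: 4 KiB pages
KMALLOC_MAX_CACHE_SIZE = 2 * PAGE_SIZE  # assumption: SLUB; larger sizes bypass slab caches


def get_order(size: int) -> int:
    """Smallest order such that (PAGE_SIZE << order) >= size, for size > 0."""
    order = 0
    while (PAGE_SIZE << order) < size:
        order += 1
    return order


def kmalloc_outcome(size: int, max_order: int) -> str:
    """Rough model of where a kmalloc()/kzalloc() of `size` bytes ends up."""
    if size <= KMALLOC_MAX_CACHE_SIZE:
        return "slab cache"
    order = get_order(size)  # large path: kmalloc_order() -> page allocator
    if order >= max_order:
        # __alloc_pages_nodemask warns (unless __GFP_NOWARN) and returns NULL
        return f"order {order}: warning, allocation fails"
    return f"order {order}: page allocator"


request = 2 * 1024 * 1024  # hypothetical size, not taken from the log
print("s390x  :", kmalloc_outcome(request, max_order=9))
print("default:", kmalloc_outcome(request, max_order=11))
# s390x  : order 9: warning, allocation fails
# default: order 9: page allocator
```

The same request size is an ordinary high-order allocation where MAX_ORDER is 11. 
On s390x it falls outside the buddy allocator's range and triggers the warning.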

------- Comment From niklas.schne...@ibm.com 2020-04-22 05:03 EDT-------
A fix for this has recently been pulled into David Miller's net tree as part of 
a series of Mellanox fixes:
https://lore.kernel.org/netdev/20200420213606.44292-1-sae...@mellanox.com/

It has not landed in Linus' tree yet, though.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1874058

Title:
  [UBUNTU 20.04] mlx5: alloc_pages_nodemask stack trace

Status in linux package in Ubuntu:
  New

Bug description:
  Description will follow
