On 5/29/2018 9:11 PM, Eric Dumazet wrote:
Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
brought a regression caught in our regression suite, thanks to KASAN.
If the KASAN-reported issue was really caused by smaller chunk sizes,
changing the allocation order dynamically will eventually hit the same issue.
Note that mlx4_alloc_icm() is already able to try high-order allocations
and fall back to low-order allocations under high memory pressure.
We only have to tweak gfp_mask a bit to help it fall back faster,
without risking OOM killings.
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G O
Call Trace:
[<ffffffffba80d7bb>] dump_stack+0x4d/0x72
[<ffffffffb951dc5f>] print_address_description+0x6f/0x260
[<ffffffffb951e1c7>] kasan_report+0x257/0x370
[<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
[<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
[<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
[<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
[<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
[<ffffffffb95f125c>] seq_read+0xa9c/0x1230
[<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
[<ffffffffb9577918>] __vfs_read+0xe8/0x730
[<ffffffffb9578057>] vfs_read+0xf7/0x300
[<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
[<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
[<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f851a7bb30d
RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
Allocated by task 4488:
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x101/0x5e0
ib_register_device+0xc03/0x1250 [ib_core]
mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
mlx4_add_device+0xa9/0x340 [mlx4_core]
mlx4_register_interface+0x16e/0x390 [mlx4_core]
xhci_pci_remove+0x7a/0x180 [xhci_pci]
do_one_initcall+0xa0/0x230
do_init_module+0x1b9/0x5a4
load_module+0x63e6/0x94c0
SYSC_init_module+0x1a4/0x1c0
SyS_init_module+0xe/0x10
do_syscall_64+0x186/0x420
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at ffff8817df584f40
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 8 bytes to the right of
32-byte region [ffff8817df584f40, ffff8817df584f60)
The buggy address belongs to the page:
page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000
index:0xffff8817df584fc1
flags: 0x880000000000100(slab)
raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff883ff78d26c0
Memory state around the buggy address:
ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
                                                         ^
ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <eduma...@google.com>
Cc: John Sperbeck <jsperb...@google.com>
Cc: Tarick Bedeir <tar...@google.com>
Cc: Qing Huang <qing.hu...@oracle.com>
Cc: Daniel Jurgens <dani...@mellanox.com>
Cc: Zhu Yanjun <yanjun....@oracle.com>
Cc: Tariq Toukan <tar...@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
#include "fw.h"
/*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
*/
enum {
- MLX4_ICM_ALLOC_SIZE = PAGE_SIZE,
- MLX4_TABLE_CHUNK_SIZE = PAGE_SIZE,
+ MLX4_ICM_ALLOC_SIZE = 1 << 18,
+ MLX4_TABLE_CHUNK_SIZE = 1 << 18,
};
static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
struct mlx4_icm *icm;
struct mlx4_icm_chunk *chunk = NULL;
int cur_order;
+ gfp_t mask;
int ret;
/* We use sg_set_buf for coherent allocs, which assumes low memory */
@@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
while (1 << cur_order > npages)
--cur_order;
+ mask = gfp_mask;
+ if (cur_order)
+ mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
if (coherent)
ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
&chunk->mem[chunk->npages],
- cur_order, gfp_mask);
+ cur_order, mask);
else
ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
- cur_order, gfp_mask,
+ cur_order, mask,
dev->numa_node);
if (ret) {