Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

Pre-existing issues:
- [Critical] Off-by-one error in mlx5r_create_mkeys error handling path causes 
hardware mkey leak and invalid destroy.
- [High] Hardware mkeys are leaked in `ib_frmr_pools_set_pinned` if pushing a 
handle to the pool's queue fails.
--

commit 08f720e58b52ed548a72c452a9bd15258e2e9264
Author: Zhiping Zhang <[email protected]>

RDMA/mlx5: get tph for p2p access when registering dma-buf mr

This commit queries dma-buf TPH metadata when registering a dma-buf MR for
peer-to-peer access, falling back to no-TPH if absent. It also transfers
the ST-table reference ownership to the hardware mkey handle rather than
the transient MR object, and updates the pool API to handle this transfer.

> diff --git a/drivers/infiniband/core/frmr_pools.c 
> b/drivers/infiniband/core/frmr_pools.c
> --- a/drivers/infiniband/core/frmr_pools.c
> +++ b/drivers/infiniband/core/frmr_pools.c
[ ... ]
> @@ -430,16 +430,16 @@ int ib_frmr_pools_set_pinned(struct ib_device *device, 
> struct ib_frmr_key *key,
>       ret = pools->pool_ops->create_frmrs(device, key, handles,
>                                           needed_handles);
>       if (ret) {
>               kfree(handles);
>               return ret;
>       }
>  
>       spin_lock(&pool->lock);
>       for (i = 0; i < needed_handles; i++) {
>               ret = push_handle_to_queue_locked(&pool->queue,
>                                                 handles[i]);
>               if (ret)
>                       goto end;
>       }
>  
>  end:
>       spin_unlock(&pool->lock);
>       kfree(handles);

[Severity: High]
This is a pre-existing issue, but can this error path leak the successfully
created hardware mkeys?

In ib_frmr_pools_set_pinned(), up to needed_handles hardware mkeys are
successfully created via create_frmrs(). If push_handle_to_queue_locked()
fails, the code jumps to the end: label and frees the handles array.

However, the mkeys that were successfully created but not yet pushed don't
appear to be explicitly destroyed here. Could this lead to a bulk resource
leak if queue page allocation fails?

> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
[ ... ]
> @@ -335,16 +335,16 @@ static int mlx5r_create_mkeys(struct ib_device *device, 
> struct ib_frmr_key *key,
>       for (i = 0; i < count; i++) {
>               assign_mkey_variant(dev, handles + i, in);
>               err = mlx5_core_create_mkey(dev->mdev, handles + i, in, inlen);
>               if (err)
>                       goto free_in;
>       }
>  free_in:
>       kfree(in);
>       if (err)
>               for (; i > 0; i--)
>                       mlx5_core_destroy_mkey(dev->mdev, handles[i]);

[Severity: Critical]
This is a pre-existing issue, but does this rollback loop properly clean up
the successfully created mkeys?

When mlx5_core_create_mkey() fails at index i, the rollback loop 
for (; i > 0; i--) attempts to destroy handles[i]. Because the creation
failed, handles[i] still contains the unallocated index preset by
assign_mkey_variant().

Executing mlx5_core_destroy_mkey() on this value could send a destroy
command for an unowned mkey, potentially causing hardware state corruption.

Additionally, since the loop stops at i > 0, does it permanently leak the
successfully created mkey at handles[0]?

>       return err;
>  }

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=5

Reply via email to