Hi,

> -----Original Message-----
> From: Slava Ovsiienko <[email protected]>
> Sent: Tuesday, May 30, 2023 6:13 PM
> To: [email protected]
> Cc: Ori Kam <[email protected]>; Raslan Darawsheh <[email protected]>;
> Matan Azrad <[email protected]>; [email protected]
> Subject: [PATCH 1/1] net/mlx5: fix device removal event handling
>
> On the device removal kernel notifies user space application with queueing the
> IBV_DEVICE_FATAL_EVENT and triggering appropriate file descriptor. Mellanox
> kernel driver stack emits this event twice from different layers (mlx5 and
> uverbs). The IB port index is not applicable in the event structure and should
> be ignored for IBV_DEVICE_FATAL_EVENT events.
>
> Also, on the older kernels (at least from OFED 4.9) there might be race
> conditions causing the event queue close before application fetches the
> IBV_DEVICE_FATAL_EVENT message with ibv_get_async_event() API.
>
> To provide the reliable device removal event detection the patch:
>
> - ignores the IB port index for the IBV_DEVICE_FATAL_EVENT
> - introduces the flag to notify PMD about removal only once
> - acks event with ibv_ack_async_event after actual handling
> - checks for EIO error, making sure queue is not closed yet
>
> Fixes: 40d9f906f4e2 ("net/mlx5: fix device removal handler for multiport")
> Cc: [email protected]
>
> Signed-off-by: Viacheslav Ovsiienko <[email protected]>
> ---

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

