On Tue, 2018-10-23 at 12:10 +0200, Toke Høiland-Jørgensen wrote:
> Saeed Mahameed <sae...@mellanox.com> writes:
>
> > On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
> > > Saeed Mahameed <sae...@mellanox.com> writes:
> > >
> > > > I think that the mlx5 driver doesn't know how to tell the other
> > > > device to stop transmitting to it while it is resetting.. Maybe
> > > > tariq or Jesper know more about this ?
> > > > I will look at this tomorrow after noon and will try to
> > > > repro...
> > >
> > > Hi Saeed
> > >
> > > Did you have a chance to poke at this? :)
> >
> > HI Toke, yes i have been planing to respond but also i wanted to
> > dig more,
> >
> > so the root cause is very clear.
> >
> > 1. core 1 is doing tx_dev->ndo_xdp_xmit()
> > 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.
>
> Right, it was also my guess that it was related to this interaction.
> Thanks for looking into it!
>
> > and the problem is beyond mlx5, since we don't have a way to tell a
> > different core/different netdev to stop xmitting, or at least
> > synchronize with it.
>
> Hmm, ideally there should be some way for the higher level XDP API to
> notice this and abort the call before it even reaches the driver on the
> TX side, shouldn't there? At LPC, Jesper and I will be talking about a
> proposal for decoupling the ndo_xdp_xmit() resource allocation from
> loading and unloading XDP programs, which I guess could be a way to deal
> with this as well.
>
> In the meantime...
>
Yes, totally agree; this is why my fix is temporary. Good idea about LPC,
let's discuss this there.

> > I will be waiting for your confirmation that the fix did work.
>
> I tested your patch, and it does indeed fix the crash. However, it also
> seems to have the effect that the XDP redirect continues to function
> even after removing the XDP program on the target device.
>
> I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets being
> redirected out $TX_IF. Is this intentional?
>

Interesting, that shouldn't happen, unless something weird is going on
when running xdp_fwd -d together with xdp_redirect_map. I just checked
the code: if ndo_xdp_set is called with a NULL program, we remove the XDP
TX resources, and I see nothing suspicious in the driver. I will look at
this later this week.

> -Toke