On Tue, 2016-11-29 at 23:28 -0800, Alexei Starovoitov wrote: > On Mon, Nov 28, 2016 at 10:58 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: > > { > > @@ -496,8 +531,13 @@ static bool mlx4_en_process_tx_cq(struct net_device > > *dev, > > wmb(); > > > > /* we want to dirty this cache line once */ > > - ACCESS_ONCE(ring->last_nr_txbb) = last_nr_txbb; > > - ACCESS_ONCE(ring->cons) = ring_cons + txbbs_skipped; > > + WRITE_ONCE(ring->last_nr_txbb, last_nr_txbb); > > + ring_cons += txbbs_skipped; > > + WRITE_ONCE(ring->cons, ring_cons); > > + WRITE_ONCE(ring->ncons, ring_cons + last_nr_txbb); > > + > > + if (dev->doorbell_opt) > > + mlx4_en_xmit_doorbell(dev, ring); > ... > > + /* Doorbell avoidance : We can omit doorbell if we know a TX > > completion > > + * will happen shortly. > > + */ > > + if (send_doorbell && > > + dev->doorbell_opt && > > + (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0) > > + send_doorbell = false; > > this is awesome idea! > I'm missing though why you need all these read_once/write_once. > It's a tx ring that is already protected by tx queue lock > or completion is happening without grabbing the lock?
Yes, TX completion runs lockless, on any good/recent NIC driver. > Then how do we make sure that we don't race here > and indeed above prod_bell - ncons > 0 condition is safe? That is why I have a cmpxchg() > Good description of algorithm would certainly help ;) Sure, when it is ready ;)