> Date: Fri, 8 May 2020 15:39:26 +0200
> From: Jan Klemkow <j.klem...@wemelug.de>
> 
> On Mon, Apr 27, 2020 at 10:42:52AM +0200, Martin Pieuchot wrote:
> > 
> > So the current logic assumes that as long as the ring contains descriptors
> > em_rxrefill() failures aren't fatal and it is not necessary to schedule a
> > refill.  Why?
> 
> I'm not sure.  I just figured out, what I described below.
> 
> > Unless we understand this it is difficult to reason about a fix.  Have
> > you looked at other drivers, is it a common pattern?
> 
> Drivers like iavf(4) and ixl(4) just set the timeout when the alive
> (if_rxr_inuse returns) counter is 0.  Other drivers like ix(4) and
> bnx(4) call the timeout, when refill was not possible.
> 
> I adjust the patch to the behavior of ix(4).  (see below).
> 
> > > In my case, the function em_rxeof is unable to pull the last one or two
> > > mbufs from the receive ring, because the descriptor status hasn't set
> > > the "descriptor done" flag (E1000_RXD_STAT_DD).  Thus, the ring has one
> > > or two remaining mbufs, and the timeout to put new ones in is never set.
> > > When new mbufs are available later, the em driver stays in that
> > > situation.
> > 
> > Are you saying there's always some descriptor on the ring?  Did you
> > investigate why?
> 
> Yes, with printf-debugging:  I see that we return from em_rxeof()
> because the status bit E1000_RXD_STAT_DD is not set.  And systat(8)
> shows one or two remaining mbufs in the ALIVE column.
> 
> > > The diff always sets the timeout to put new mbufs in the receive ring,
> > > when it has lesser mbufs then the "low water mark".
> > 
> > This changes the logic explained above.  With it the fatal condition
> > changes from 0 to `lwm'.  Another possible fix would be to always
> > schedule a timeout as soon as em_rxrefill() fails.
> 
> I adjusted the patch to this behavior, as I mentioned above.
> 
> > I'm not advocating for a particular change, I'm suggesting we understand
> > the current logic and its implications.
> 
> The logic looks to me like: The NIC should first fill the last mbuf
> (descriptor), then we set a timeout to refill new mbufs (descriptors).
> It seems to me, that one or two mbufs maybe some kind of small to
> operate correctly?!
> 
> But, as far as I understand the documentation of Intel, It should be
> find to deal with such a low amount of mbufs.  I couldn't find any text
> that said, the firmware needs a minimum amount of DMA memory.

Sorry, I was going to reply earlier to this mail thread.

As far as I know, OpenBSD is the only OS that tries to run with
partially filled rings.  When we started doing this we quickly found
out that there was hardware out there that didn't work with only a
single descriptor on the Rx ring.  This is why we introduced the low
water mark.

I'm not certain if we ever found this documented unambiguously the the
hardware documentation available to us.  But there certainly are hints
that indicate that descriptors are read a full cache-line at a time.
And if the descrioptor size is 16 bytes, you need 4 descriptors to
fill a 64-byte cache line.

So what you were proposing in your first diff makes sense.  It maight
make sense to introduce a function to check the limit though, perhaps
something like if_rxr_needfill() that returns true if the number of
descriptors on the ring is below the low water mark.

Reply via email to