Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

Milton Miller Tue, 05 Jun 2007 09:16:51 -0700


First, a question especially to Auke and Jeff:

Since this patch both reverts the broken change that is currently in -rc and creates the fixed driver, I'm not sure I like the subject stating "on ARM", although that is the feature of the rewrite, and was the intent of merging the previous patch. This is actually a fix for all systems relative to current, including those where DMA is not cache coherent (unlike a simple revert).

Should we just put a comment about reverting the previous patch early in the change log?


Something like this:

Fix the e100 receiver handling, supporting cache incoherent DMA.

Discard the concept of setting the S (suspend) bit with the EL bit introduced in commit d52df4a35af569071fda3f4eb08e47cc7023f094. In addition to that change not setting either bit, the hardware doesn't work that way.



Thoughts?

Here is the changelog portion of the latest patch (quoted), with my comments thrown in:

On the ARM, there is a race condition between software allocating a new receive


On systems that have cache incoherent DMA, including ARM,

buffer and hardware writing into a buffer. The two race on touching the last
Receive Frame Descriptor (RFD). It has its el-bit set and its next link equal
to 0. When hardware encounters this buffer it attempts to write data to it
and then update Status Word bits and Actual Count in the RFD. At the same time
software may try to clear the el-bit and set the link address to a new buffer.
Since the entire RFD is one cache line, the two write operations can
collide. This can lead to the receive unit stalling or freed receive buffers
getting written to.

This can lead to the receive unit stalling or interpreting random memory as its receive area.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain. When the hardware encounters this buffer it stops and does
not write to it at all.  The hardware issues an RNR interrupt with the
receive unit in the No Resources state. When software allocates buffers,
it can update the tail of the list when either it knows the hardware
has stopped or the one previous to the new one is marked.

Software can write to the tail of the list because it knows hardware will stop on the previous descriptor that was marked as the end of list.

Once it has a new next to last buffer prepared, it can clear the el-bit
and set the size on the previous one.  The race on this buffer is safe
since the link already points to a valid next buffer.

and we can handle the race setting the size (assuming aligned 16-bit writes are atomic with respect to the DMA read).
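
To make the ordering concrete, here is a minimal user-space sketch of moving the stop marker as described above. It is not the driver code: struct model_rfd, model_move_stop_marker(), MODEL_CMD_EL and MODEL_BUF_SIZE are hypothetical names, the descriptor layout is simplified, and the DMA syncs a real driver needs on cache incoherent systems appear only as comments.

```c
#include <stdint.h>

#define MODEL_CMD_EL   0x8000u /* assumed: end-of-list bit in the command word */
#define MODEL_BUF_SIZE 1518u   /* assumed: usable receive buffer size */

/* Simplified stand-in for a Receive Frame Descriptor. */
struct model_rfd {
    uint16_t status;
    uint16_t command;
    uint32_t link;        /* address of the next descriptor */
    uint16_t actual_size;
    uint16_t size;
};

/*
 * Move the "stop here" marker from old_stop to new_stop.
 * The caller must already have linked valid buffers behind old_stop.
 */
void model_move_stop_marker(struct model_rfd *old_stop,
                            struct model_rfd *new_stop)
{
    if (old_stop == new_stop)
        return;

    /* 1. Arm the new stopping point first: size 0 plus EL means the
     *    hardware stops here and never writes to this descriptor. */
    new_stop->size = 0;
    new_stop->command = (uint16_t)(new_stop->command | MODEL_CMD_EL);
    /* (a real driver would sync this descriptor to the device here) */

    /* 2. Only now release the old stopping point.  Its link already
     *    points at a valid buffer, so hardware may safely run past it.
     *    These are two separate aligned 16-bit stores; hardware may
     *    observe the state between them. */
    old_stop->command = (uint16_t)(old_stop->command & ~MODEL_CMD_EL);
    old_stop->size = MODEL_BUF_SIZE;
    /* (a real driver would sync after each store) */
}
```

The point of the sketch is only the order: the new marker exists before the old one is released, so the list always has a descriptor the hardware will stop on.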

This paragraph changed from third person (the software or hardware) to first person (we).

  We keep flags
in our software descriptor to note if the el bit is set and if the size
was 0.  When we clear the RFD's el bit and set its size, we also clear
the el flag but we leave the "size was 0" flag set.  This way we can find
this buffer again later.

This way software can identify them when the race may have occurred when cleaning the ring. On these descriptors, it looks ahead and if the next one is complete then hardware must have skipped the current one. Logic is added to prevent two packets in a row being marked while the receiver is running, to avoid running in lockstep with the hardware and thereby limiting the required lookahead.
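
The lookahead could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: struct model_rx, model_clean_one() and MODEL_STATUS_COMPLETE are hypothetical names, and the ring is modelled as a plain array of software descriptors that mirror the status word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STATUS_COMPLETE 0x8000u /* assumed: "complete" bit in the status word */

/* Software descriptor: a copy of the RFD status plus the two flags. */
struct model_rx {
    uint16_t status;
    bool el_flag;     /* el-bit currently set on this descriptor */
    bool size_was_0;  /* size was 0 at some point; left set after release */
};

enum model_clean_result {
    MODEL_RX_PACKET,    /* descriptor completed, hand the packet up */
    MODEL_RX_SKIPPED,   /* hardware passed over it, recycle the buffer */
    MODEL_RX_NOT_READY  /* nothing to do yet */
};

/* Decide what to do with ring[i] while cleaning a ring of n entries. */
enum model_clean_result model_clean_one(const struct model_rx *ring,
                                        size_t n, size_t i)
{
    const struct model_rx *cur = &ring[i];

    if (cur->status & MODEL_STATUS_COMPLETE)
        return MODEL_RX_PACKET;

    /* Only descriptors that once had size 0 can have been skipped,
     * so the lookahead is limited to those. */
    if (cur->size_was_0) {
        const struct model_rx *next = &ring[(i + 1) % n];

        if (next->status & MODEL_STATUS_COMPLETE)
            return MODEL_RX_SKIPPED;
    }
    return MODEL_RX_NOT_READY;
}
```

Because two descriptors in a row are never marked while the receiver runs, looking one entry ahead is enough in this model.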

If the hardware sees the el-bit cleared without the size set, it will
move on to the next buffer and skip this one.  If it sees
the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.

These sentences should be moved to the mention of the race above to reduce mixing descriptions of the hardware and the software.
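
For reference, the hardware behaviour described in the changelog can be summarised as a small decision function. This is only a model of the four cases as stated there (the fourth, el-bit clear with a size set, is the ordinary case and is implied rather than spelled out); model_rx_observe() and the enum names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the modelled receive unit does with one descriptor. */
enum model_rx_action {
    MODEL_RX_COMPLETE_CONTINUE, /* fill the buffer, follow the link */
    MODEL_RX_STOP_UNTOUCHED,    /* RNR, descriptor is not written */
    MODEL_RX_SKIP,              /* move on without using this buffer */
    MODEL_RX_COMPLETE_THEN_RNR  /* fill the buffer, then RNR and wait */
};

/* el_set and size are the values the hardware happens to read. */
enum model_rx_action model_rx_observe(bool el_set, uint16_t size)
{
    if (el_set && size == 0)
        return MODEL_RX_STOP_UNTOUCHED;    /* the intended stop marker */
    if (!el_set && size == 0)
        return MODEL_RX_SKIP;              /* race: el cleared, size not yet set */
    if (el_set)
        return MODEL_RX_COMPLETE_THEN_RNR; /* race: size set, el still set */
    return MODEL_RX_COMPLETE_CONTINUE;     /* ordinary descriptor */
}
```

Both intermediate states leave the receive unit on a valid descriptor, which is why the race on the released marker is tolerable.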



milton
