> > Where is the TX confirm which uses this stored pointer. I don't see it
> > in this file.
> >
>
> The Tx confirm - dpaa2_switch_tx_conf() - is added in patch 5/9.

Not so obvious. Could it be moved here?

> > It can be expensive to store pointer like this in buffers used for
> > DMA.
>
> Yes, it is. But the hardware does not give us any other indication that
> a packet was actually sent so that we can move ahead with consuming the
> initial skb.
>
> > It has to be flushed out of the cache here as part of the
> > send. Then the TX complete needs to invalidate and then read it back
> > into the cache. Or you use coherent memory which is just slow.
> >
> > It can be cheaper to keep a parallel ring in cacheable memory which
> > never gets flushed.
>
> I'm afraid I don't really understand your suggestion. In this parallel
> ring I would keep the skb pointers of all frames which are in-flight?
> Then, when a packet is received on the Tx confirmation queue I would
> have to loop over the parallel ring and determine somehow which skb was
> this packet initially associated to. Isn't this even more expensive?

I don't know this particular hardware, so I will talk in general
terms. Generally, you have a transmit ring. You add new frames to be
sent to the beginning of the ring, and you take off completed frames
from the end of the ring. This is kept in 'expensive' memory, in that
either it is coherent, or you need to do flushes/invalidates.

It is expected that the hardware keeps to ring order. It does not pick
and choose which frames it sends, it does them in order. That means
completion also happens in ring order. So the driver can keep a simple
linear array the size of the ring, in cacheable memory, with pointers
to the skbuf. And it just needs a counting index to know which one
just completed.

Now, your hardware is more complex. You have one queue feeding
multiple switch ports. Maybe it does not keep to ring order? If you
have one port running at 10M/Half, and another at 10G/Full, does it
leave frames for the 10M/Half port in the ring when its egress queue
is full?
That is probably a bad idea, since the 10G/Full port could then
starve for lack of free slots in the ring? So my guess would be, the
frames get dropped. And so ring order is maintained.

If you are paranoid it could get out of sync, keep an array of tuples:
the address of the frame descriptor and the skbuf. If the fd address
does not match what you expect, then do the linear search of the fd
address, and increment a counter that something odd has happened.

    Andrew
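
For illustration only, the parallel array with the "paranoid" tuple
check described above could be sketched roughly as follows. This is
plain userspace C, not dpaa2 driver code: the names shadow_ring,
shadow_push(), shadow_complete() and RING_SIZE are hypothetical, a
void pointer stands in for struct sk_buff *, and the ring size is an
arbitrary assumption. It only models the bookkeeping, not the DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical size; must be a power of two */

/* One in-flight frame: the fd address given to hardware, and its skb. */
struct shadow_entry {
	uint64_t fd_addr;
	void *skb;	/* stands in for struct sk_buff *; NULL = consumed */
};

/* Lives in ordinary cacheable memory; never flushed or invalidated. */
struct shadow_ring {
	struct shadow_entry slot[RING_SIZE];
	unsigned int head;		/* next slot to fill on transmit */
	unsigned int tail;		/* oldest frame not yet confirmed */
	unsigned long out_of_order;	/* confirmations that broke ring order */
};

/* Record a frame at transmit time. Returns -1 if the ring is full. */
static int shadow_push(struct shadow_ring *r, uint64_t fd_addr, void *skb)
{
	assert(skb != NULL);	/* NULL is reserved as the "consumed" mark */
	if (r->head - r->tail == RING_SIZE)
		return -1;
	r->slot[r->head % RING_SIZE] = (struct shadow_entry){ fd_addr, skb };
	r->head++;
	return 0;
}

/* Find the skb for a confirmed frame; NULL if the address is unknown. */
static void *shadow_complete(struct shadow_ring *r, uint64_t fd_addr)
{
	struct shadow_entry *e;
	unsigned int i;
	void *skb;

	if (r->head == r->tail)
		return NULL;

	/* Fast path: hardware kept ring order, so the tail entry matches. */
	e = &r->slot[r->tail % RING_SIZE];
	if (e->fd_addr == fd_addr) {
		skb = e->skb;
		r->tail++;
		/* Skip entries already consumed by the slow path. */
		while (r->tail != r->head &&
		       r->slot[r->tail % RING_SIZE].skb == NULL)
			r->tail++;
		return skb;
	}

	/* Slow path: something odd happened. Count it, then search. */
	r->out_of_order++;
	for (i = r->tail + 1; i != r->head; i++) {
		e = &r->slot[i % RING_SIZE];
		if (e->skb != NULL && e->fd_addr == fd_addr) {
			skb = e->skb;
			e->skb = NULL;	/* mark consumed, tail skips it later */
			return skb;
		}
	}
	return NULL;
}
```

In the expected case every confirmation hits the fast path, so the cost
is one compare and an index increment. The sketch does not handle a
frame that is dropped and never confirmed; a real driver would need
some way to reclaim such a slot.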