On 7/8/20 4:59 PM, YU, Xiangning wrote:

> 
> Yes, we are touching a cache line here to make sure aggregation tasklet is 
> scheduled immediately. In most cases it is a call to test_and_set_bit(). 


test_and_set_bit() dirties the cache line even if the bit is already set.
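
A common way to avoid this is to test the bit with a plain read first and
only fall back to the atomic operation when the bit is clear. Below is a
minimal userspace sketch of that pattern using C11 atomics. The sketch_*
names are hypothetical stand-ins, not the kernel bitops, and whether the
pattern fits the ltb code path is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an atomic test-and-set; not the kernel API. */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	/*
	 * Atomic read-modify-write: the CPU needs the cache line in
	 * exclusive state, even when the bit is already set.
	 * Returns the previous value of the bit.
	 */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Hypothetical stand-in for a plain bit test. */
static bool sketch_test_bit(unsigned int nr, atomic_ulong *word)
{
	/* Plain load: the cache line can stay shared between CPUs. */
	return (atomic_load_explicit(word, memory_order_relaxed) >> nr) & 1UL;
}

/*
 * Returns true only if this caller changed the bit from 0 to 1,
 * i.e. it is the one that should schedule the deferred work.
 */
static bool sketch_mark_pending(unsigned int nr, atomic_ulong *word)
{
	/* Fast path: bit already set, do not write to the cache line. */
	if (sketch_test_bit(nr, word))
		return false;

	/* Slow path: the bit looked clear, so pay for the atomic operation. */
	return !sketch_test_and_set_bit(nr, word);
}
```

Note that the relaxed read in the fast path carries no ordering guarantee,
unlike the atomic read-modify-write, so a caller that relies on that
ordering would have to add a barrier explicitly.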

> 
> We might be able to do some inline processing without tasklet here, still we 
> need to make sure the aggregation won't run simultaneously on multiple CPUs. 

I am actually surprised you can reach 8 Mpps with so many cache lines bouncing 
around.

If you replace the ltb qdisc with standard mq+pfifo_fast, what kind of 
throughput do you get?
