On 16-03-30 11:30 AM, Saeed Mahameed wrote: > On Wed, Mar 30, 2016 at 8:04 PM, John Fastabend > <john.fastab...@gmail.com> wrote: >> >> OK, so let me see if I get this right now. This was the precedence >> before the patch in the normal no select queue case, >> >> (1) socket mapping sk_tx_queue_mapping iff !ooo_okay >> (2) xps >> (3) skb->queue_mapping >> (4) qoffset/qcount (hash over tc queues) >> (5) hash over num_tx_queues >> >> With this patch the precedence is a bit changed because >> skb_tx_hash is always called. >> >> (1) socket mapping sk_tx_queue_mapping iff !ooo_okay >> (2) skb->queue_mapping >> (3) qoffset/qcount >> (hash over tc queues if xps choice is > qcount) >> (4) xps >> (5) hash over num_tx_queues >> >> Sound right? Nice thing about this with correct configuration >> of tc with qcount = xps_queues it sort of works as at least > > Yes ! > for qcount = xps_queues which almost all drivers default > configurations goes this way, it works like charm, xps selects the > exact TC TX queue at the correct offset without any need for further > SKB hashing. > and even if by mistake XPS was also configured on TC TX queue then > this patch will detect that the xps hash is out of this TC > offset/qcount range and will re-hash. But i don't see why would user > or driver do such strange configuration. > >> I expect it to. I think the question is are people OK with >> letting skb->queue_mapping take precedence. I am at least >> because it makes the skb edit queue_mapping action from tc >> easier to use. >> > > skb->queue_mapping toke precedence also before this patch, the only > thing this patch came to change is how to compute the txq when > skb->queue_mapping is not present, so we don't need to worry about > this. >
I don't believe that is correct in the general case. Perhaps in the ndo_select_queue path though. See this line, if (queue_index < 0 || skb->ooo_okay || queue_index >= dev->real_num_tx_queues) { int new_index = get_xps_queue(dev, skb); if (new_index < 0) new_index = skb_tx_hash(dev, skb); The skb_tx_hash() routine is never called if xps is enabled. And so we never get into the call to do this, if (skb_rx_queue_recorded(skb)) { hash = skb_get_rx_queue(skb); while (unlikely(hash >= num_tx_queues)) hash -= num_tx_queues; return hash; } Right? FWIW I think that using queue_mapping before xps is better because we can use tc to pick the queue_mapping them programmatically if we want for these special cases instead if wanted. >> And just a comment on the code why not just move get_xps_queue >> into skb_tx_hash at this point if its always being called as the >> "hint". Then we avoid calling it in the case queue_mapping is >> set. >> > > Very good point, the only place that calls skb_tx_hash(dev, skb) other > than __netdev_pick_tx is mlx4 driver and they did it there just > because they wanted to bypass XPS configuration if TC QoS is > configured, with this fix we don't have to bypass XPS at all for when > TC is configured. > > I will change it. > Great thanks.