I was going back to Alexei's comment here: http://permalink.gmane.org/gmane.linux.network/387806 In rds-stress profiling, we are indeed seeing the pksb_pull (from rds_tcp_data_recv) show up in the perf profile.
At least for rds-tcp, we cannot re-use the skb even if it is not shared, because what we need to do is to carve out the interesting bits (start at offset, and trim it to to_copy) and queue up those interesting bits on the PF_RDS socket, while allowing tcp_data_read to go back and read the next tcp segment (which may be part of the next rds dgram). But, when pskb_expand_head is invoked in the call-stack pskb_pull(.., offset) -> ... -> __pskb_pull_tail(.., delta) it will memcpy the offset bytes to the start of data. At least for the rds_tcp_data_recv, we are not interested in being able to do a *skb_push after the *skb_pull, so we dont really care about the intactness of these bytes in offset. Thus what I am finding is that when delta > 0, if we skip the following in pskb_expand_head (for the rds-tcp recv case only!) /* Copy only real data... and, alas, header. This should be * optimized for the cases when header is void. */ memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head); and also (only for this case!) this one in __pskb_pull_tail if (skb_copy_bits(skb, skb_headlen(skb), skb_tail_pointer(skb), delta)) BUG(); I am able to get a 40% improvement in the measured IOPS (req/response transactions per second, using 8K byte requests, 256 byte responses, 16 concurrent threads), so this optimization seems worth doing. Does my analysis above make sense? If yes, the question is, how to achieve this bypass in a neat way. Clearly there are many callers of pskb_expand_head who will expect to find the skb_header_len bytes at the start of data, but we also dont want to duplicate code in these functions. One thought I had is to pass a flag arouund saying "caller doesnt care about retaining offset bytes", and use this flag - in __pskb_pull_tail, to avoid skb_copy_bits() above, and to pass delta to pskb_expand_head, - in pskb_expand_head, only do the memcpy listed above if delta <= 0 Any other ideas for how to achieve this? --Sowmini