> Thinking about this more - why does this patch help some benchmarks?
> The amount of work it takes for the hardware to generate a completion
> is likely negligible, and we are still scanning the same number
> of TX WRs in a loop to unmap/free them.

This makes sense, but I think you should also consider that tx_lock is
taken once per TX completion. With the patch, the lock is taken fewer
times, so the driver spends less time under it.

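
To put rough numbers on the lock argument, here is a toy model in Python. It is purely illustrative and assumes nothing about the real driver. TxRing, reap_per_completion, reap_batched and lock_cost are hypothetical names. A threading.Lock stands in for tx_lock, and freeing a WR is reduced to a counter. Both paths free the same number of TX WRs. Only the number of lock round trips changes: one per completion versus one per batch of completions handled in a single event.

```python
import threading
from typing import Optional, Tuple


class TxRing:
    """Toy model of a TX ring whose completions are reaped under a lock.

    Hypothetical: names and structure are illustrative, not IPoIB code.
    """

    def __init__(self) -> None:
        self.tx_lock = threading.Lock()  # stand-in for the driver's tx_lock
        self.lock_acquisitions = 0
        self.freed = 0

    def _free_wr(self) -> None:
        # Stand-in for unmapping and freeing the buffer of one TX WR.
        self.freed += 1

    def reap_per_completion(self, n: int) -> None:
        # One event per completion: the lock is taken once per TX WR.
        for _ in range(n):
            with self.tx_lock:
                self.lock_acquisitions += 1
                self._free_wr()

    def reap_batched(self, n: int, batch: int) -> None:
        # One event per `batch` completions: the lock is taken once per event,
        # but every TX WR is still scanned and freed.
        remaining = n
        while remaining > 0:
            chunk = min(batch, remaining)
            with self.tx_lock:
                self.lock_acquisitions += 1
                for _ in range(chunk):
                    self._free_wr()
            remaining -= chunk


def lock_cost(n: int, batch: Optional[int]) -> Tuple[int, int]:
    """Return (lock acquisitions, WRs freed) for n completions."""
    ring = TxRing()
    if batch is None:
        ring.reap_per_completion(n)
    else:
        ring.reap_batched(n, batch)
    return ring.lock_acquisitions, ring.freed


if __name__ == "__main__":
    print("per completion:", lock_cost(64, None))  # per completion: (64, 64)
    print("batches of 16:", lock_cost(64, 16))     # batches of 16: (4, 64)
```

With 64 completions, the per-completion path takes the lock 64 times, while batches of 16 take it 4 times. The WR-scanning work is identical in both cases. The model says nothing about contention or actual hold time, which is what a real benchmark would have to measure.
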
> If you think about it this way, it becomes clear that your workload,
> for some reason, hits a path where you get an event very fast
> after the first completion and there is only a small number of completions
> to handle. So your patch helps just by delaying the event handler until
> there's more work to do. And I expect it wouldn't help TCP much, if at all,
> as there is an RX WR for every couple of TX WRs.
>

This is a good point to check. I hope I can get to it and spend some time
on it next week.

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
