On 04.01.2016 00:22, Tom Herbert wrote:
Implement assembly routine for csum_partial for 64 bit x86. This primarily speeds up checksum calculation for smaller lengths such as those that are present when doing skb_postpull_rcsum when getting CHECKSUM_COMPLETE from device or after CHECKSUM_UNNECESSARY conversion.

This implementation is similar to csum_partial implemented in checksum_32.S, however since we are dealing with 8 bytes at a time there are more cases for alignment and small lengths-- for those we employ jump tables.

Testing:

Verified correctness by testing arbitrary length buffer filled with random data. For each buffer I compared the computed checksum using the original algorithm for each possible alignment (0-7 bytes).

Checksum performance:

Isolating old and new implementation for some common cases:

                       Old      New
Case                   nsecs    nsecs    Improvement
---------------------+--------+--------+-----------------------------
1400 bytes (0 align)   194.4    176.7    9%  (Big packet)
40 bytes (0 align)     10.5     5.7      45% (Ipv6 hdr common case)
8 bytes (4 align)      8.6      7.4      15% (UDP, VXLAN in IPv4)
14 bytes (0 align)     10.4     6.5      37% (Eth hdr)
14 bytes (4 align)     10.8     7.8      27% (Eth hdr in IPv4)

Signed-off-by: Tom Herbert <[email protected]>
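For anyone following along without the patch at hand, the idea of summing 8 bytes at a time and the alignment test described above can be sketched roughly in plain C as below. This is not the assembly from the patch: the function names (csum_fold64, csum_words64, csum_ref16, csum_selftest) are made up for illustration, the loads go through memcpy so the sketch sidesteps the alignment and small-length cases that the patch handles with jump tables, and the offset loop only mirrors the "each possible alignment (0-7 bytes)" test methodology.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 64-bit add with end-around carry (ones'-complement addition). */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
    a += b;
    return a + (a < b);          /* wrap the carry-out back into bit 0 */
}

/* Fold a 64-bit ones'-complement sum down to 16 bits. */
uint16_t csum_fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical sketch: checksum using 8-byte native-endian words. */
uint16_t csum_words64(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0, w;

    while (len >= 8) {
        memcpy(&w, buf, 8);      /* alignment-safe load */
        sum = add64_carry(sum, w);
        buf += 8;
        len -= 8;
    }
    if (len) {                   /* 1..7 trailing bytes, zero padded */
        w = 0;
        memcpy(&w, buf, len);
        sum = add64_carry(sum, w);
    }
    return csum_fold64(sum);
}

/* Simple reference: sum 16-bit native-endian words one at a time. */
uint16_t csum_ref16(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    uint16_t w;

    while (len >= 2) {
        memcpy(&w, buf, 2);
        sum += w;
        buf += 2;
        len -= 2;
    }
    if (len) {                   /* odd trailing byte, zero padded */
        w = 0;
        memcpy(&w, buf, 1);
        sum += w;
    }
    return csum_fold64(sum);
}

/* Compare both routines for every start offset 0-7 and many lengths. */
int csum_selftest(void)
{
    static unsigned char pool[1500 + 8];
    uint32_t seed = 12345;       /* arbitrary LCG seed for test data */
    size_t i, off, len;

    for (i = 0; i < sizeof(pool); i++) {
        seed = seed * 1664525u + 1013904223u;
        pool[i] = (unsigned char)(seed >> 24);
    }
    for (off = 0; off < 8; off++)
        for (len = 0; len <= 1500; len++)
            if (csum_words64(pool + off, len) != csum_ref16(pool + off, len))
                return -1;       /* mismatch */
    return 0;
}
```

Both routines produce the sum in host byte order, so they can be compared against each other directly, but neither returns the 32-bit partial sum that csum_partial itself hands back.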
I verified the implementation through tests and can also see a speed-up in almost all cases. Unfortunately, using the _addcarry_u64 intrinsic or __int128 to let the compiler emit adc instructions generated even worse code than the current implementation.
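To illustrate what is meant by the __int128 approach, a C variant along these lines might look like the sketch below. This is only a guess at the general shape, not the code that was actually tried: the names csum64_int128 and fold64_to16 are hypothetical, and it assumes GCC or Clang on a 64-bit target, where unsigned __int128 is available as a compiler extension. The carries out of the low 64 bits pile up in the high half and are folded back in at the end, and whether the compiler turns the loop into an adc chain is entirely up to the code generator.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: accumulate 8-byte words in a 128-bit integer.
 * Assumes a compiler providing unsigned __int128 (GCC/Clang, 64-bit). */
uint64_t csum64_int128(const unsigned char *buf, size_t len)
{
    unsigned __int128 acc = 0;
    uint64_t w, lo, hi;

    while (len >= 8) {
        memcpy(&w, buf, 8);      /* alignment-safe load */
        acc += w;                /* carries collect in the high 64 bits */
        buf += 8;
        len -= 8;
    }
    if (len) {                   /* 1..7 trailing bytes, zero padded */
        w = 0;
        memcpy(&w, buf, len);
        acc += w;
    }

    /* End-around carry: add the collected carries back into the low half. */
    lo = (uint64_t)acc;
    hi = (uint64_t)(acc >> 64);
    lo += hi;
    lo += (lo < hi);
    return lo;
}

/* Fold the 64-bit ones'-complement sum down to 16 bits. */
uint16_t fold64_to16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

The _addcarry_u64 intrinsic from <immintrin.h> would express the same carry chain explicitly, one add per 64-bit word, but it is x86-specific and is left out of the sketch.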
Acked-by: Hannes Frederic Sowa <[email protected]>

Thanks Tom!
