Just wanted to circle back on my original question. I changed the rsync code to add xxhash, and we see about a 3x speedup. That is good enough, since it is very close to what we see when not using any checksums.

Bill

On 6/17/19 9:43 AM, Bill Wichser wrote:
We have moved from TSM tape to an rsync disk backup system in order to have DR for our 10 PB GPFS filesystem. We looked at a lot of options, but here we are.

md5 checksums take a lot of compute time with huge files, and even with millions of smaller ones. The bulk of rsync's run time is spent computing the source and destination checksums, and we'd like to avoid paying the cost of a cryptographic algorithm.

Googling around, I found no mention of using a technique like this to improve rsync performance.  I did find reference to a few hashing algorithms though which could certainly work here (xxhash, murmurhash, sbox, cityhash64).
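For a rough feel of the gap between md5 and a non-cryptographic checksum, something like the Python sketch below could be used. It is not the rsync change discussed in this thread. zlib.crc32 stands in for xxhash only because it ships with Python, and the helper names (throughput_mb_s, md5_digest, crc32_digest) and the buffer size are made up for illustration.

```python
import hashlib
import time
import zlib


def throughput_mb_s(checksum, data, repeats=5):
    """Return the best observed throughput of checksum(data) in MiB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        checksum(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


def md5_digest(data):
    # Cryptographic hash, 16-byte digest.
    return hashlib.md5(data).digest()


def crc32_digest(data):
    # Assumption: stand-in for a fast non-cryptographic hash such as xxhash.
    return zlib.crc32(data)


if __name__ == "__main__":
    buf = bytes(64 * 1024 * 1024)  # 64 MiB of zeros; real files will differ
    print("md5  :", round(throughput_mb_s(md5_digest, buf)), "MiB/s")
    print("crc32:", round(throughput_mb_s(crc32_digest, buf)), "MiB/s")
```

The numbers depend heavily on the CPU and on how the libraries were built, so treat them as a relative comparison on one machine rather than a prediction for rsync itself.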

Rsync has certainly been around for a few years!  We are going to pursue changing the current checksum algorithm and using something much faster.  If anyone has done this already and would like to share their experiences that would be wonderful. Ideally this could be some optional plugin for rsync where users could choose which checksummer to use.
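The "choose which checksummer to use" idea might look roughly like the Python sketch below. This is only an illustration of a pluggable lookup, not rsync's code or any existing rsync option. The CHECKSUMMERS table and block_checksum function are hypothetical names, and the xxh64 entry is registered only if the third-party xxhash Python package happens to be installed.

```python
import hashlib
import zlib

# Hypothetical registry: algorithm name -> function(bytes) -> hex string.
CHECKSUMMERS = {
    "md5": lambda data: hashlib.md5(data).hexdigest(),
    "crc32": lambda data: format(zlib.crc32(data), "08x"),
    "adler32": lambda data: format(zlib.adler32(data), "08x"),
}

try:
    import xxhash  # third-party package, optional (assumption)

    CHECKSUMMERS["xxh64"] = lambda data: xxhash.xxh64(data).hexdigest()
except ImportError:
    pass  # xxhash not installed; the stdlib entries still work


def block_checksum(data, algorithm="md5"):
    """Checksum one block of data with the algorithm chosen by name."""
    try:
        checksummer = CHECKSUMMERS[algorithm]
    except KeyError:
        raise ValueError(f"unknown checksum algorithm: {algorithm!r}") from None
    return checksummer(data)


if __name__ == "__main__":
    block = b"some file block"
    for name in sorted(CHECKSUMMERS):
        print(name, block_checksum(block, name))
```

Both ends of a transfer would have to agree on the algorithm name before comparing block checksums, which is why the choice is a single string here.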

Bill
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf