Well thanks for THAT pointer! Using --checksum-choice=none results in a speedup of somewhere between 2x and 3x. That validates the checksum theory that everything has been pointing towards. Now to get xxhash into rsync, and I think we are all set.

Thanks,
Bill

On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
On 6/18/19 9:16 AM, Bill Wichser wrote:
Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64.  We've tried a number of recompiles, with both gcc and the Intel compiler.  The only difference between otherwise identical compiles was md4 vs. md5.

/bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete --delete-after --files-from=...

I'm not asking for help.  I'm just asking whether anyone has attempted to change the algorithm to something much faster.

I refer you to this project, https://cyan4973.github.io/xxHash/, where there is a table of speeds.  Regardless of what anyone might speculate, we are pursuing this route of swapping out the algorithm.  Maybe it's all for naught.  Maybe it isn't.  But in a few weeks, hopefully, we'll have determined which.
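For anyone who wants a rough feel for checksum cost on their own CPU before patching anything, here is a minimal sketch that times a few digests from Python's standard hashlib. It is only an illustration: the function name throughput_mib_s is made up for this example, xxHash itself is not in hashlib and so is not measured, and Python's numbers will not match what rsync's own C code achieves.

```python
import hashlib
import time


def throughput_mib_s(algorithm, size_mib=64, chunk_size=1 << 20):
    """Hash size_mib MiB of zero bytes and return the rate in MiB/s.

    Hypothetical helper for illustration; not part of rsync or xxHash.
    """
    chunk = bytes(chunk_size)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(size_mib * (1 << 20) // chunk_size):
        h.update(chunk)
    h.digest()
    elapsed = time.perf_counter() - start
    return size_mib / elapsed


if __name__ == "__main__":
    for name in ("md4", "md5", "sha1", "blake2b"):
        try:
            rate = throughput_mib_s(name)
        except ValueError:
            # Some OpenSSL builds no longer expose legacy digests such as md4.
            print(f"{name:8s} unavailable in this build")
            continue
        print(f"{name:8s} {rate:8.0f} MiB/s")
```

The absolute figures matter less than the ratio between algorithms on the machine that actually runs the transfer.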

Very interesting.  From the rsync man page:

"Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer 'Does this file need to be updated?' check."

So it sounds like you have sufficient churn in large files that the post-transfer checksum validation is your bottleneck.  Short of hacking rsync to use a faster algorithm, your remaining choice is to use --checksum-choice=STR with STR set to none, and then perform your own hashing out-of-band to check the transferred data, using the list you have provided via --files-from.  This will nerf rsync's ability to do delta-transfer, which may be OK depending on the nature of your churning files.  If your pipes are huge (atypical for DR), your CPU is weak, and your churning data is mostly completely new or completely changed files, --checksum-choice=none may work very well for you.
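The out-of-band check could look roughly like the sketch below. It assumes the --files-from list holds one relative path per line, that you run build_manifest on each side against that side's root and then compare the two results, and that SHA-256 from hashlib is an acceptable stand-in for whatever hash you prefer. The names build_manifest and changed_paths are made up for this example and are not part of rsync.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files are never read whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(list_file, root, algorithm="sha256"):
    """Map each listed relative path under root to its digest.

    Entries that are not regular files (directories, missing paths)
    are left out of the manifest.
    """
    manifest = {}
    with open(list_file) as f:
        for line in f:
            rel = line.rstrip("\n")
            if not rel:
                continue
            path = Path(root) / rel.lstrip("/")
            if path.is_file():
                manifest[rel] = file_digest(path, algorithm)
    return manifest


def changed_paths(src_manifest, dst_manifest):
    """Return sorted paths that are missing or differ on the destination."""
    return sorted(
        rel
        for rel, digest in src_manifest.items()
        if dst_manifest.get(rel) != digest
    )
```

In practice each manifest would be built on its own host and one of them shipped across for the comparison, so the hashing runs in parallel with, or after, the transfer rather than inside rsync.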

Best,

ellis

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
