If you get it working, I'd be interested in it :) I personally like the way tar does it, where you can supply your own "compression" program, which I've used to insert checksums into the stream.
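A minimal sketch of such a "compression" filter follows. It assumes GNU tar's `--use-compress-program` (`-I`) option, which runs the program with no arguments when creating an archive and with `-d` when extracting. The frame layout (4-byte length, payload, SHA-256 of the payload) and the names `encode`, `decode` and `main` are hypothetical illustrations, not the poster's actual tool.

```python
import hashlib
import struct
import sys

CHUNK = 1 << 20  # 1 MiB of payload per frame (arbitrary choice)
DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def encode(src, dst, chunk=CHUNK):
    """Copy src to dst as frames: 4-byte big-endian length, payload, SHA-256."""
    while True:
        data = src.read(chunk)
        if not data:
            break
        dst.write(struct.pack(">I", len(data)))
        dst.write(data)
        dst.write(hashlib.sha256(data).digest())


def _read_exact(src, n):
    """Read exactly n bytes or raise, so truncation is reported, not ignored."""
    buf = src.read(n)
    if len(buf) != n:
        raise ValueError("truncated stream")
    return buf


def decode(src, dst):
    """Verify each frame's SHA-256 and write only the payload to dst."""
    while True:
        header = src.read(4)
        if not header:
            break  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated frame header")
        (length,) = struct.unpack(">I", header)
        data = _read_exact(src, length)
        if hashlib.sha256(data).digest() != _read_exact(src, DIGEST_LEN):
            raise ValueError("checksum mismatch")
        dst.write(data)


def main(argv, stdin, stdout):
    """Behave like a tar compressor: no arguments encodes, -d decodes."""
    try:
        if "-d" in argv[1:]:
            decode(stdin, stdout)
        else:
            encode(stdin, stdout)
    except ValueError as err:
        print(f"checksum filter: {err}", file=sys.stderr)
        return 1  # meant to become the exit status, so tar sees the failure
    stdout.flush()
    return 0
```

Saved as an executable script with `if __name__ == "__main__": sys.exit(main(sys.argv, sys.stdin.buffer, sys.stdout.buffer))` appended, it could be used as `tar -I ./csumframe.py -cf backup.tar.csf dir/` and `tar -I ./csumframe.py -xf backup.tar.csf` (the script and archive names are hypothetical). Per-frame checksums mean corruption is caught close to where it occurs rather than only at the end of the stream.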
On Tue, Jun 18, 2019 at 11:03 PM Stu Midgley <sdm...@gmail.com> wrote:

> Are you rsyncing over ssh? If so, get HPN-SSH and use the none cipher.
> MUCH faster again :)
>
> On Tue, Jun 18, 2019 at 11:00 PM Bill Wichser <b...@princeton.edu> wrote:
>
>> Well, thanks for THAT pointer! Using --checksum-choice=none results in
>> a speedup of somewhere between 2 and 3 times. That validates the
>> checksum theory that things have been pointing towards. Now to get
>> xxhash into rsync and I think we are all set.
>>
>> Thanks,
>> Bill
>>
>> On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
>> > On 6/18/19 9:16 AM, Bill Wichser wrote:
>> >> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64. We've tried a
>> >> number of recompiles: gcc, Intel. The only difference between
>> >> otherwise identical compiles was md4 vs md5.
>> >>
>> >> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
>> >> --delete-after --files-from=...
>> >>
>> >> I'm not asking for help, just whether anyone had attempted to change
>> >> the algorithm to something much faster.
>> >>
>> >> I refer you to this project, https://cyan4973.github.io/xxHash/, where
>> >> there is a table of speeds. Regardless of what anyone might
>> >> speculate, we are pursuing this route of changing out the algorithm.
>> >> Maybe it's all for naught. Maybe it isn't. But in a few weeks
>> >> hopefully we'll know.
>> >
>> > Very interesting. From the rsync man page:
>> >
>> > "Note that rsync always verifies that each transferred file was
>> > correctly reconstructed on the receiving side by checking a
>> > whole-file checksum that is generated as the file is transferred, but
>> > that automatic after-the-transfer verification has nothing to do with
>> > this option’s before-the-transfer "Does this file need to be updated?"
>> > check."
>> >
>> > So it sounds like you have enough churn in large files that the
>> > post-transfer checksum validation is your bottleneck. Short of hacking
>> > rsync to use a faster algorithm, your remaining choice is to use
>> > --checksum-choice=STR, set it to none, and then perform your own
>> > hashing out-of-band to check the transferred data using the list you
>> > have provided via --files-from. This will nerf rsync's ability to do
>> > delta-transfer, which may be OK depending on the nature of your
>> > churning files. If your pipes are huge (atypical for DR), your CPU is
>> > weak, and your churning data is mostly completely new or completely
>> > changed files, --checksum-choice=none may work very well for you.
>> >
>> > Best,
>> >
>> > ellis
>> >
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
>
> --
> Dr Stuart Midgley
> sdm...@gmail.com
>
--
Dr Stuart Midgley
sdm...@gmail.com
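Ellis's out-of-band suggestion can be sketched as below, assuming Python 3 is available on both hosts and that the list handed to `--files-from` holds one path per non-blank line, treated as relative to the transfer root (a leading "/" is stripped). The helper names are hypothetical. BLAKE2b is only a stand-in default because xxHash is not in Python's standard library; any `hashlib` algorithm name can be passed instead. Directories and symlinks are skipped, since `-d` and `-l` transfer them without file contents to compare.

```python
import hashlib
from pathlib import Path

BLOCK = 1 << 20  # hash in 1 MiB reads so large files never sit in memory


def file_digest(path, algo="blake2b"):
    """Hash one file incrementally with any hashlib algorithm name."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK), b""):
            h.update(block)
    return h.hexdigest()


def read_file_list(list_path):
    """Read a --files-from style list: one path per non-blank line."""
    # surrogateescape keeps non-UTF-8 file names usable as POSIX paths.
    with open(list_path, encoding="utf-8", errors="surrogateescape") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


def digest_map(root, rel_paths, algo="blake2b"):
    """Map each listed regular file under root to its hex digest."""
    out = {}
    for rel in rel_paths:
        rel = rel.lstrip("/")  # treat listed paths as relative to root
        path = Path(root) / rel
        # Directories (-d) and symlinks (-l) carry no file data to verify.
        if path.is_symlink() or not path.is_file():
            continue
        out[rel] = file_digest(path, algo)
    return out


def diff_maps(src_map, dst_map):
    """Return (missing, mismatched) paths, judged against the source map."""
    missing = sorted(p for p in src_map if p not in dst_map)
    mismatched = sorted(
        p for p in src_map if p in dst_map and src_map[p] != dst_map[p]
    )
    return missing, mismatched
```

In practice `digest_map` would run on the source and on the destination (for example, dumping each result as JSON) and `diff_maps` would compare the two. Unlike rsync's inline check, this re-reads every listed file on both sides, so it only pays off when the chosen hash is much cheaper or the check can run off the critical path.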