On 6/17/19 11:12 AM, b...@princeton.edu wrote:
> It's not a GPFS issue per se. The changelog isn't quite there right now but will be. Today the question is only about rsync performance.

Hi Bill,

md5 is a reasonably efficient checksum. Things only start to get a little more expensive when you move to stronger hashes such as SHA-256. I would be really surprised if you were CPU-bound rather than I/O-bound here.
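One way to check this directly is to measure how fast a single core can hash data that is already in memory, then compare that with the read bandwidth your storage actually delivers. Below is a rough microbenchmark sketch. The function name and buffer sizes are my own arbitrary choices, and the numbers will vary by CPU and Python/OpenSSL build. If the hash rate comes out well above your storage read rate, the checksum pass is I/O-bound rather than CPU-bound.

```python
import hashlib
import os
import time


def hash_throughput_mib_s(algorithm: str, total_mib: int = 256, chunk_mib: int = 4) -> float:
    """Hash total_mib MiB of in-memory data on one core and return MiB/s.

    The buffer lives in RAM, so this measures CPU cost only, with no disk I/O.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    rounds = max(1, total_mib // chunk_mib)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(chunk)  # same buffer each round; only hashing is timed
    elapsed = time.perf_counter() - start
    return rounds * chunk_mib / elapsed


if __name__ == "__main__":
    for algo in ("md5", "sha1", "sha256"):
        print(f"{algo:>7}: {hash_throughput_mib_s(algo):8.1f} MiB/s")
```

Run it on a node like the one doing the rsync and compare the result against what the source file system can actually stream to that node. If the storage number is the smaller one, switching hashes won't help.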

By default rsync does not use checksums to decide what to transfer. Its "quick check" compares only each file's size and modification time, so it does not need to read each file in its entirety to see whether it should be updated. Do you have a strong reason for using the --checksum option? Users typically have to try pretty hard to do things that circumvent that default heuristic.
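To make the difference concrete, here is a minimal sketch of the two decisions. quick_check_differs() approximates the size-plus-mtime test. Comparing mtimes at whole-second resolution is a simplification on my part, not a claim about every rsync version. checksum_differs() approximates --checksum, which has to read every byte of both copies whenever the sizes match. Both function names are hypothetical.

```python
import hashlib
import os


def quick_check_differs(src: str, dst: str) -> bool:
    """Approximate rsync's default quick check: size and mtime only.

    No file contents are read; only metadata from stat().
    """
    if not os.path.exists(dst):
        return True  # nothing on the destination yet, so it must be sent
    s, d = os.stat(src), os.stat(dst)
    # Whole-second mtime comparison is a simplification for this sketch.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_md5(path: str, bufsize: int = 1 << 20) -> str:
    """Stream a file through MD5; this reads every byte of the file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def checksum_differs(src: str, dst: str) -> bool:
    """Approximate --checksum: for same-size files, compare full-content digests."""
    if not os.path.exists(dst):
        return True
    if os.stat(src).st_size != os.stat(dst).st_size:
        return True  # different sizes differ without reading anything
    return file_md5(src) != file_md5(dst)
```

The only case the quick check misses is a file whose contents change while its size and mtime stay identical (e.g. rewritten in place with the timestamp restored). That is the kind of thing users have to work at to produce.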

If you need guaranteed DR even for files that slip past rsync's default heuristic, you're best served by leveraging some part of the underlying filesystem's feature set. The filesystem is the only layer that can trivially compute and track what changed. In my day job at Panasas we designed pan_snap_delta explicitly for this: to efficiently emit a succinct list of files and directories that have changed in any way between snapshots. Our customers have paired it with rsync's --files-from option to great effect. Two summers back I added another utility to that mix, pan_snap_replicator, which can figure out exactly how a file or folder has changed. That turns out to be crucial for situations like "we moved our 1PB directory of stuff from /a to /b." rsync (with --delete) will typically cope with that by deleting the entire directory on the remote side and copying it all over the wire again, which is clearly undesirable.
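For illustration only, here is a toy sketch of the general snapshot-delta idea. It is not how pan_snap_delta or pan_snap_replicator work internally. It assumes two read-only snapshot trees of the same filesystem that preserve inode numbers. Each regular file is keyed by inode and classified as added, deleted, modified, or moved. The added and modified paths are then written one per line for rsync's --files-from option. All names here are hypothetical, and inode reuse and hard links are not handled.

```python
from __future__ import annotations

import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str      # path relative to the snapshot root
    size: int
    mtime_ns: int


def build_manifest(root: str) -> dict[int, Entry]:
    """Map inode number -> Entry for every regular file under root.

    Assumes inode numbers are stable across snapshots of one filesystem.
    Hard links collapse to a single entry (the last path seen wins).
    """
    manifest: dict[int, Entry] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, etc. in this sketch
            manifest[st.st_ino] = Entry(
                os.path.relpath(full, root), st.st_size, st.st_mtime_ns
            )
    return manifest


def diff_manifests(old: dict[int, Entry], new: dict[int, Entry]) -> dict[str, list]:
    """Classify changes between two manifests by inode identity.

    Caveat: a freed inode reused by a new file would be misread as a
    move and/or modification; a real tool needs more identity information.
    """
    added, modified, moved = [], [], []
    for ino, entry in new.items():
        prev = old.get(ino)
        if prev is None:
            added.append(entry.path)
            continue
        if prev.path != entry.path:
            moved.append((prev.path, entry.path))  # same file, new name
        if (prev.size, prev.mtime_ns) != (entry.size, entry.mtime_ns):
            modified.append(entry.path)
    deleted = [e.path for ino, e in old.items() if ino not in new]
    return {
        "added": sorted(added),
        "deleted": sorted(deleted),
        "modified": sorted(modified),
        "moved": sorted(moved),
    }


def write_files_from(diff: dict[str, list], out_path: str) -> None:
    """Write the paths that need data transfer, one per line, for --files-from."""
    paths = sorted(set(diff["added"]) | set(diff["modified"]))
    with open(out_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
```

With the list written, something like `rsync -a --files-from=changed.txt /snap/new/ remote:/backup/` transfers only those paths. The snapshot and destination paths are placeholders. The moved pairs would be replayed as renames on the remote side before running rsync, which is what avoids the delete-and-recopy behavior for a directory move.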

Outside of leveraging filesystem-specific features like that, which GPFS may or may not offer, I don't have any better suggestions for you. But I do suspect you're I/O-bound here and md5 itself is not the problem.

Best,

ellis

--
Ellis H. Wilson III, Ph.D.
     www.ellisv3.com
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
