On 6/17/19 11:12 AM, b...@princeton.edu wrote:
> It's not a GPFS issue per se. The changelog isn't quite there right now
> but will be. Today the question only is about rsync performance.
Hi Bill,
md5 is a reasonably efficient checksum. It's when you start getting into
SHA-1 and SHA-256 that things start to get a little more expensive. I
would be really surprised if you were CPU-bound rather than I/O-bound
for this.
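If you want to sanity-check the CPU side on your own hardware, a rough sketch like the one below measures in-memory hashing throughput with Python's hashlib. The function name and the 64 MB sample size are my own choices for illustration, and the numbers will vary by CPU, so treat it as a ballpark test rather than a benchmark. If the md5 figure is well above what your storage can deliver to a single reader, hashing is unlikely to be the bottleneck.

```python
import hashlib
import time


def hash_throughput_mb_s(algorithm: str, size_mb: int = 64) -> float:
    """Return approximate in-memory hashing throughput in MB/s.

    No file I/O is involved, so this isolates the CPU cost of the hash.
    """
    chunk = b"\0" * (1024 * 1024)  # one 1 MiB buffer, reused for every update
    hasher = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(size_mb):
        hasher.update(chunk)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256"):
        print(f"{name}: {hash_throughput_mb_s(name):.0f} MB/s")
```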
By default rsync does not compare checksums; it decides whether a file
needs updating from its size and modification time, and therefore does
not need to read in each file in its entirety. Do you have a strong
reason for using the --checksum option? Users typically have to try
pretty hard to do things that circumvent that default heuristic.
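To make the cost difference concrete, here is a minimal sketch of the two comparison strategies. It is not rsync's actual code: the function names are hypothetical, the mtime comparison is simplified to whole seconds, and md5 is assumed as the hash (which hash rsync uses with --checksum depends on the rsync version). The point is only that the first check needs two stat calls, while the second has to read every byte of both files.

```python
import hashlib
import os


def quick_check_differs(src: str, dst: str) -> bool:
    """Approximate the default heuristic: metadata only, no file reads."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Simplification: compare mtimes at whole-second granularity.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def md5_of(path: str, bufsize: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks; this reads the whole file."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            hasher.update(block)
    return hasher.hexdigest()


def checksum_differs(src: str, dst: str) -> bool:
    """Approximate a checksum-based comparison: full read of both files."""
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True  # sizes differ, no need to hash
    return md5_of(src) != md5_of(dst)
```

The case where the two disagree is a file whose contents changed while its size and mtime stayed the same, which is exactly the case users have to work to create.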
If you need guaranteed DR even in the face of a file that falls outside
of the typical rsync heuristic, you're best served by leveraging some
part of the underlying filesystem's feature set to achieve this. The
filesystem is the only layer that can trivially compute and track what
changed. In my day job at Panasas we designed pan_snap_delta
explicitly for this -- to be able to efficiently emit a succinct list of
files and directories which have changed in any way between snapshots,
and our customers have used that paired with the rsync --files-from
option to great effect. Two summers back I added another utility to
that mix, pan_snap_replicator, which could figure out exactly how a file
or folder had changed, which ends up being crucial for situations like
"we moved our 1PB directory of stuff from /a to /b." rsync typically
copes with this by deleting the entire directory on the remote side and
re-copying it over the wire, which is clearly undesirable.
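As a rough illustration of the --files-from pairing, the sketch below takes a list of changed paths and builds the corresponding rsync command. Everything about the input is an assumption: I am not showing pan_snap_delta's real output format, the paths and the destination "backup:/dr/" are made up, and the helper name is hypothetical. The only rsync behavior relied on is that --files-from reads one path per line, interpreted relative to the source directory.

```python
def build_rsync_command(changed_paths, src_root, dest, list_file):
    """Write changed paths to list_file and return an rsync argv list.

    changed_paths: paths relative to src_root (hypothetical input; any
    snapshot-delta tool could produce such a list).
    """
    with open(list_file, "w") as f:
        for path in changed_paths:
            # --files-from entries are relative to the source directory.
            f.write(path.lstrip("/") + "\n")
    # Note: with --files-from, -a does not imply recursion, so each
    # changed file should be listed explicitly.
    return ["rsync", "-a", f"--files-from={list_file}", src_root, dest]


if __name__ == "__main__":
    cmd = build_rsync_command(
        ["projects/run1/out.dat", "projects/run2/log.txt"],  # made-up paths
        "/gpfs/data/",
        "backup:/dr/",  # hypothetical destination
        "changed.txt",
    )
    print(" ".join(cmd))  # pass cmd to subprocess.run() to actually copy
```

With a list like this, rsync only stats and transfers the named files instead of walking the whole tree.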
Outside of leveraging filesystem-specific features like that, which GPFS
may or may not offer, I don't have any better suggestions for you. But
I do suspect you're I/O-bound here and md5 itself is not the problem.
Best,
ellis
--
Ellis H. Wilson III, Ph.D.
www.ellisv3.com
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf