You mean like a COW filesystem with end-to-end checksums, where you can send snapshots and don't care too much about MD5?
I looked it up. Spectrum Scale (formerly GPFS) has end-to-end checksums, (global) snapshots, and mmapplypolicy to get the list of files to back up. At least Commvault, according to their documentation, leverages it to get the changed files. Now I wonder where that "theory" doesn't match practice...

Over and out.

On 17.06.19 16:39, Michael Di Domenico wrote:
> rsync on 10PB sounds painful. i haven't used GPFS in a very long
> time, so i might have a gap in knowledge. but i would be surprised if
> GPFS doesn't have a changelog, where you can watch the files that
> changed through the day and only copy the ones that did? much like
> what robinhood does for lustre.
>
> On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <b...@princeton.edu> wrote:
>>
>> We have moved to a rsync disk backup system, from TSM tape, in order to
>> have a DR for our 10 PB GPFS filesystem. We looked at a lot of options
>> but here we are.
>>
>> md5 checksums take a lot of compute time with huge files and even with
>> millions of smaller ones. The bulk of the time for running rsync is
>> spent in computing the source and destination checksums and we'd like to
>> alleviate that pain of a cryptographic algorithm.
>>
>> Googling around, I found no mention of using a technique like this to
>> improve rsync performance. I did find reference to a few hashing
>> algorithms though which could certainly work here (xxhash, murmurhash,
>> sbox, cityhash64).
>>
>> Rsync has certainly been around for a few years! We are going to pursue
>> changing the current checksum algorithm and using something much faster.
>> If anyone has done this already and would like to share their
>> experiences that would be wonderful. Ideally this could be some optional
>> plugin for rsync where users could choose which checksummer to use.
>>
>> Bill
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

-- 
FSU Jena | JULIELab.de/Staff/Redling
☎ +49 3641 9 44323