I use xxHash (https://github.com/Cyan4973/xxHash) for hashing... much faster than MD5.
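For anyone who wants to try the "pluggable checksummer" idea before patching rsync itself, here is a minimal sketch of picking the file-comparison hash by name. It assumes Python 3 and only the standard library. `zlib.crc32` stands in for a fast non-cryptographic hash, and xxHash is registered only if the third-party `xxhash` PyPI binding happens to be installed. The names `HASHERS`, `file_digest` and `files_match` are hypothetical. They are not rsync options, and none of this changes what rsync computes internally.

```python
import hashlib
import os
import zlib
from typing import Callable, Dict


class Crc32Hasher:
    """Minimal hashlib-style wrapper around zlib.crc32 (non-cryptographic)."""

    def __init__(self) -> None:
        self._value = 0

    def update(self, data: bytes) -> None:
        # zlib.crc32 takes the running value, so the file can be fed in chunks.
        self._value = zlib.crc32(data, self._value)

    def hexdigest(self) -> str:
        return format(self._value & 0xFFFFFFFF, "08x")


# Hypothetical registry of selectable checksummers (not rsync option names).
HASHERS: Dict[str, Callable[[], object]] = {
    "md5": hashlib.md5,
    "crc32": Crc32Hasher,
}

try:
    import xxhash  # optional third-party binding: pip install xxhash

    HASHERS["xxh64"] = xxhash.xxh64
except ImportError:
    pass  # xxHash not installed; md5 and crc32 remain available


def file_digest(path: str, algo: str = "crc32", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so huge files never sit in memory."""
    hasher = HASHERS[algo]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_match(src: str, dst: str, algo: str = "crc32") -> bool:
    """Compare sizes first (free), then digests with the chosen algorithm."""
    if os.path.getsize(src) != os.path.getsize(dst):
        return False
    return file_digest(src, algo) == file_digest(dst, algo)
```

Calling `files_match(src, dst, "md5")` versus `files_match(src, dst, "crc32")` on the same pair of large files gives a rough feel for how much of the time goes into the hash itself. A 32-bit CRC collides far too easily to trust across millions of files, so at that scale a 64- or 128-bit hash such as xxh64 is the saner choice.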
On Tue, Jun 18, 2019 at 7:18 AM Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:
> You mean like a COW file system with end-to-end checksums where you can
> send snapshots and don't care too much about MD5?
>
> I looked it up. Spectrum Scale (formerly GPFS) has end-to-end checksums,
> (global) snapshots and mmapplypolicy to get the list of files to back up.
> At least Commvault, according to its documentation, leverages it to get
> the changed files.
>
> Now I wonder where that "theory" doesn't match practice...
>
> Over and out.
>
> On 17.06.19 16:39, Michael Di Domenico wrote:
> > rsync on 10PB sounds painful. i haven't used GPFS in a very long
> > time, so i might have a gap in knowledge. but i would be surprised if
> > GPFS doesn't have a changelog, where you can watch the files that
> > changed through the day and only copy the ones that did? much like
> > what robinhood does for lustre.
> >
> > On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <b...@princeton.edu> wrote:
> >>
> >> We have moved from TSM tape to an rsync disk backup system in order to
> >> have disaster recovery (DR) for our 10 PB GPFS file system. We looked
> >> at a lot of options, but here we are.
> >>
> >> MD5 checksums take a lot of compute time with huge files and even with
> >> millions of smaller ones. The bulk of rsync's run time is spent
> >> computing the source and destination checksums, and we'd like to avoid
> >> the cost of a cryptographic algorithm.
> >>
> >> Googling around, I found no mention of using a technique like this to
> >> improve rsync performance. I did find references to a few hashing
> >> algorithms, though, that could certainly work here (xxhash, murmurhash,
> >> sbox, cityhash64).
> >>
> >> Rsync has certainly been around for a few years! We are going to pursue
> >> replacing the current checksum algorithm with something much faster.
> >> If anyone has done this already and would like to share their
> >> experiences, that would be wonderful. Ideally this could be an optional
> >> plugin for rsync where users could choose which checksummer to use.
> >>
> >> Bill
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
> --
> FSU Jena | JULIELab.de/Staff/Redling
> ☎ +49 3641 9 44323

--
Dr Stuart Midgley
sdm...@gmail.com
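Michael's changelog suggestion can be approximated, very crudely, by picking the files whose timestamps are newer than the previous run and handing only that list to rsync. Below is a minimal standard-library Python sketch. It assumes the caller records the start time of each backup run; that bookkeeping and the name `changed_since` are hypothetical.

```python
import os
import stat
from typing import Iterator


def changed_since(root: str, since_epoch: float) -> Iterator[str]:
    """Yield paths (relative to root) of regular files changed after since_epoch.

    Both mtime and ctime are checked: a file copied in with `cp -p` keeps an
    old mtime but gets a fresh ctime, which mtime alone would miss.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except FileNotFoundError:
                continue  # deleted between the directory listing and the stat
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device files, etc.
            if max(st.st_mtime, st.st_ctime) > since_epoch:
                yield os.path.relpath(full, root)
```

The paths come out relative to `root`, which is the form `rsync --files-from=LIST SRC/ DEST` expects. There are two caveats. First, this still walks the whole tree, and with millions of files that walk is exactly the cost a GPFS policy scan (mmapplypolicy) or a Lustre changelog consumer such as Robinhood is meant to avoid. Second, it cannot see deletions.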