This question is probably best asked over on the GPFS mailing list.

A bit of Googling reminded me of https://www.arcastream.com/
They are active in the UK academic community; I'm not sure about your neck of the woods. Give them a shout, though, and ask for Steve Mackie.
http://arcastream.com/what-we-do/

On Mon, 17 Jun 2019 at 15:39, Michael Di Domenico <mdidomeni...@gmail.com> wrote:

> rsync on 10PB sounds painful. i haven't used GPFS in a very long
> time, so i might have a gap in knowledge. but i would be surprised if
> GPFS doesn't have a changelog, where you can watch the files that
> changed through the day and only copy the ones that did? much like
> what robinhood does for lustre.
>
> On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <b...@princeton.edu> wrote:
> >
> > We have moved to a rsync disk backup system, from TSM tape, in order to
> > have a DR for our 10 PB GPFS filesystem. We looked at a lot of options
> > but here we are.
> >
> > md5 checksums take a lot of compute time with huge files and even with
> > millions of smaller ones. The bulk of the time for running rsync is
> > spent in computing the source and destination checksums and we'd like to
> > alleviate that pain of a cryptographic algorithm.
> >
> > Googling around, I found no mention of using a technique like this to
> > improve rsync performance. I did find reference to a few hashing
> > algorithms though which could certainly work here (xxhash, murmurhash,
> > sbox, cityhash64).
> >
> > Rsync has certainly been around for a few years! We are going to pursue
> > changing the current checksum algorithm and using something much faster.
> > If anyone has done this already and would like to share their
> > experiences that would be wonderful. Ideally this could be some optional
> > plugin for rsync where users could choose which checksummer to use.
> >
> > Bill

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf