It's not a GPFS issue per se. The changelog isn't quite there right now but will be. Today the question is only about rsync performance.
Thanks,
Bill
On Jun 17, 2019 11:04 AM, John Hearns via Beowulf <beowulf@beowulf.org> wrote:
Probably best asking this question over on the GPFS mailing list.
A bit of Googling reminded me of https://www.arcastream.com/
They are active in the UK Academic community, not sure about your neck of the woods.
Give them a shout though and ask for Steve Mackie.
http://arcastream.com/what-we-do/
On Mon, 17 Jun 2019 at 15:39, Michael Di Domenico <mdidomenico4@gmail.com> wrote:
rsync on 10PB sounds painful. i haven't used GPFS in a very long
time, so i might have a gap in knowledge. but i would be surprised if
GPFS doesn't have a changelog, where you can watch the files that
changed through the day and only copy the ones that did? much like
what robinhood does for lustre.
On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <bill@princeton.edu> wrote:
>
> We have moved to a rsync disk backup system, from TSM tape, in order to
> have a DR for our 10 PB GPFS filesystem. We looked at a lot of options
> but here we are.
>
> md5 checksums take a lot of compute time with huge files and even with
> millions of smaller ones. The bulk of the time for running rsync is
> spent in computing the source and destination checksums and we'd like to
> alleviate that pain of a cryptographic algorithm.
>
> Googling around, I found no mention of using a technique like this to
> improve rsync performance. I did find reference to a few hashing
> algorithms though which could certainly work here (xxhash, murmurhash,
> sbox, cityhash64).
>
> Rsync has certainly been around for a few years! We are going to pursue
> changing the current checksum algorithm and using something much faster.
> If anyone has done this already and would like to share their
> experiences that would be wonderful. Ideally this could be some optional
> plugin for rsync where users could choose which checksummer to use.
>
> Bill
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
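A note on the checksum cost described in the original post: below is a minimal sketch, assuming Python 3 and only the standard library, for measuring how MD5 compares with a non-cryptographic checksum on the same data. zlib.crc32 stands in here for the faster hashes named in the post (xxhash, murmurhash, cityhash64), which are not in the standard library. The helper names and the 1 MiB chunk size are hypothetical choices for illustration, and none of this is rsync's own code.

```python
import hashlib
import time
import zlib

CHUNK = 1 << 20  # 1 MiB per chunk; an arbitrary size chosen for this sketch


def md5_digest(chunks):
    """Hex MD5 digest over an iterable of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def crc32_digest(chunks):
    """Hex CRC-32 over an iterable of byte chunks (non-cryptographic)."""
    value = 0
    for chunk in chunks:
        # zlib.crc32 accepts the running value so chunks can be fed in turn
        value = zlib.crc32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def timed(fn, chunks):
    """Return (digest, elapsed_seconds) for one pass over the chunks."""
    start = time.perf_counter()
    digest = fn(chunks)
    return digest, time.perf_counter() - start


if __name__ == "__main__":
    # 64 MiB of in-memory zeros, so disk speed does not affect the timing
    data = [bytes(CHUNK)] * 64
    for name, fn in (("md5", md5_digest), ("crc32", crc32_digest)):
        digest, seconds = timed(fn, data)
        print(f"{name:6s} {digest} {seconds:.3f}s")
```

Timings depend on the hardware and the Python build, so treat the output as a relative comparison only. A 32-bit CRC also gives far fewer distinct values than a 128-bit MD5 digest, which is one reason the 64-bit and 128-bit hashes listed in the post are the more likely candidates for a real replacement.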