Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Stu Midgley
I use xxhash https://github.com/Cyan4973/xxHash to do hashes... much faster. On Tue, Jun 18, 2019 at 7:18 AM Benjamin Redling wrote: > You mean like a COW filesystem with end-to-end checksums where you can > send snapshots and don't care too much about MD5? > > I looked it up. Spectrum Scale fka.

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Benjamin Redling
You mean like a COW filesystem with end-to-end checksums where you can send snapshots and don't care too much about MD5? I looked it up. Spectrum Scale fka. GPFS has end-to-end checksums, (global) snapshots and mmapplypolicy to get the list of files to back up -- at least Commvault according to their

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Ellis H. Wilson III
On 6/17/19 1:35 PM, pellman.j...@gmail.com wrote: I know that at one point, some Intel chips had instruction extensions available to speed up SHA checksums by computing them directly in hardware.  Might be worth looking into: https://software.intel.com/en-us/articles/intel-sha-extensions On m

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread pellman . john
I know that at one point, some Intel chips had instruction extensions available to speed up SHA checksums by computing them directly in hardware. Might be worth looking into: https://software.intel.com/en-us/articles/intel-sha-extensions More recently, Intel has been promoting QuickAssist/QAT, wh

[Beowulf] Call For Papers: HPCSYSPROS Workshop @ SC19 Friday, November 22nd

2019-06-17 Thread John
HPC Systems Professionals Workshop (HPCSYSPROS19) Call For Papers, Artifacts, and Lightning Talks --- HPCSYSPROS19 is held in conjunction with SC19: The International Conference on High Performance Computing, Networking, Storage and Analysis. http://sighpc-syspros.org/workshops/2019/

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Loncaric, Josip via Beowulf
Why not use existing pftool? https://github.com/pftool/pftool -Josip On 6/17/19 10:07 AM, Michael Di Domenico wrote: just out of morbid curiosity i popped through the rsync code. it doesn't look terribly difficult to wedge in a new algo. but honestly, if i was going to go through the trouble

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Michael Di Domenico
just out of morbid curiosity i popped through the rsync code. it doesn't look terribly difficult to wedge in a new algo. but honestly, if i was going to go through the trouble i'd write a new tool that walks the file tree in parallel and logs the checksums to a database. i've had problems rsync'i
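
A minimal sketch of the kind of tool described above, not anyone's actual implementation. It assumes Python with only the standard library, MD5 as the checksum, and SQLite as the database; the names `file_digest`, `iter_files`, `checksum_tree` and the `checksums` table are hypothetical. Only the hashing runs in parallel here; the tree walk itself is a single serial `os.walk`, and unreadable files or symlink loops are not handled.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, chunk_size=1 << 20):
    """Return (path, hex MD5 digest) for one file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path, h.hexdigest()


def iter_files(root):
    """Yield every file path under root (serial walk)."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)


def checksum_tree(root, db_path, workers=8):
    """Hash all files under root in a thread pool and log them to SQLite.

    Returns the number of rows in the (hypothetical) checksums table.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums "
        "(path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads hash; the main thread is the only database writer.
        # Note: map() queues every path up front, which is fine for a sketch
        # but would need batching for millions of files.
        rows = pool.map(file_digest, iter_files(root))
        conn.executemany("INSERT OR REPLACE INTO checksums VALUES (?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM checksums").fetchone()[0]
    conn.close()
    return count
```

Running the same function on source and destination and comparing the two tables by digest would give the list of files that differ, without asking rsync to recompute anything.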

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Christopher Samuel
On 6/17/19 6:43 AM, Bill Wichser wrote: md5 checksums take a lot of compute time with huge files and even with millions of smaller ones.  The bulk of the time for running rsync is spent in computing the source and destination checksums and we'd like to alleviate that pain of a cryptographic al

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Ellis H. Wilson III
On 6/17/19 11:12 AM, b...@princeton.edu wrote: It's not a GPFS issue per se.  The changelog isn't quite there right now but will be.  Today the question only is about rsync performance. Hi Bill, md5 is a reasonably efficient checksum. When you start getting into SHA-128 and -256 is when thin
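
One way to check the relative cost of these checksums on a given machine is to measure it. The sketch below is an assumption-laden micro-benchmark, not a result from the thread: it uses only Python's `hashlib` and `zlib`, hashes an in-memory buffer (so disk and filesystem cost are excluded), and the helper names `throughput_mb_s` and `compare` are hypothetical. No numbers are quoted here because they depend on the CPU and the library build.

```python
import hashlib
import time
import zlib


def throughput_mb_s(fn, data, repeats=5):
    """Best-of-N throughput in MiB/s for one call of fn(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


def compare(size=16 * 1024 * 1024):
    """Return {algorithm name: MiB/s} for a zero-filled buffer of `size` bytes."""
    data = bytes(size)
    candidates = {
        "md5": lambda b: hashlib.md5(b).digest(),
        "sha1": lambda b: hashlib.sha1(b).digest(),
        "sha256": lambda b: hashlib.sha256(b).digest(),
        "crc32": lambda b: zlib.crc32(b),  # non-cryptographic, for contrast
    }
    return {name: throughput_mb_s(fn, data) for name, fn in candidates.items()}
```

Calling `compare()` and printing the dictionary gives a per-algorithm rate for that host. A third-party hash such as xxhash could be added to `candidates` the same way if its Python binding is installed.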

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread bill
It's not a GPFS issue per se.  The changelog isn't quite there right now but will be.  Today the question only is about rsync performance. Thanks, Bill On Jun 17, 2019 11:04 AM, John Hearns via Beowulf wrote: Probably best asking this question over on the GPFS mailing list. A bit of Googling reminded m

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread John Hearns via Beowulf
Probably best asking this question over on the GPFS mailing list. A bit of Googling reminded me of https://www.arcastream.com/ They are active in the UK Academic community, not sure about your neck of the woods. Give them a shout though and ask for Steve Mackie. http://arcastream.com/what-we-do/

Re: [Beowulf] Rsync - checksums

2019-06-17 Thread Michael Di Domenico
rsync on 10PB sounds painful. i haven't used GPFS in a very long time, so i might have a gap in knowledge. but i would be surprised if GPFS doesn't have a changelog, where you can watch the files that changed through the day and only copy the ones that did? much like what robinhood does for lust
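
The "only copy the ones that changed" idea can be illustrated without a filesystem changelog. The sketch below is a stand-in under stated assumptions, not how GPFS or robinhood work: it walks the tree and filters on modification time, so it still pays for a full scan, which is exactly what a real changelog avoids. The function name `changed_since` and the stored last-run timestamp are hypothetical.

```python
import os


def changed_since(root, last_run_epoch):
    """Yield paths under root whose mtime is newer than last_run_epoch.

    last_run_epoch is a Unix timestamp (seconds) recorded by the previous
    backup run; how it is stored is left to the caller.
    """
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime > last_run_epoch:
                    yield path
            except FileNotFoundError:
                # File vanished between listing and stat; skip it.
                continue
```

The resulting list could be written to a file and handed to a copy tool, so that only those paths are transferred and checksummed.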

[Beowulf] Rsync - checksums

2019-06-17 Thread Bill Wichser
We have moved to an rsync disk backup system, from TSM tape, in order to have a DR for our 10 PB GPFS filesystem. We looked at a lot of options but here we are. md5 checksums take a lot of compute time with huge files and even with millions of smaller ones. The bulk of the time for running rs