I know that at one point, some Intel chips had instruction extensions available to speed up SHA checksums by computing them directly in hardware. Might be worth looking into: https://software.intel.com/en-us/articles/intel-sha-extensions
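Before chasing this, it's probably worth checking whether a given node has the extensions at all. Here's a minimal sketch, assuming Linux and assuming the kernel reports the feature as the `sha_ni` flag in /proc/cpuinfo; the function name is just something I made up for illustration. Keep in mind that, as far as I know, the Intel SHA extensions cover SHA-1 and SHA-256, so they wouldn't help md5 directly.

```python
from pathlib import Path


def has_sha_extensions(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists sha_ni."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only look at the CPU feature flags line, not model names etc.
        if key.strip() == "flags" and "sha_ni" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux-style procfs
    if cpuinfo.exists():
        print("SHA extensions:", has_sha_extensions(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here (not Linux?)")
```

Whether your hashing library actually uses the instructions when they're present is a separate question that depends on how it was built.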
More recently, Intel has been promoting QuickAssist/QAT, which also seems to perform hardware acceleration for SHA algorithms (seems like a possible re-branding / architecture recycling). There's some integration with ZFS for this.
https://drive.google.com/file/d/0B_J4mRfoVJQRV3ZOd1ZMWkphcV9OYXdWT0FBblVHbVZpSmZj/view

SPARC has also had a large number of cipher algorithms hardwired into its architecture recently (what Oracle is calling "Software in Silicon"). See here <http://storageconference.us/2017/Presentations/Phillips.pdf>. Of course, to take advantage of this technology you'd have to deal with Oracle, as well as an increasingly uncommon CPU architecture.

On Mon, Jun 17, 2019 at 12:34 PM Loncaric, Josip via Beowulf <beowulf@beowulf.org> wrote:

> Why not use existing pftool?
>
> https://github.com/pftool/pftool
>
> -Josip
>
> On 6/17/19 10:07 AM, Michael Di Domenico wrote:
> > just out of morbid curiosity i popped through the rsync code. it
> > doesn't look terribly difficult to wedge in a new algo. but honestly,
> > if i was going to go through the trouble i'd write a new tool that
> > walks the file tree in parallel and logs the checksums to a database.
> > i've had problems rsync'ing big filesystems in the past, so i try to
> > avoid it as a DR or poor-man's snapshotting
> >
> > On Mon, Jun 17, 2019 at 11:30 AM Christopher Samuel <ch...@csamuel.org> wrote:
> >> On 6/17/19 6:43 AM, Bill Wichser wrote:
> >>
> >>> md5 checksums take a lot of compute time with huge files and even with
> >>> millions of smaller ones. The bulk of the time for running rsync is
> >>> spent in computing the source and destination checksums and we'd like to
> >>> alleviate that pain of a cryptographic algorithm.
> >> First of all I would note that rsync only uses checksums if you tell it
> >> to, otherwise it just uses file times and sizes to determine what to
> >> transfer.
> >>
> >> rsync is also single-threaded, so I would take a look at what was
> >> previously called parsync, but is now parsyncfp :-)
> >>
> >> http://moo.nac.uci.edu/~hjm/parsync/
> >>
> >> There is the caveat there though:
> >>
> >> # As a warning, the main use case for parsyncfp is really only
> >> # very large data transfers thru fairly fast network connections
> >> # (>1Gb). Below this speed, rsync itself can saturate the
> >> # connection, so there’s little reason to use parsyncfp and in
> >> # fact the overhead of testing the existence of and starting more
> >> # rsyncs tends to worsen its performance on small transfers to
> >> # slightly less than rsync alone.
> >>
> >> Good luck!
> >> Chris
> >> --
> >> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
> --
> Dr. Josip Loncaric, LANL, MS-T001, P.O. Box 1663, Los Alamos, NM 87545
> mailto:jo...@lanl.gov Cell: +1-505-412-8490 Phone: +1-505-412-6538
> --
> E Pluribus Unum
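For what it's worth, the "walk the tree and log checksums to a database" idea quoted above is not much code. Below is a rough sketch, not a replacement for pftool or parsyncfp. Assumptions on my part: Python, SQLite as the database, SHA-256 as the default hash, and made-up names (`checksum_tree`, the `checksums` table). Note that only the hashing runs in parallel here; the directory walk itself is still serial.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash one file in chunks so large files never sit fully in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path, h.hexdigest()


def iter_files(root):
    """Yield every regular file under root (symlinks are skipped)."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def checksum_tree(root, db_path, workers=8, algo="sha256"):
    """Hash files under root in a thread pool and log the results to SQLite.

    Returns the number of files recorded.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums "
        "(path TEXT PRIMARY KEY, algo TEXT, digest TEXT)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # hashlib releases the GIL on large buffers, so the threads overlap.
        results = pool.map(lambda p: file_digest(p, algo), iter_files(root))
        rows = [(path, algo, digest) for path, digest in results]
    with conn:  # one transaction for all rows, written from this thread only
        conn.executemany(
            "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?)", rows
        )
    conn.close()
    return len(rows)
```

Something like `checksum_tree("/data", "sums.db", workers=16)` (paths hypothetical) would then give you a table you can diff against a second run on the destination. On a real filesystem you'd want error handling for unreadable files and probably batched inserts, and the I/O pattern will likely matter more than the hash algorithm.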