Hi all:

I'd also like to point you to pcp: http://www.theether.org/pcp/

It's a bit old, but should still build on modern systems. It would be
nice if somebody picked up development after all these years (hint
hint) :-)

Cheers,
Bernard

On Mon, Jun 11, 2012 at 11:10 AM, Joe Landman
<land...@scalableinformatics.com> wrote:
> On 06/11/2012 02:02 PM, Jesse Becker wrote:
>
>> I looked into doing something like this on a 50-node cluster to
>> synchronize several hundred GB of semi-static data used in /scratch.
>> I found that the time to build the torrent files--calculating checksums
>> and such--was *far* more time-consuming than the actual file
>> distribution. This is on top of the rather severe I/O hit on the "seed"
>> box as well.
>
> A long while ago, we developed 'xcp', which did data distribution from
> one machine to many machines, and was quite fast (non-broadcast) --
> specifically for moving some genomic/proteomic databases to remote
> nodes. We didn't see much interest in it, so we shelved it. It worked
> like this:
>
>     xcp file remote_path [--nodes node1[,node2....]] [--all]
>
> We were working on generalizing it for directories and other things as
> well, but as I noted, people were starting to talk (breathlessly at the
> time) about torrents for distribution, so we pushed it off and forgot
> about it.
>
>> I fought with it for a while, but came to the conclusion that *for
>> _this_ data*, and how quickly it changed, torrents weren't the way to
>> go--largely because of the cost of creating the torrent in the first
>> place.
>>
>> However, I do think that similar systems could be very useful, if
>> perhaps a bit less strict in their tests. The peer-to-peer model is
>> useful, and (in some cases) a simple size/date check could be enough to
>> determine when to (re)copy a file.
>>
>> One thing torrents don't handle is file deletions, which opens up a
>> few new problems.
>>
>> Eventually, I moved to a distributed rsync tree, which worked for a
>> while, but was slightly fragile.
>> Eventually, we dropped the whole thing when we purchased a
>> sufficiently fast storage system.
>
> This is one of the things that drove us to building fast storage
> systems. Data motion is hard, and a good fast storage unit with some
> serious data movement cannons and high power storage can solve the
> problem with greater ease/elegance.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: land...@scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
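P.S. For what it's worth, Jesse's idea of a simple size/date check
(instead of the full checksum pass that makes torrent creation so
expensive) could be sketched roughly like this. This is just an
illustrative sketch -- the function name `needs_copy` and the mtime
slop value are my own inventions, not anything from pcp or xcp:

```python
import os

def needs_copy(src, dest, mtime_slop=1.0):
    """Decide whether 'src' should be (re)copied to 'dest' using only
    file size and modification time, avoiding a full checksum pass."""
    if not os.path.exists(dest):
        return True
    s, d = os.stat(src), os.stat(dest)
    # Recopy if sizes differ, or if the source is newer than the copy.
    # The slop allows for filesystems with coarse timestamp resolution.
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime + mtime_slop

if __name__ == "__main__":
    import shutil, tempfile
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "db.fasta")
        dest = os.path.join(tmp, "copy.fasta")
        with open(src, "w") as f:
            f.write("semi-static data\n")
        print(needs_copy(src, dest))   # no copy yet -> True
        shutil.copy2(src, dest)        # copy2 preserves the mtime
        print(needs_copy(src, dest))   # size and mtime match -> False
```

This is essentially what rsync's default "quick check" does, which is
why it is so much cheaper than checksumming -- at the cost of missing
changes that leave size and mtime untouched.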