Troy Benjegerdes <ho...@hozed.org> writes:

> On Fri, Jul 17, 2015 at 09:38:06PM +0200, Jakub Wilk wrote:
>> * Ole Streicher <oleb...@debian.org>, 2015-07-17, 10:34:
>> >But: These packages sum up to ~25 GB, with the maximal package
>> >size of 3.5 GB.
>>
>> Well, that's a lot. Just as data points:
>>
>> * The biggest binary package currently in the archive,
>>   ns3-doc_3.17+dfsg-1_all.deb, is only ~1GB.
>>
>> * The biggest source package, nvidia-cuda-toolkit_6.0.37-5, is only
>>   ~1.5GB.
>>
>> I'm afraid you might need to wait for the advent of data.d.o:
>> https://lists.debian.org/87tzgm6yee....@vorlon.ganneff.de
>> (mind the typo: s/2 weeks/10 years/)
>
> My first thought was "well, can all of us science-type users
> agree to host something like /afs/data.d.o/", and then I saw
> the following:
>
> On Fri, Jul 17, 2015 at 02:03:54AM -0700, Afif Elghraoui wrote:
>> Package: wnpp
>> Severity: wishlist
>> Owner: Afif Elghraoui <a...@ghraoui.name>
>> X-Debbugs-Cc: debian-devel@lists.debian.org
>>
>> * Package name    : ori
>>   Version         : 0.8.1
>>   Upstream Author : Stanford University <orifs-de...@lists.stanford.edu>
>> * URL             : http://ori.scs.stanford.edu/
>> * License         : ori (MIT-like)
>>   Programming Lang: C++
>>   Description     : secure distributed file system
>>
>> Ori is a distributed file system built for offline operation; it empowers
>> the user with control over synchronization operations and conflict
>> resolution. History is provided through lightweight snapshots, and users
>> can verify that the history has not been tampered with. Through the use
>> of replication, instances can be resilient and recover damaged data from
>> other nodes.
>
> So is there any sort of reasonable internet-scale distributed
> filesystem in use that might actually work for this?
Git-annex supports Tahoe-LAFS:

  https://git-annex.branchable.com/special_remotes/tahoe/

but given that it also supports all of these:

  https://git-annex.branchable.com/special_remotes/

I'd guess that the data would quite often reside on resources that are at
least as reliable as whatever we might set up, so one could just do it on
a case-by-case basis.

git-annex allows one to set the number of copies of the data that one
wants to exist, so one could perhaps insist that data have multiple
sources, and that could be checked periodically, with some plan to copy
data elsewhere if and when a source disappears.

The users of the data could be given the option to contribute to the
checking process, so that it gets done as part of the act of using the
data. Any effort required to shift data to new resources when old sources
disappear could likewise be done, in a distributed manner, by those who
benefit from access to the data.

Cheers, Phil.
-- 
|)|  Philip Hands  [+44 (0)20 8530 9560]  HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY
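[Editorial sketch of the scheme described above, using standard git-annex
commands. The remote names "tahoe-mirror" and "backup-mirror" are
hypothetical examples, not anything proposed in the thread.]

```shell
# Require that at least 2 verified copies of each annexed file exist
# somewhere before git-annex will allow a copy to be dropped:
git annex numcopies 2

# Register a Tahoe-LAFS special remote (one of the many supported
# backends listed on the special_remotes page):
git annex initremote tahoe-mirror type=tahoe

# Periodic integrity/presence check; users of the data could run this
# as part of the act of using it:
git annex fsck --from=tahoe-mirror

# If a source disappears, shift under-replicated data to a surviving
# remote (matches files that currently have fewer than 2 copies):
git annex copy --to=backup-mirror --not --copies=2
```

The numcopies requirement plus periodic fsck is what makes the
"check periodically, copy elsewhere when a source disappears" plan
mechanical rather than manual.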