Hi, I am sorry to resurrect the thread, but I wanted a simple clarification and also to add my two cents.
> * it's better to have stuff distributed by Debian than sourced
>   elsewhere; we're a distribution, distributing is What We Do
> * it's better for users to have stuff in .deb's, so they don't
>   have to worry about different ways of managing different stuff
>   on their system
> * some large data sets are just "compiled" -- it can be good to
>   distribute a small amount of source in a .deb and compile
>   it on the user's machine.

Seconded, all three points.

> * some large data sets are "compiled" but it takes long enough that
>   we don't want to do it on user's machines, so we have the usual
>   source/deb situation here, and that's fairly easy too.

Seconded, but just how subjective is the time/space tradeoff here?

> * (***) many data sets don't fit those patterns though, but
> >...<
> * (###) having .deb's generated on a user's system means they
> >...<
>   to be mirrored separately; having .deb's be the source format
>   requires converting from the upstream source format adding
>   complexity and making it harder to trace how the packaging
>   worked

Since most of the time we (programmers ;-)) hate doing things manually, such repackaging should be automated anyway. Quite a few people use dh_wraporig, which was devised for cases where a source package had to be repackaged before entering Debian:

http://lists.debian.org/debian-mentors/2007/03/msg00268.html

So I see both the possibility of and the desire for a similar tool (or maybe just further development of dh_wraporig) that could handle automatic repackaging of datasets.

> I guess an evil solution to *** that doesn't cause problems with ###
> would be to create a dummy source package that Build-Depends: on the
> exact version of the package it builds, so that uploads include a
> >...<

Here is where I got stuck with that approach: as usual, I just fetched the sources with dget and tried to build the package with dpkg-buildpackage. Of course I failed to accomplish the mission, since the Build-Depends weren't satisfied...
so indeed it seems to be confusing, or my brain is not working right now...

My suggestion (I might be duplicating someone else's idea, please pardon me), for arch 'all' packages: automatically download (copy) the data during the build of the package. I know that somewhere here I will run up against Debian policy, but for whatever it is worth: automate the build so that something like dh_wraporig downloads the original dataset (on demand, via the debian/watch mechanism), and debian/rules prepares the data and stuffs it into the .deb packages. The .orig.tar.gz then contains no data, and the diff contains all the scripts, instructions, and verification information (md5sum in debian/README.Debian-source). I don't think this would require Build-Depends on the data packages. Since _all packages need not be rebuilt for each architecture, no buildd box would have a problem, and anyone with internet access would be able to rebuild the package if that needs to be done.

> I'm not sure if avoiding duplicating the data (1G of data is bad, but
> 1G of the same data in a .orig.tar.gz _and_ a .deb is absurd) is enough
> to just use the existing archive and mirror network, or if it'd still be
> worth setting up a separate apt-able archive under debian.org somewhere
> for _really_ big data.

Seconded. At first I thought about adding another major component (to complement main, contrib, non-free), but that ruins orthogonality, since we would then need main-data, contrib-data... So that was a bad idea. Instead, a separate apt repository such as data.debian.org with fine-grained sections (science/med, science/bio, etc.) would allow easy and selective mirroring. For that, debmirror --exclude-deb-section=regex should be complemented with --deb-section=regex, so that selecting the needed sections is easier than excluding everything else. I think that any research group using Debian should have its own Debian mirror anyway ;) and this would be just one more mirror, holding the data specific to their needs, pulled from the global Debian one.
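
To make the verification step of the build proposal above a bit more concrete (an md5sum recorded in debian/README.Debian-source, checked against the dataset downloaded at build time), here is a minimal Python sketch. The function names and the chunk size are hypothetical and only for illustration; this is not part of dh_wraporig or any existing Debian tool, and how the expected checksum gets read out of the README is left open.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that a multi-gigabyte dataset
    does not have to fit into memory. (chunk_size is an arbitrary choice.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_md5):
    """Check a downloaded dataset against the checksum recorded in the package.

    Hypothetical helper: raises ValueError on mismatch so that the
    build stops instead of packaging unverified data.
    """
    expected = expected_md5.strip().lower()
    actual = md5_of_file(path)
    if actual != expected:
        raise ValueError(
            f"{path}: md5 mismatch, expected {expected}, got {actual}"
        )
    return True
```

A build script could call verify_dataset() right after the download and before debian/rules unpacks anything, so that a corrupted or silently changed upstream file fails the build early.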
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik