On 03/19/2014 02:59 PM, Joshua Ulrich wrote:
On Wed, Mar 19, 2014 at 4:28 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
On Wed, Mar 19, 2014 at 11:50 AM, Joshua Ulrich <josh.m.ulr...@gmail.com>
wrote:
The suggested solution is not described in the referenced article. It
was not suggested that it be the operating system's responsibility to
distribute snapshots, nor was it suggested to create binary
repositories for specific operating systems, nor was it suggested to
freeze only a subset of CRAN packages.
IMO this is an implementation detail. If we could all agree on a particular
set of cran packages to be used with a certain release of R, then it doesn't
matter how the 'snapshotting' gets implemented. It could be a separate
repository, or a directory on cran with symbolic links, or a page somewhere
with hyperlinks to the respective source packages. Or you can put all
packages in a big zip file, or include it in your OS distribution. You can
even distribute your entire repo on cdroms (debian style!) or do all of the
above.
The hard problem is not implementation. The hard part is that for
reproducibility to work, we need community wide conventions on which
versions of cran packages are used by a particular release of R. Local
downstream solutions are impractical, because this results in
scripts/packages that only work within your niche using this particular
snapshot. I expect that requiring every script be executed in the context of
dependencies from some particular third party repository will make
reproducibility even less common. Therefore I am trying to make a case for a
solution that would naturally improve reliability/reproducibility of R code
without any effort by the end-user.
So implementation isn't a problem. The problem is that you need a way
to force people not to be able to use different package versions than
what existed at the time of each R release. I said this in my
previous email, but you removed and did not address it: "However, you
would need to find a way to actively _prevent_ people from installing
newer versions of packages with the stable R releases." Frankly, I
would stop using CRAN if this policy were adopted.
I suggest you go build this yourself. You have all the code available
on CRAN, and the dates at which each package was published. If others
who care about reproducible research find what you've built useful,
you will create the very community you want. And you won't have to
force one single person to change their workflow.
Yeah we've already heard this "do it yourself" kind of answer. Not a
very productive one honestly.
Well actually that's what we've done for the Bioconductor repositories:
we freeze the BioC packages for each version of Bioconductor. But since
this freezing doesn't happen at the CRAN level, and many BioC packages
depend on CRAN packages, the freezing is only at the surface. Would be
much better if the freezing was all the way down to the bottom of the
sea. (Note that it is already if you install binary packages only.)
Yes it's technically possible to work around this by also hosting
frozen versions of CRAN, one per version of Bioconductor, and have
biocLite() (the tool BioC users use for installing packages) point to
these frozen versions of CRAN in order to get the correct dependencies
for any given version of BioC. However we don't do that because that
would mean extra costs for us in terms of storage space and bandwidth.
And also because we believe that it would be more effective and would
ultimately benefit the entire R community (and not just the BioC
community) if this problem was addressed upstream.
H.
Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel