Just my 2 cents: it may not be a good idea to pin software versions to gain reproducibility. To me, this kind of reproducibility is "dead" reproducibility (what if the old software has a fatal bug? Do we want to reproduce the same **wrong** results?). Software packages are continuously evolving, and our research should evolve with them. How can we achieve this? I think this paper by Robert Gentleman and Duncan Temple Lang gives a nice answer: http://biostats.bepress.com/bioconductor/paper2/
With R 3.0.0 coming, it will be easy to achieve what they outlined, because R 3.0.0 allows custom vignette builders. Basically, your research paper can be built with 'R CMD build' and checked with 'R CMD check' if you provide an appropriate builder.

An R package has great potential to become the ideal tool for reproducible research thanks to its wonderful infrastructure: functions, datasets, examples, unit tests, vignettes, dependency structure, and so on. With the help of version control, you can easily spot changes in your results after you upgrade packages. With an R package, you can automate a lot of things. For example, install.packages() takes care of dependencies, and 'R CMD build' rebuilds your paper. Just as Bioconductor has a devel version, you can continuously check your results against a devel version, so that you know what is going to break if you upgrade to new versions of other packages.

Is developing a research paper (in the context of computing) that different from developing a software package? Probably not. Long live reproducible research!

Regards,
Yihui
--
Yihui Xie <xieyi...@gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA


On Mon, Mar 4, 2013 at 3:13 PM, Cook, Malcolm <m...@stowers.org> wrote:
> Hi,
>
> In support of reproducible research at my Institute, I seek an approach to
> re-creating the R environments in which an analysis has been conducted.
>
> By which I mean the exact version of R and the exact version of all
> packages used in a particular R session.
>
> I am seeking comments/criticism of this as a goal, and of the following
> outline of an approach:
>
> === When all the steps of a workflow have been finalized ===
> * re-run the workflow from beginning to end
> * save the results of sessionInfo() into an RDS file named after the
>   current date and time.
>
> === Later, when desirous of exactly recreating this analysis ===
> * read the (old) sessionInfo() into an R session
> * exit with failure if the running version of R doesn't match
> * compare the old sessionInfo to the currently installed packages
>   (e.g. using packageVersion())
> * where there are discrepancies, install the required version of the
>   package (without dependencies) into a new library (named after the old
>   sessionInfo RDS file)
>
> Then the analyst should be able to put the new library at the front of
> .libPaths() and run the analysis, confident that the same versions of the
> packages are in use.
>
> I have in the past used install-package-version.R to revert to previous
> versions of R packages successfully (https://gist.github.com/1503736). And
> there is a similar tool in Hadley Wickham's devtools.
>
> But I don't know whether I need something special for (Bioconductor)
> packages that have been installed using biocLite, and I seek advice here.
>
> I do understand that the R environment is not sufficient to guarantee
> reproducibility. Some of my colleagues have suggested saving a virtual
> machine with all your software/library/data installed. So I am also
> generally interested in what other people are doing to this end. But I am
> most interested in:
>
> * is this a good idea
> * is there a worked out solution
> * does biocLite introduce special cases
> * where do the dragons lurk
>
> ... and the like
>
> Any tips?
>
> Thanks,
>
> ~ Malcolm Cook
> Stowers Institute / Computation Biology / Shilatifard Lab
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
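A minimal R sketch of the snapshot-and-compare steps in Malcolm's outline, assuming a standard R session. The helper names save_session_snapshot() and compare_session_snapshot() and the "sessionInfo-YYYYmmdd-HHMMSS.rds" file-name pattern are hypothetical. The reinstall step is left as a comment because the right tool depends on where each package came from: the install-package-version.R gist, devtools, or biocLite for Bioconductor packages.

```r
# Hypothetical sketch of the snapshot-and-compare workflow in the quoted
# message; helper names and the file-name pattern are illustrative only.

# Step 1: after re-running the finalized workflow, save sessionInfo()
# to an RDS file named after the current date and time.
save_session_snapshot <- function(dir = ".") {
  fname <- file.path(dir, paste0("sessionInfo-",
                                 format(Sys.time(), "%Y%m%d-%H%M%S"), ".rds"))
  saveRDS(sessionInfo(), fname)
  fname
}

# Step 2: later, read the snapshot back, stop if the running R version
# differs, and return the packages whose installed version does not match.
compare_session_snapshot <- function(fname) {
  old <- readRDS(fname)
  if (!identical(old$R.version$version.string, R.version$version.string))
    stop("R version mismatch: snapshot was taken with ",
         old$R.version$version.string)

  # Attached (otherPkgs) and loaded-only (loadedOnly) packages are stored
  # as packageDescription objects, each carrying a Version field.
  pkgs <- c(old$otherPkgs, old$loadedOnly)
  # Normalize with package_version() so "1.2-3" and "1.2.3" compare equal.
  wanted <- vapply(pkgs, function(d) as.character(package_version(d$Version)),
                   character(1))
  installed <- vapply(names(wanted), function(p)
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_),  # package not installed
    character(1))

  mismatch <- is.na(installed) | installed != wanted
  res <- data.frame(package = as.character(names(wanted)),
                    wanted = unname(wanted),
                    installed = unname(installed),
                    stringsAsFactors = FALSE)
  res[mismatch, , drop = FALSE]
}

# Step 3 (not shown): install each mismatched version, without dependencies,
# into a new library named after the snapshot file, then put it first:
#   .libPaths(c(new_lib, .libPaths()))
```

Run in the same session that took the snapshot, the comparison should return an empty data frame. Note that the R version check compares the full version string, so even a different patch release or build date counts as a mismatch. This is deliberately strict, in the spirit of "exit with failure".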