> There is nothing like backups with due attention to detail.

Agreed, although given the complexity of dependencies among packages, this can entail several GB of snapshots per paper (if not several TB for some papers). Anyone who is reasonably prolific then gets the exciting prospect of managing those backups.
At least if I grind out a vignette with a bunch of Bioconductor packages and call sessionInfo() at the end, I can find out later on (if, say, things stop working) what the state of the tree was when it last worked, and what might have changed since then.

If a self-contained C++ or FORTRAN program is sufficient to perform an entire analysis, that's awesome, and it ought to be kept under revision control (doesn't everyone already do this?). But once you start using tools that depend on other tools, it becomes substantially more difficult to ensure that

1) a comprehensive snapshot is taken,
2) reviewers, possibly on different platforms and/or major versions, can run against that snapshot, and
3) some quick sanity check ("does this analysis even return sensible results?") can be run.

(I've put a minimal sketch of what I mean by 1) and 3) in a postscript at the end of this message.)

Hopefully this is better articulated than my previous missive. I believe we fundamentally agree; some of the particulars may be a matter of notation or typical workflow.

Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Thu, Mar 20, 2014 at 2:13 PM, Ted Byers <r.ted.by...@gmail.com> wrote:

> On Thu, Mar 20, 2014 at 4:53 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
>
> > On Thu, Mar 20, 2014 at 1:28 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
> > >
> > > Herve Pages mentions the risk of irreproducibility across three minor
> > > revisions of version 1.0 of Matrix. My gut reaction would be that if the
> > > results are not reproducible across such minor revisions of one library,
> > > they are probably just so much BS.
> >
> > Perhaps this is just terminology, but what you refer to I would generally
> > call 'replication'. Of course being able to replicate results with other
> > data or other software is important to validate claims. But being able to
> > reproduce how the original results were obtained is an important part of
> > this process.
>
> Fair enough.
>
> > If someone is publishing results that I think are questionable and I
> > cannot replicate them, I want to know exactly how those outcomes were
> > obtained in the first place, so that I can 'debug' the problem. It's
> > quite important to be able to trace back whether incorrect results were
> > the result of a bug, incompetence, or fraud.
>
> OK. That is where archives come in. When I had to deal with that sort of
> thing, I provided copies of both data and code to whoever asked. It ought
> not be hard for authors to make an archive, to e.g. an optical disk, that
> includes the software used along with the data, and store it like any
> other backup, so it can be provided to anyone upon request.
>
> > Let's take the example of the Reinhart and Rogoff case. The results
> > obviously were not replicable, but without more information it was just
> > the word of a grad student vs. two Harvard professors. Only after
> > reproducing the original analysis was it possible to point out the
> > errors and prove that the original results were incorrect.
>
> OK, but if the practice I used were followed, then a copy of the optical
> disk to which everything relevant was stored would solve that problem (and
> it would be extremely easy for the researcher or his/her supervisor to
> do). I once had a reviewer complain that he couldn't reproduce my results,
> so I sent him my code, which, translated into any of the Algol family of
> languages, would allow him, or anyone else, to replicate my results
> regardless of their programming language of choice.
> Once he had my code, he found his error and reported back that he had
> finally replicated my results. Several of my colleagues used the same
> practice, with the same consequences (whenever questioned, they just
> provided their code and related software, and their results were then
> reproduced). There is nothing like backups with due attention to detail.
>
> Cheers
>
> Ted
>
> --
> R.E. (Ted) Byers, Ph.D., Ed.D.
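P.S. To make points 1) and 3) above concrete, here is a minimal sketch in R. The file names and the toy lm() fit are stand-ins of my own invention, not anyone's actual workflow; the only point is the pattern of recording the session state next to the results and re-checking them later.

## At the end of the analysis: save the result alongside a record of
## the package tree it was produced under (point 1).
fit <- lm(mpg ~ wt, data = mtcars)   # stand-in for the real analysis
saveRDS(fit, "fit.rds")
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")

## Later, under a possibly updated tree: a quick sanity check
## (point 3) -- does re-running still give the saved answer?
refit <- lm(mpg ~ wt, data = mtcars)
stopifnot(isTRUE(all.equal(coef(readRDS("fit.rds")), coef(refit))))

If the check fails, diffing the current sessionInfo() output against the saved sessionInfo.txt at least narrows down which packages moved in the meantime.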