On Thu, Mar 20, 2014 at 5:27 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
> > There is nothing like backups with due attention to detail.
>
> Agreed, although given the complexity of dependencies among packages, this might entail several GB of snapshots per paper (if not several TB for some papers) in various cases. Anyone who is reasonably prolific then gets the exciting prospect of managing these backups.

Isn't that what support staff is for? ;-) But storage space is cheap, and as tedious as managing backups can be (definitely not fun), it is manageable.

> At least if I grind out a vignette with a bunch of Bioconductor packages and call sessionInfo() at the end, I can find out later on (if, say, things stop working) what was the state of the tree when it last worked, and what might have changed since then. If a self-contained C++ or FORTRAN program is sufficient to perform an entire analysis, that's awesome, and it ought to be stuffed into revision control (doesn't everyone already do this?). But once you start using tools that depend on other tools, it becomes substantially more difficult to ensure that
>
> 1) a comprehensive snapshot is taken
> 2) reviewers, possibly on different platforms and/or major versions, can run using that snapshot
> 3) some means of a quick sanity check ("does this analysis even return sensible results?") can be run
>
> Hopefully this is better articulated than my previous missive.

Tell me about it. Oh, wait, you already did. ;-) I understand this, as I routinely work with complex distributed systems involving multiple programming languages and other diverse tools. But such is part of the overhead of doing quality work.

> I believe we fundamentally agree; some of the particulars may be an issue of notation or typical workflow.

I agree that we fundamentally agree ;-)

From my experience, the issues addressed in this thread are probably best handled by the package developers and the authors who use their packages, rather than by imposing additional work on those responsible for CRAN, especially when the means for doing things a little differently than how CRAN does it are readily available.
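Concretely, the kind of record Tim describes is cheap to produce and to keep with the rest of the backups. A minimal sketch of what I have in mind (the file name is just a placeholder, and the later comparison assumes the same packages are still installed):

  ## At the end of the analysis or vignette: record the exact package state.
  ## ("sessionInfo_snapshot.rds" is only a placeholder file name.)
  info <- sessionInfo()
  print(info)                                    # appears in the rendered vignette
  saveRDS(info, file = "sessionInfo_snapshot.rds")

  ## Later, if things stop working: compare the recorded versions against the
  ## packages currently installed to see what changed in the meantime.
  ## (otherPkgs only covers attached packages; loadedOnly namespaces could be
  ## checked the same way. packageVersion() errors if a package was removed.)
  old <- readRDS("sessionInfo_snapshot.rds")
  old_versions <- vapply(old$otherPkgs, function(p) p$Version, character(1))
  new_versions <- vapply(names(old_versions),
                         function(p) as.character(packageVersion(p)),
                         character(1))
  changed <- old_versions != new_versions
  cbind(then = old_versions, now = new_versions)[changed, , drop = FALSE]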
Cheers

Ted

R.E. (Ted) Byers, Ph.D., Ed.D.

> Statistics is the grammar of science.
> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>
> On Thu, Mar 20, 2014 at 2:13 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
>
>> On Thu, Mar 20, 2014 at 4:53 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
>>
>> > On Thu, Mar 20, 2014 at 1:28 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
>> >
>> >> Herve Pages mentions the risk of irreproducibility across three minor revisions of version 1.0 of Matrix. My gut reaction would be that if the results are not reproducible across such minor revisions of one library, they are probably just so much BS.
>> >
>> > Perhaps this is just terminology, but what you refer to I would generally call 'replication'. Of course being able to replicate results with other data or other software is important to validate claims. But being able to reproduce how the original results were obtained is an important part of this process.
>>
>> Fair enough.
>>
>> > If someone is publishing results that I think are questionable and I cannot replicate them, I want to know exactly how those outcomes were obtained in the first place, so that I can 'debug' the problem. It's quite important to be able to trace back whether incorrect results were the result of a bug, incompetence or fraud.
>>
>> OK. That is where archives come in. When I had to deal with that sort of thing, I provided copies of both data and code to whoever asked. It ought not be hard for authors to make an archive, to e.g. an optical disk, that includes the software used along with the data, and store it like any other backup, so it can be provided to anyone upon request.
>>
>> > Let's take the example of the Reinhart and Rogoff case. The results obviously were not replicable, but without more information it was just the word of a grad student vs two Harvard professors. Only after reproducing the original analysis was it possible to point out the errors and prove that the original results were incorrect.
>>
>> OK, but if the practice I used were followed, then a copy of the optical disk on which everything relevant was stored would solve that problem (and it would be extremely easy for the researcher or his/her supervisor to do). I once had a reviewer complain he couldn't reproduce my results, so I sent him my code, which, translated into any of the Algol family of languages, would allow him, or anyone else, to replicate my results regardless of their programming language of choice. Once he had my code, he found his error and reported back that he had finally replicated my results. Several of my colleagues used the same practice, with the same consequences (whenever questioned, they just provided their code and related software, and their results were then reproduced). There is nothing like backups with due attention to detail.
>>
>> Cheers
>>
>> Ted
>>
>> --
>> R.E. (Ted) Byers, Ph.D., Ed.D.
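P.S. To make the archiving point in the quoted thread concrete: bundling the code, the data, and the session record into a single archive (for an optical disk or any other backup medium) takes only a few lines. A rough sketch, with placeholder file names:

  ## Rough sketch: bundle everything a reviewer would need into one archive.
  ## All paths below are placeholders; adjust them to the actual project layout.
  writeLines(capture.output(sessionInfo()), "sessionInfo.txt")
  files <- c("analysis.R",              # the script(s) that produce the results
             "data/raw_data.csv",       # the input data
             "sessionInfo.txt")         # record of R and package versions
  tar("analysis_archive.tar.gz", files = files, compression = "gzip")
  ## The resulting archive can be burned to disk or stored with the other
  ## backups, and handed to anyone who asks how the results were obtained.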