> There is nothing like backups with due attention to detail.

Agreed, although given the complexity of dependencies among packages, this can entail several GB of snapshots per paper (if not several TB for some papers). Anyone who is reasonably prolific then gets the exciting prospect of managing those backups.
At least if I grind out a vignette with a bunch of Bioconductor packages and call sessionInfo() at the end, I can find out later on (if, say, things stop working) what the state of the tree was when it last worked, and what might have changed since then.

If a self-contained C++ or FORTRAN program is sufficient to perform an entire analysis, that's awesome, and it ought to be kept under revision control (doesn't everyone already do this?). But once you start using tools that depend on other tools, it becomes substantially more difficult to ensure that

1) a comprehensive snapshot is taken,
2) reviewers, possibly on different platforms and/or major versions, can run against that snapshot, and
3) some quick sanity check ("does this analysis even return sensible results?") can be run.

(I've put a minimal sketch of what I mean by 1) and 3) in a postscript at the end of this message.)

Hopefully this is better articulated than my previous missive. I believe we fundamentally agree; some of the particulars may be a matter of notation or typical workflow.

Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Thu, Mar 20, 2014 at 2:13 PM, Ted Byers <r.ted.by...@gmail.com> wrote:

> On Thu, Mar 20, 2014 at 4:53 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
>
> > On Thu, Mar 20, 2014 at 1:28 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
> > >
> > > Herve Pages mentions the risk of irreproducibility across three minor
> > > revisions of version 1.0 of Matrix. My gut reaction would be that if the
> > > results are not reproducible across such minor revisions of one library,
> > > they are probably just so much BS.
> >
> > Perhaps this is just terminology, but what you refer to I would generally
> > call 'replication'. Of course being able to replicate results with other
> > data or other software is important to validate claims. But being able to
> > reproduce how the original results were obtained is an important part of
> > this process.
>
> Fair enough.
>
> > If someone is publishing results that I think are questionable and I
> > cannot replicate them, I want to know exactly how those outcomes were
> > obtained in the first place, so that I can 'debug' the problem. It's
> > quite important to be able to trace back whether incorrect results were
> > the result of a bug, incompetence, or fraud.
>
> OK. That is where archives come in. When I had to deal with that sort of
> thing, I provided copies of both data and code to whoever asked. It ought
> not be hard for authors to make an archive, to e.g. an optical disk, that
> includes the software used along with the data, and store it like any
> other backup, so it can be provided to anyone upon request.
>
> > Let's take the example of the Reinhart and Rogoff case. The results
> > obviously were not replicable, but without more information it was just
> > the word of a grad student vs. two Harvard professors. Only after
> > reproducing the original analysis was it possible to point out the
> > errors and prove that the original results were incorrect.
>
> OK, but if the practice I used were followed, then a copy of the optical
> disk to which everything relevant was stored would solve that problem (and
> it would be extremely easy for the researcher or his/her supervisor to
> do). I once had a reviewer complain that he couldn't reproduce my results,
> so I sent him my code, which, translated into any of the Algol family of
> languages, would allow him, or anyone else, to replicate my results
> regardless of their programming language of choice.
> Once he had my code, he found his error and reported back that he had
> finally replicated my results. Several of my colleagues used the same
> practice, with the same consequences (whenever questioned, they just
> provided their code and related software, and their results were then
> reproduced). There is nothing like backups with due attention to detail.
>
> Cheers
>
> Ted
>
> --
> R.E. (Ted) Byers, Ph.D., Ed.D.
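P.S. To make points 1) and 3) above concrete, here is a minimal sketch in R. The file names and the toy lm() fit are stand-ins of my own invention, not anyone's actual workflow; the only point is the pattern of recording the session state next to the results and re-checking them later.

## At the end of the analysis: save the result alongside a record of
## the package tree it was produced under (point 1).
fit <- lm(mpg ~ wt, data = mtcars)   # stand-in for the real analysis
saveRDS(fit, "fit.rds")
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")

## Later, under a possibly updated tree: a quick sanity check
## (point 3) -- does re-running still give the saved answer?
refit <- lm(mpg ~ wt, data = mtcars)
stopifnot(isTRUE(all.equal(coef(readRDS("fit.rds")), coef(refit))))

If the check fails, diffing the current sessionInfo() output against the saved sessionInfo.txt at least narrows down which packages moved in the meantime.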