There seems to be some question of how frequently changes to software packages result in irreproducible results.
I am sure Terry is correct that analyses using `glm` and other functions shipped with base R are quite reliable, and after all those functions already benefit from being versioned with R releases, as Jeroen argues. In my field of ecology and evolution, the situation is quite different. Packages are frequently developed by scientists without any background in programming and become widely used, such as [geiger](http://cran.r-project.org/web/packages/geiger/), with 463 papers citing it and probably many more using it without citing it (both because it is sometimes used only as a dependency of another package and because our community isn't great at citing packages). The package has changed substantially over its time on CRAN, and many function calls that ran against older versions no longer run against newer ones. Its dependencies, notably the phylogenetics package ape, have changed continually over that interval, with both bug fixes and substantial changes to the basic data structure. The ape package has 1,276 citations (again a lower bound).

I suspect that correctly identifying the version of the software used in any of these thousands of papers would prove difficult, and that for a large fraction the analyses would simply no longer execute successfully. It would be much harder still to track down the cases where bug fixes actually changed the results. I have certainly seen both problems in the hundreds of Sweave/knitr files I have produced over the years that use these packages. Even work that simply relies on a package that has since been archived becomes a substantial reproducibility challenge for other scientists, even when an expert familiar with the packages (e.g. the original author) would have no trouble. The informatics team at the National Evolutionary Synthesis Center recently reached just this conclusion in an exercise trying to reproduce several papers, including one of my own that used a package that had been archived (odesolve, whose replacement, deSolve, does not use quite the same function call for the same `lsoda` function).

New methods are being published all the time, and I think it is excellent that in ecology and evolution it is increasingly standard to publish R packages implementing those methods, as a scan of any table of contents of "Methods in Ecology and Evolution", for instance, will quickly show. But unlike `glm`, these methods have a long way to go before they are fully tested and debugged, and reproducing any work based on them requires a close eye on the versions used (particularly when unit tests and even detailed changelogs are not common). The methods are invariably built by "user-developers", researchers writing the code for their own needs, and these packages can themselves fall afoul of changes as they depend and build upon the work of other nascent ecology and evolution packages.

Detailed reproducibility studies of published work in this area are still hard to come by, not least because the actual code used by the researchers is seldom published (other than when it is released as its own R package). But the incompatibilities between successive versions of the hundreds of packages in our domain, along with the interdependencies among those packages, might provide some window into the difficulties of computational reproducibility. I suspect changes in these fast-moving packages are far more often the culprit than differences in compilers and operating systems.
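To make the version bookkeeping concrete, here is a minimal sketch of the kind of record I would like to see alongside published analyses. The package names are real, but the version string in the commented install line is only an illustrative placeholder, and `install_version()` from devtools is just one possible mechanism for recovering an older release:

```r
## A minimal sketch: record the exact package versions an analysis was run
## against, so readers at least have a fighting chance of reconstructing
## the environment later.

library(geiger)   # comparative phylogenetics; pulls in ape as a dependency
library(deSolve)  # successor to the archived odesolve package

## Report the versions actually loaded in this session
packageVersion("geiger")
packageVersion("ape")
packageVersion("deSolve")

## Save the full session description alongside the analysis outputs
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")

## To re-run old code, one can try installing the version cited in the paper,
## e.g. with devtools; the version string below is a placeholder, not a
## specific recommendation:
## devtools::install_version("geiger", version = "1.3-1",
##                           repos = "http://cran.r-project.org")
```

Of course, recording versions only solves the bookkeeping half of the problem; it says nothing about whether a bug fix silently changed the numbers.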
Cheers,
Carl

On Thu, Mar 20, 2014 at 10:23 AM, Greg Snow <538...@gmail.com> wrote:
> On Thu, Mar 20, 2014 at 7:32 AM, Dirk Eddelbuettel <e...@debian.org> wrote:
> [snip]
>
> > (and some readers
> > may recall the infamous Pentium bug of two decades ago).
>
> It was a "Flaw" not a "Bug". At least I remember the Intel people
> making a big deal about that distinction.
>
> But I do remember the time well, I was a biostatistics Ph.D. student
> at the time and bought one of the flawed pentiums. My attempts at
> getting the chip replaced resulted in a major run around and each
> person that I talked to would first try to explain that I really did
> not need the fix because the only people likely to be affected were
> large corporations and research scientists. I will admit that I was
> not a large corporation, but if a Ph.D. student in biostatistics is
> not a research scientist, then I did not know what they defined one
> as. When I pointed this out they would usually then say that it still
> would not matter, unless I did a few thousand floating point
> operations I was unlikely to encounter one of the problematic
> divisions. I would then point out that some days I did over 10,000
> floating point operations before breakfast (I had checked after the
> 1st person told me this and 10,000 was a low estimate of a lower bound
> of one set of simulations) at which point they would admit that I had
> a case and then send me to talk to someone else who would start the
> process over.
>
> > [snip]
> > --
> > Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com

--
Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/