>>>>> Hervé Pagès <hpa...@fhcrc.org> >>>>> on Thu, 20 Mar 2014 15:23:57 -0700 writes:
> On 03/20/2014 01:28 PM, Ted Byers wrote:
>> On Thu, Mar 20, 2014 at 3:14 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>> On 03/20/2014 03:52 AM, Duncan Murdoch wrote:
>>
>> On 14-03-20 2:15 AM, Dan Tenenbaum wrote:
>>
>> ----- Original Message -----
>>
>> From: "David Winsemius" <dwinsem...@comcast.net>
>> To: "Jeroen Ooms" <jeroen.o...@stat.ucla.edu>
>> Cc: "r-devel" <r-devel@r-project.org>
>> Sent: Wednesday, March 19, 2014 11:03:32 PM
>> Subject: Re: [Rd] [RFC] A case for freezing CRAN
>>
>> On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:
>>
>> On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
>> <michael.weyla...@gmail.com> wrote:
>>
>> Reading this thread again, is it a fair summary of your
>> position to say "reproducibility by default is more
>> important than giving users access to the newest bug
>> fixes and features by default?" It's certainly arguable,
>> but I'm not sure I'm convinced: I'd imagine that the
>> ratio of new work being done vs reproductions is rather
>> high and the current setup optimizes for that already.
>>
>> I think that separating development from released
>> branches can give us both reliability/reproducibility
>> (stable branch) as well as new features (unstable
>> branch). The user gets to pick (and you can pick
>> both!). The same is true for r-base: when using a
>> 'released' version you get 'stable' base packages that
>> are up to 12 months old. If you want to have the latest
>> stuff you download a nightly build of r-devel. For
>> regular users and reproducible research it is recommended
>> to use the stable branch. However if you are a developer
>> (e.g. package author) you might want to
>> develop/test/check your work with the latest r-devel.
>>
>> I think that extending the R release cycle to CRAN would
>> result both in more stable released versions of R, as
>> well as more freedom for package authors to implement
>> rigorous change in the unstable branch. When writing a
>> script that is part of a production pipeline, or sweave
>> paper that should be reproducible 10 years from now, or a
>> book on using R, you use a stable version of R, which is
>> guaranteed to behave the same over time. However when
>> developing packages that should be compatible with the
>> upcoming release of R, you use r-devel which has the
>> latest versions of other CRAN and base packages.
>>
>> As I remember ... The example demonstrating the need for
>> this was an XML package that caused an extract from a
>> website where the headers were misinterpreted as data in
>> one version of pkg:XML and not in another. That seems
>> fairly unconvincing. Data cleaning and validation is a
>> basic task of data analysis. It also seems excessive to
>> assert that it is the responsibility of CRAN to maintain
>> a synced binary archive that will be available in ten
>> years.
>>
>> CRAN already does this, the bin/windows/contrib directory
>> has subdirectories going back to 1.7, with packages dated
>> October 2004. I don't see why it is burdensome to
>> continue to archive these. It would be nice if source
>> versions had a similar archive.
>>
>> The bin/windows/contrib directories are updated every day
>> for active R versions. It's only when Uwe decides that a
>> version is no longer worth active support that he stops
>> doing updates, and it "freezes". A consequence of this
>> is that the snapshots preserved in those older
>> directories are unlikely to match what someone who keeps
>> up to date with R releases is using. Their purpose is to
>> make sure that those older versions aren't completely
>> useless, but they aren't what Jeroen was asking for.
>>
>> But it is almost completely useless from a
>> reproducibility point of view to get random package
>> versions. For example if some people try to use R-2.13.2
>> today to reproduce an analysis that was published 2 years
>> ago, they'll get Matrix 1.0-4 on Windows, Matrix 1.0-3 on
>> Mac, and Matrix 1.1-2-2 on Unix. And none of them of
>> course is what was used by the authors of the paper (they
>> used Matrix 1.0-1, which is what was current when they
>> ran their analysis).
>>
>> Initially this discussion brought back nightmares of DLL
>> hell on Windows. Those as ancient as I will remember
>> that well. But now, the focus seems to be on
>> reproducibility, but with what strikes me as a seriously
>> flawed notion of what reproducibility means.
>>
>> Herve Pages mentions the risk of irreproducibility across
>> three minor revisions of version 1.0 of Matrix.

> If you use R-2.13.2, you get Matrix 1.1-2-2 on
> Linux.

No way! Matrix 1.1-2-2 has Depends: R (>= 2.15.2)

> AFAIK this is the most recent version of Matrix,
> aimed to be compatible with the most current version of R
> (i.e. R 3.0.3). However, it has never been tested with R-2.13.2.

Exactly. And for this reason, I have adopted the practice of keeping
Depends: R (>= ...) in Matrix and, partly, in other packages I
maintain. Doing so does prevent users of old versions of R from
getting new features and, even more importantly, from getting the
latest (few, of course ! ;-) bug-fixes for Matrix.

But apart from this short note, I'm very sympathetic with optionally
providing easier (not "easy") ways of setting up old versions of R
and packages, where users can pretty quickly use the printed
(unfortunately, for now) output of sessionInfo() to reinstall

 1) the version of R
 2) an install.packages() call which tries (!) to get the
    corresponding packages (in their correct version) from CRAN
    (including ./Archive/ !),

similarly to what Duncan Murdoch has agreed to.
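
To make step 2) a little more concrete, here is a minimal sketch in R of how a package version read off a printed sessionInfo() could be turned into a source URL under CRAN's src/contrib/Archive/ tree. The names `archive_url` and `wanted` are hypothetical and introduced only for illustration; they are not part of R, of CRAN's tooling, or of anything proposed in this thread. The sketch also assumes the wanted version has already been superseded, since the current release of a package sits in src/contrib/ itself and not under Archive/.

```r
# Sketch only. `archive_url` is a hypothetical helper, not an R or CRAN API.
# It builds the URL of a superseded source tarball in the CRAN archive.
archive_url <- function(pkg, version,
                        cran = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", cran, pkg, pkg, version)
}

# Versions as they would be copied by hand from a printed sessionInfo();
# Matrix 1.0-1 is the version mentioned in the quoted text above.
wanted <- c(Matrix = "1.0-1")

# One URL per package, named by package.
urls <- mapply(archive_url, names(wanted), wanted)
print(urls)

# Not run: needs network access and the tools to build from source, and
# it may still fail if the old package does not build on the R in use.
# for (u in urls) install.packages(u, repos = NULL, type = "source")
```

This only covers the "tries (!)" part: it does not resolve the dependencies of the old package, nor does it check that the archived version is compatible with the installed version of R.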

> I'm not saying that it should, that would be a
> big waste of resources of course. All I'm saying is that
> it doesn't make sense to serve by default a version that
> is known to be incompatible with the version of R being
> used. It's very likely to not even install properly.

[..............]

> Also note that back in October 2011, people using R-2.13.2
> would get e.g. ape 2.7-3 on Linux, Windows and
> Mac. Wouldn't it make sense that people using R-2.13.2
> today get the same? Why would anybody use R-2.13.2 today
> if it's not to run again some code that was written and
> used two years ago to obtain some important results?

I also tend to agree that it would be great if someone
(Karl Millar -> Google ?) would set up a good time-stamping system
for CRAN {and Bioconductor and Omegahat and ..?} packages.
Ideally that system would work by *using* the CRAN (and ..)
infrastructure.

> Cheers, H.

I'm still unsure if I should agree with you (Hervé) that some
freezing / "data base of package timestamps" should happen on-CRAN
in addition.

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel