On Jan 3, 2013, at 8:33 AM, Mario Bourgoin <m...@media.mit.edu> wrote:
> Dear Sir or Madam,
> 
> The group of people with whom I work is now convinced of the usefulness of
> using R and its packages to meet our needs for statistical analysis.  It
> has become important that R programs and scripts we create today can be run
> by someone else tomorrow, so we need to use version control.  For this to work
> well, we need to version-control not just our code, but also R and the CRAN
> packages we use.  (We only use CRAN for now.)  Fortunately, R is under
> Subversion, and many CRAN packages are under Subversion in R-Forge.
> However, many CRAN packages do not appear to be available from R-Forge.
> 
> 1- Are all CRAN packages available from some repository under version
> control?  (My guess is ``no.'')
> 2- Is there an identifier on CRAN that flags a package as under version
> control in a repository?  (My guess is ``no.'')
> 3- How does CRAN do version control for non-repository packages?  (My guess
> is ``through the generosity of volunteer administrators'' though I would
> prefer that some version control software be involved.)
> 4- Should we decide to create a local source repository to meet our needs?
> (My guess is ``that depends.'')
> 5- Where might I find examples of groups creating and maintaining local
> source repositories for R and its packages?
> 
> Sincerely,
> -- 
> Mario Bourgoin



I suspect that you will get various responses, so let me offer my ten cents:

1. The old versions of CRAN packages are typically, but possibly not always, 
available via an "Old Sources" link on each package's page on CRAN. You could 
use that approach to obtain old source versions of packages. However, it is 
conceivable that locally compiling and using the archived source version of a 
package (e.g., where you previously used a precompiled binary on OSX, Windows 
or even Linux in some cases) could yield behavioral changes over time. 
Hardware, OS, compiler and other environmental changes (bugs, 32 versus 64 bit, 
differing compiler options, etc.) could introduce subtle problems that may 
prevent you from exactly replicating results from previous work. These are 
especially important to consider for CRAN packages that are not "pure R" 
(e.g., those that include C, C++, FORTRAN, etc.).

2. The old versions of contributed CRAN packages that are physically on CRAN 
are not under a true file level source version control system there. It is up 
to each package maintainer/author to elect to use such a tool themselves 
outside of CRAN. R-Forge and GitHub are perhaps the two most popular online 
platforms, but others may be used and yet others may use local offline repos 
that you do not have access to. Some may not use a true version control system 
at all. There is no requirement for or any enforcement of a particular 
development process for contributed CRAN packages.

3. R itself is under SVN control, but that is unlikely to help you if you 
typically install precompiled binary versions of R rather than compiling R from 
source and keeping track of SVN revision numbers. In that case, you will want 
to archive the OS-specific R binaries that you use.

4. As noted above, it is conceivable that running code today versus running 
that same code five years from now using the same versions of R and CRAN 
packages that you used today can be problematic. It is not only R and the CRAN 
packages that are changing, but your hardware, OS, compilers and possibly other 
relevant tools that are highly likely to change as well. All of these factors 
can contribute to your ability or inability to exactly replicate results over 
time. Only you can determine just how much of today's R/CRAN installation and 
computing environment you need to be able to replicate in the future.

5. If you have datasets that you will be using and need to replicate the same 
results five years from now on the same dataset that you used today, you will 
need to maintain your datasets (not just your code) in a version control system 
as well. 

6. You might also want to look into "Reproducible Research".
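
As an illustration of point 1, here is a minimal sketch (in Python, purely for 
illustration) of how one might construct the download location of an archived 
source tarball. It assumes the usual CRAN layout of 
src/contrib/Archive/<package>/<package>_<version>.tar.gz, which applies to 
superseded versions only; the current version of a package lives directly under 
src/contrib/. The function name, the mirror default and the package 
name/version in the example are my own placeholders, not anything prescribed by 
CRAN.

```python
# Sketch: build the expected URL of an archived CRAN source tarball.
# Assumption: archived (non-current) versions follow the layout
#   <mirror>/src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz
# The function name and the example package/version are hypothetical.

CRAN_MIRROR = "https://cran.r-project.org"


def archive_url(package: str, version: str, mirror: str = CRAN_MIRROR) -> str:
    """Return the expected URL of an old source version of a CRAN package."""
    if not package or not version:
        raise ValueError("package and version must be non-empty")
    base = mirror.rstrip("/")
    return f"{base}/src/contrib/Archive/{package}/{package}_{version}.tar.gz"


if __name__ == "__main__":
    # "somepkg" and "1.0-2" are placeholders, not a real package or release.
    print(archive_url("somepkg", "1.0-2"))
```

Whether a given tarball actually exists at that location still has to be 
checked, since not every old version is guaranteed to be archived.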


Bottom line, you have defined or are in the process of defining your own local 
requirements and perhaps SOPs. Thus, take control of your own risk mitigation 
process. Implement your own version control system locally and, if you use 
them, include the precompiled binaries of R and of any CRAN packages, so that 
you can replicate the state of an R installation to your own requirements, 
notwithstanding the hardware and OS level changes that will occur. 

You will of course want to document the version of R and any third party 
packages that you use when performing an analysis, so that you can track such 
information for future use. 
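
One possible way to capture that information outside of R is sketched below. 
Within an R session, sessionInfo() reports the R version and the loaded 
packages; the Python sketch here instead reads the DESCRIPTION file that each 
installed package carries in its library directory and collects the Package and 
Version fields. The function names are my own, the parser handles only simple 
"Field: value" lines with indented continuation lines, and the library path in 
the usage comment is a placeholder.

```python
# Sketch: record package names and versions from R DESCRIPTION files.
# Assumption: each installed package has <library>/<pkg>/DESCRIPTION with
# "Field: value" lines, where indented lines continue the previous field.
from pathlib import Path


def parse_description(text: str) -> dict:
    """Parse the fields of a DESCRIPTION file into a dictionary."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the previous field's value.
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def library_versions(library_dir: str) -> dict:
    """Map package name to version for every package under library_dir."""
    versions = {}
    for desc in sorted(Path(library_dir).glob("*/DESCRIPTION")):
        text = desc.read_text(encoding="utf-8", errors="replace")
        fields = parse_description(text)
        if "Package" in fields and "Version" in fields:
            versions[fields["Package"]] = fields["Version"]
    return versions


# Usage (the path is a placeholder for your own R library directory):
#   for name, version in library_versions("/path/to/R/library").items():
#       print(name, version)
```

The resulting list can be committed alongside the analysis code, so that each 
analysis carries a record of what it was run against.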

If you compile and install source versions of R and CRAN packages, then I would 
keep source level tarballs of each in said version control system so that you 
can reasonably ensure access to them when you need it, even though they may 
also be available via CRAN.
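
If you do archive tarballs (or datasets) this way, it may be worth storing a 
checksum for each file, so that you can later confirm that what you retrieve is 
byte-for-byte what you archived. The following is a minimal sketch using 
SHA-256 from the Python standard library; the function names and the manifest 
file name MANIFEST.sha256 are my own choices, not an established convention.

```python
# Sketch: write a SHA-256 manifest for every file under an archive directory.
# The manifest name and function names are illustrative assumptions.
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_name="MANIFEST.sha256"):
    """Write '<digest>  <relative path>' lines for all files in directory."""
    root = Path(directory)
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path.name != manifest_name:
            rel = path.relative_to(root).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    manifest = root / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Committing the manifest next to the tarballs gives you an independent check 
that does not rely on the version control system alone.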

I would be sure that such a repo (or more likely, each of several 
content/project specific repos) is stored on a central server, which is backed 
up offline with a sufficient frequency and level of redundancy to mitigate loss 
risk.

The two most popular VC tools these days are SVN and Git. There are significant 
differences in the implementation models of both, so you will need to take time 
to consider your own functional and operational requirements, which may 
lead you in one direction or the other. That being said, I made the switch from 
SVN to Git last year, even though I don't need true distributed version control 
myself. There are various reasons for that switch, which are beyond the scope 
of this discussion, so I won't get into details here.

I hope that the above is helpful.

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
