Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Aaron Mackey
On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm  wrote:

> * where do the dragons lurk
>

webs of interconnected dynamically loaded libraries, identical versions of
R compiled with different BLAS/LAPACK options, etc.  Go with the VM if you
really, truly, want this level of exact reproducibility.

An alternative (and arguably more useful) strategy would be to cache
results of each computational step, and report when results differ upon
re-execution with identical inputs; if you cache sessionInfo along with
each result, you can identify which package(s) changed, and begin to hunt
down why the change occurred (possibly for the better); couple this with
the concept of keeping both code *and* results in version control, then you
can move forward with a (re)analysis without being crippled by out-of-date
software.
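[Editor's note: a minimal sketch of this cache-and-compare idea; the helper name and cache layout are illustrative, not an existing package.]

```r
## Cache each step's result together with the sessionInfo() that produced it;
## on re-execution with identical inputs, report when the result differs.
cached_step <- function(name, expr, cache_dir = "cache") {
  dir.create(cache_dir, showWarnings = FALSE)
  f <- file.path(cache_dir, paste0(name, ".rds"))
  new <- list(result = expr, session = sessionInfo())
  if (file.exists(f)) {
    old <- readRDS(f)
    if (!identical(old$result, new$result))
      warning("step '", name, "' changed; compare old$session vs new$session")
    return(old$result)        # downstream steps keep seeing the cached value
  }
  saveRDS(new, f)
  new$result
}

coefs <- cached_step("fit", coef(lm(mpg ~ wt, data = mtcars)))
```

The cached .rds files can then live in version control alongside the code, per the suggestion above.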

-Aaron

--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
amac...@virginia.edu
http://www.cphg.virginia.edu/mackey


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Mike Marchywka

I hate to ask what got this thread started, but it sounds like someone was
counting on exact numeric reproducibility, or was there a bug in a specific
release? In fact, the best way to assess reproducibility is to run the code in
a variety of packages. Alternatively, you could do everything in Java and not
assume that calculations commute or associate as the code is modified, but
that seems pointless. Sensitivity analysis would seem to lead to more
reproducible results than trying to preserve a specific set of code quirks.

I also seem to recall that the FPU may produce random low-order bits in some
cases, so the same code/data give different results. Always assume FP is
stochastic and plan on analyzing the "noise."



> From: amac...@virginia.edu
> Date: Mon, 4 Mar 2013 16:28:48 -0500
> To: m...@stowers.org
> CC: r-devel@r-project.org; bioconduc...@r-project.org; 
> r-discuss...@listserv.stowers.org
> Subject: Re: [Rd] [BioC] enabling reproducible research & R package 
> management & install.package.version & BiocLite
>
> On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm  wrote:
>
> > * where do the dragons lurk
> >
>
> webs of interconnected dynamically loaded libraries, identical versions of
> R compiled with different BLAS/LAPACK options, etc. Go with the VM if you
> really, truly, want this level of exact reproducibility.
>
> An alternative (and arguably more useful) strategy would be to cache
> results of each computational step, and report when results differ upon
> re-execution with identical inputs; if you cache sessionInfo along with
> each result, you can identify which package(s) changed, and begin to hunt
> down why the change occurred (possibly for the better); couple this with
> the concept of keeping both code *and* results in version control, then you
> can move forward with a (re)analysis without being crippled by out-of-date
> software.
>
> -Aaron
>
> --
> Aaron J. Mackey, PhD
> Assistant Professor
> Center for Public Health Genomics
> University of Virginia
> amac...@virginia.edu
> http://www.cphg.virginia.edu/mackey
>
  


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
.>>> * where do the dragons lurk
 .>>>
 .>>
 .>> webs of interconnected dynamically loaded libraries, identical versions of
 .>> R compiled with different BLAS/LAPACK options, etc.  Go with the VM if you
 .>> really, truly, want this level of exact reproducibility.
 .>
 .> Sounds like the best bet -- maybe tools like vagrant might be useful here:
 .>
 .> http://www.vagrantup.com
 .>
 .> ... or maybe they're overkill?
 .>
 .> Haven't really checked it out myself too much, my impression is that
 .> these tools (vagrant, chef, puppet) are built to handle such cases.
 .>
 .> I'd imagine you'd probably need a location where you can grab the
 .> precise (versioned) packages for the things you are specifying, but
 .
 .Right...and this is a bit tricky, because we don't keep old versions
 .around in our BioC software repositories.  They are available through
 .Subversion but with the sometimes additional overhead of setting up
 .build-time dependencies.


So, even if I wanted to go where dragons lurked, it would not be possible to 
cobble together a version of biocLite that installed specific versions of software.

Thus, I might rather consider an approach that, at 'publish' time, tar-zips up a 
copy of the R package dependencies based on a config file derived from 
sessionInfo and caches it in the project directory.

Then, when/if the project is revisited (and found to produce different results 
under the current R environment), I can "simply" install an old R (oops, I guess 
I'd have to build it), and then un-tarzip the dependencies into the project's 
own R/Library, which I would put on .libPaths().
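[Editor's note: one way to sketch this 'publish'-time snapshot; the helper name is hypothetical, and archiving installed package trees only restores cleanly under the same R version and platform.]

```r
## Archive the installed versions of every package named by sessionInfo()
## into the project directory, for later restoration.
snapshot_library <- function(proj = ".") {
  si   <- sessionInfo()
  pkgs <- unique(c(names(si$otherPkgs), names(si$loadedOnly)))
  if (!length(pkgs)) stop("no non-base packages loaded")
  dirs <- find.package(pkgs)                  # installed package directories
  tarfile <- file.path(proj, "R-library-snapshot.tar.gz")
  tar(tarfile, files = dirs, compression = "gzip")
  tarfile
}

## Later, after rebuilding the matching R version, untar() the archive,
## move the package directories into a flat proj/R/Library, and then:
##   .libPaths(c(file.path(proj, "R/Library"), .libPaths()))
```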

Or, or?  

(My virtual machine advocating colleagues are snickering now, I am sure..)

Thanks for all your thoughts and advices

--Malcolm

 .
 .
 .> ...
 .>
 .> -steve
 .>
 .> --
 .> Steve Lianoglou
 .> Graduate Student: Computational Systems Biology
 .>  | Memorial Sloan-Kettering Cancer Center
 .>  | Weill Medical College of Cornell University
 .> Contact Info: http://cbio.mskcc.org/~lianos/contact
 .>



Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Dr Gregory Jefferis

On 5 Mar 2013, at 14:36, Cook, Malcolm wrote:

So, even if I wanted to go where dragons lurked, it would not be 
possible to cobble a version of biocLite that installed specific 
versions of software.


Thus, I might rather consider an approach that at 'publish' time 
tarzips up a copy of the R package dependencies based on a config file 
defined from sessionInfo and caches it in the project directory.


Then when/if the project is revisited (and found to produce different 
results under the current R environment),  I can "simply" install an old R 
(oops, I guess I'd have to build it), and then un-tarzip the 
dependencies into the project's own R/Library, which I would put on 
.libPaths().


Sounds a little like this:

http://cran.r-project.org/web/packages/rbundler/index.html

(which I haven't tested). Best,

Greg.

--
PLEASE NOTE CHANGE OF CONTACT DETAILS FROM MON 4TH MARCH:

Gregory Jefferis, PhD   Tel: 01223 267048
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 0QH, UK

http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
http://flybrain.stanford.edu



Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
All,

What got me started on this line of inquiry was my attempt at balancing the 
advantages of performing a periodic (daily or weekly) update to the 'release' 
version of locally installed R/Bioconductor packages on our institute-wide 
installation of R with the disadvantages of potentially changing the result of 
an analyst's workflow in mid-project.

I just got the "green light" to institute the periodic updates that I have 
been arguing are in our collective best interest.  In return, I promised my 
best effort to provide a means for preserving, or reverting to, a working R 
library configuration.

Please note that the reproducibility I am most eager to provide is limited to 
reproducibility within the computing environment of our institute, which 
perhaps removes some of the dragons' nests, though certainly not all.

There are technical issues with updating package installations on an NFS mount 
that might have files/libraries open on it from running R sessions.  I am 
interested in learning of approaches for minimizing/eliminating exposure to 
these issues as well.  The first/best approach seems to be to institute a 
'blackout' period during which users should expect the installed library to 
change.  Perhaps there are improvements on this.

Best,

Malcolm


 .-Original Message-
 .From: Mike Marchywka [mailto:marchy...@hotmail.com]
 .Sent: Tuesday, March 05, 2013 5:24 AM
 .To: amac...@virginia.edu; Cook, Malcolm
 .Cc: r-devel@r-project.org; bioconduc...@r-project.org; 
r-discuss...@listserv.stowers.org
 .Subject: RE: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .
 .I hate to ask what got this thread started, but it sounds like someone was 
counting on
 .exact numeric reproducibility, or was there a bug in a specific release? In 
fact,
 .the best way to assess reproducibility is to run the code in a variety 
of
 .packages. Alternatively, you could do everything in Java and not assume
 .that calculations commute or associate as the code is modified, but that seems
 .pointless. Sensitivity analysis would seem to lead to more reproducible 
results
 .than trying to preserve a specific set of code quirks.
 .
 .I also seem to recall that the FPU may produce random low-order bits in some cases,
 .so the same code/data give different results. Always assume FP is stochastic and 
plan
 .on analyzing the "noise."
 .
 .
 .
 .> From: amac...@virginia.edu
 .> Date: Mon, 4 Mar 2013 16:28:48 -0500
 .> To: m...@stowers.org
 .> CC: r-devel@r-project.org; bioconduc...@r-project.org; 
r-discuss...@listserv.stowers.org
 .> Subject: Re: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .>
 .> On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm  wrote:
 .>
 .> > * where do the dragons lurk
 .> >
 .>
 .> webs of interconnected dynamically loaded libraries, identical versions of
 .> R compiled with different BLAS/LAPACK options, etc. Go with the VM if you
 .> really, truly, want this level of exact reproducibility.
 .>
 .> An alternative (and arguably more useful) strategy would be to cache
 .> results of each computational step, and report when results differ upon
 .> re-execution with identical inputs; if you cache sessionInfo along with
 .> each result, you can identify which package(s) changed, and begin to hunt
 .> down why the change occurred (possibly for the better); couple this with
 .> the concept of keeping both code *and* results in version control, then you
 .> can move forward with a (re)analysis without being crippled by out-of-date
 .> software.
 .>
 .> -Aaron
 .>
 .> --
 .> Aaron J. Mackey, PhD
 .> Assistant Professor
 .> Center for Public Health Genomics
 .> University of Virginia
 .> amac...@virginia.edu
 .> http://www.cphg.virginia.edu/mackey
 .>


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Geoff Jentry

On Tue, 5 Mar 2013, Cook, Malcolm wrote:
Thus, I might rather consider an approach that at 'publish' time tarzips 
up a copy of the R package dependencies based on a config file defined 
from sessionInfo and caches it in the project directory.


If you had a separate environment for every project, each with its own R 
installation and R installation lib.loc this becomes rather easy. For 
instance, something like this:


myProject/
    projectRInstallation/
        bin/
            R
        library/
            Biobase
            annotate
            ...
    projectData/
    projectCode/
    projectOutput/

The directory structure would likely be more complicated than that but 
something along those lines. This way all code, data *and* compute 
environment are always linked together.
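[Editor's note: in R terms, the setup step for such a per-project layout might look like this; paths are illustrative.]

```r
## Create the per-project skeleton and point this session at the project's
## private library, so installs and library() calls stay inside the project.
proj <- "myProject"
lib  <- file.path(proj, "projectRInstallation", "library")
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
for (d in c("projectData", "projectCode", "projectOutput"))
  dir.create(file.path(proj, d), recursive = TRUE, showWarnings = FALSE)

.libPaths(c(lib, .libPaths()))   # project library first on the search path
## install.packages("Biobase", lib = lib)   # now lands inside the project
```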


-J



Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
.> So, even if I wanted to go where dragons lurked, it would not be
 .> possible to cobble a version of biocLite that installed specific
 .> versions of software.
 .>
 .> Thus, I might rather consider an approach that at 'publish' time
 .> tarzips up a copy of the R package dependencies based on a config file
 .> defined from sessionInfo and caches it in the project directory.
 .>
 .> Then when/if the project is revisited (and found to produce different
 .> results under the current R environment),  I can "simply" install an old R
 .> (oops, I guess I'd have to build it), and then un-tarzip the
 .> dependencies into the project's own R/Library, which I would put on
 .> .libPaths().
 .
 .Sounds a little like this:
 .
 .http://cran.r-project.org/web/packages/rbundler/index.html
 .
 .(which I haven't tested). Best,
 .
 .Greg.

Looks interesting - thanks for the suggestion.

But my use case is one in which an analyst at my site depends upon the 
local library installation and only retrospectively, at some publishable event 
(like handing the results over to the in-house customer/scientist), seeks to 
ensure the ability to return to that exact R library environment later.  This 
tool, on the other hand, commits the user to keeping a project-specific "bundle" 
from the outset.  Another set of trade-offs.  I will have to synthesize the 
options I am learning.

~ Malcolm 



Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Prof Brian Ripley
One comment: I have found numerical changes due to updates to the OS's 
compilers or runtime at least as often as changes in R or packages when 
trying to reproduce results from a year or two back.  That aspect is rarely 
mentioned in these discussions.


On 05/03/2013 15:09, Cook, Malcolm wrote:

.> So, even if I wanted to go where dragons lurked, it would not be
  .> possible to cobble a version of biocLite that installed specific
  .> versions of software.
  .>
  .> Thus, I might rather consider an approach that at 'publish' time
  .> tarzips up a copy of the R package dependencies based on a config file
  .> defined from sessionInfo and caches it in the project directory.
  .>
  .> Then when/if the project is revisited (and found to produce different
  .> results under the current R environment),  I can "simply" install an old R
  .> (oops, I guess I'd have to build it), and then un-tarzip the
  .> dependencies into the project's own R/Library, which I would put on
  .> .libPaths().
  .
  .Sounds a little like this:
  .
  .http://cran.r-project.org/web/packages/rbundler/index.html
  .
  .(which I haven't tested). Best,
  .
  .Greg.

Looks interesting - thanks for the suggestion.

But my use case is one in which an analyst at my site depends upon the local 
library installation and only retrospectively, at some publishable event (like handing 
the results over to the in-house customer/scientist), seeks to ensure the ability to return 
to that exact R library environment later.  This tool, on the other hand, commits the 
user to keeping a project-specific "bundle" from the outset.  Another set of 
trade-offs.  I will have to synthesize the options I am learning.

~ Malcolm





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[Rd] Patch for format.ftable()

2013-03-05 Thread Marius Hofert
Dear expeRts,

Please find attached the .diff for a bug fix in R-devel 62124. format.ftable()
fails to correctly format ftables that have no row.vars or no col.vars. That
works with the patch (the example code below also runs correctly for all the
(new) 'method's).

Cheers,

Marius


--8<---cut here---start->8---
(ft1 <- ftable(Titanic, col.vars = 1:4))
(ft2 <- ftable(Titanic, row.vars = 1))
(ft3 <- ftable(Titanic, row.vars = 1:2))
(ft4 <- ftable(Titanic, row.vars = 1:3))
(ft5 <- ftable(Titanic, row.vars = 1:4))

## former version (R-devel 62124)
stats:::format.ftable(ft1) # fails
stats:::format.ftable(ft2)
stats:::format.ftable(ft3)
stats:::format.ftable(ft4)
stats:::format.ftable(ft5) # fails

## all work fine now (format.ftable.() is the patched version):
format.ftable.(ft1)
format.ftable.(ft1, method="row.compact")
format.ftable.(ft1, method="col.compact")
format.ftable.(ft1, method="compact")
format.ftable.(ft2)
format.ftable.(ft2, method="row.compact")
format.ftable.(ft2, method="col.compact")
format.ftable.(ft2, method="compact")
format.ftable.(ft3)
format.ftable.(ft3, method="row.compact")
format.ftable.(ft3, method="col.compact")
format.ftable.(ft3, method="compact")
format.ftable.(ft4)
format.ftable.(ft4, method="row.compact")
format.ftable.(ft4, method="col.compact")
format.ftable.(ft4, method="compact")
format.ftable.(ft5)
format.ftable.(ft5, method="row.compact")
format.ftable.(ft5, method="col.compact")
format.ftable.(ft5, method="compact")
--8<---cut here---end--->8---


-- 
ETH Zurich
Dr. Marius Hofert
RiskLab, Department of Mathematics
HG E 65.2
Rämistrasse 101
8092 Zurich
Switzerland

Phone +41 44 632 2423
http://www.math.ethz.ch/~hofertj
GPG key fingerprint 8EF4 5842 0EA2 5E1D 3D7F  0E34 AD4C 566E 655F 3F7C
--- ftable_R-devel-62124.R	2013-02-27 17:51:38.0 +0100
+++ ftable_new.R	2013-03-05 19:50:15.775512036 +0100
@@ -171,7 +171,7 @@
 format.ftable <-
 function(x, quote=TRUE, digits=getOption("digits"),
  method=c("non.compact", "row.compact", "col.compact", "compact"),
- lsep=" \\ ", ...)
+ lsep=" | ", ...)
 {
 if(!inherits(x, "ftable"))
 	stop("'x' must be an \"ftable\" object")
@@ -194,9 +194,18 @@
 	if(is.null(nmx)) rep_len("", length(x)) else nmx
 }
 
-xrv <- attr(x, "row.vars")
-xcv <- attr(x, "col.vars")
+l.xrv <- length(xrv <- attr(x, "row.vars"))
+l.xcv <- length(xcv <- attr(x, "col.vars"))
 method <- match.arg(method)
+## possibly adjust method to correctly deal with 'extreme' layouts (no col.vars, no row.vars)
+if(l.xrv==0) {
+if(method=="col.compact") method <- "non.compact" # 'non.compact' already produces a 'col.compact' version
+else if (method=="compact") method <- "row.compact" # only need to 'row.compact'ify
+}
+if(l.xcv==0) {
+if(method=="row.compact") method <- "non.compact" # 'non.compact' already produces a 'row.compact' version
+else if (method=="compact") method <- "col.compact" # only need to 'col.compact'ify
+}
 LABS <-
 	switch(method,
 	   "non.compact" =		# current default
@@ -224,8 +233,6 @@
 	   },
 	   "compact" =		# fully compact version
 	   {
-	   l.xcv <- length(xcv)
-	   l.xrv <- length(xrv)
 	   xrv.nms <- makeNames(xrv)
 	   xcv.nms <- makeNames(xcv)
 	   mat <- cbind(rbind(cbind(matrix("", nrow = l.xcv-1, ncol = l.xrv-1),


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Paul Gilbert

(More on the original question further below.)

On 13-03-05 09:48 AM, Cook, Malcolm wrote:

All,

What got me started on this line of inquiry was my attempt at
balancing the advantages of performing a periodic (daily or weekly)
update to the 'release' version of locally installed R/Bioconductor
packages on our institute-wide installation of R with the
disadvantages of potentially changing the result of an analyst's
workflow in mid-project.


I have implemented a strategy to try to address this as follows:

1/ Install a new version of R when it is released, and packages in the R 
version's site-library with package versions as available at the time 
the R version is installed. Only upgrade these package versions in the 
case they are severely broken.


2/ Install the same packages in site-library-fresh and upgrade these 
package versions on a regular basis (e.g. daily).


3/ When a new version of R is released, freeze but do not remove the old 
R version, at least not for a fairly long time, and freeze 
site-library-fresh for the old version. Begin with the new version as in 
1/ and 2/. The old version remains available, so "reverting" is trivial.



The analysts are then responsible for choosing the R version they use, 
and the library they use. This means they do not have to change R and 
package version mid-project, but they can if they wish. I think the 
above two libraries will cover most cases, but it is possible that a few 
projects will need their own special library with a combination of 
package versions. In this case the user could create their own library, 
or you might prefer some more official mechanism.


The idea of the above strategy is to provide the stability one might 
want for an ongoing project, and the possibility of an upgraded package 
if necessary, but not encourage analysts to remain indefinitely with old 
versions (by say, putting new packages in an old R version library).


This strategy has been implemented in a set of make files in the project 
RoboAdmin available at http://automater.r-forge.r-project.org/. It can 
be done entirely automatically with a cron job. Constructive comments 
are always appreciated.


(IT departments sometimes think that there should be only one version of 
everything available, which they test and approve. So the initial 
reaction to this approach could be negative. I think they have not 
really thought about the advantages. They usually cannot test/approve an 
upgrade without user input, and timing is often extremely complicated 
because of ongoing user needs. This strategy is simply shifting 
responsibility and timing to the users, or user departments, that can 
actually do the testing and approving.)


Regarding NFS mounts, it is relatively robust. There can be occasional 
problems, especially for users that have a habit of keeping an R session 
open for days at a time and using site-library-fresh packages. In my 
experience this did not happen often enough to worry about a "blackout 
period".


Regarding the original question, I would like to think it could be 
possible to keep enough information to reproduce the exact environment, 
but I think for potentially sensitive numerical problems that is 
optimistic. As others have pointed out, results can depend not only on R 
and package versions, configuration, OS versions, and library and 
compiler versions, but also on the underlying hardware. You might have 
some hope using something like an Amazon core instance. (BTW, this 
problem is not specific to R.)


It is true that restricting to a fixed computing environment at your 
institution may ease things somewhat, but if you occasionally upgrade 
hardware or the OS then you will probably lose reproducibility.


An alternative that I recommend is that you produce a set of tests that 
confirm the results of any important project. These can be conveniently 
put in the tests/ directory of an R package, which is then maintained 
local, not on CRAN, and built/tested whenever a new R and packages are 
installed. (Tools for this are also available at the above indicated web 
site.) This approach means that you continue to reproduce the old 
results, or if not, discover differences/problems in the old or new 
version of R and/or packages that may be important to you. I have been 
successfully using a variant of this since about 1993, using R and 
package tests/ since they became available.
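[Editor's note: a regression test in this spirit can be as small as a script under tests/ that recomputes a published number and compares it to the archived value; the values below use the built-in mtcars data purely for illustration.]

```r
## tests/check-results.R: recompute key numbers from an old analysis and
## compare against the reference values archived with the project.
ref <- c(intercept = 37.2851, wt = -5.3445)     # archived published values
fit <- coef(lm(mpg ~ wt, data = mtcars))
names(fit) <- c("intercept", "wt")
stopifnot(isTRUE(all.equal(fit, ref, tolerance = 1e-4)))
```

R CMD check runs such scripts automatically, so rebuilding the local package under a new R/package combination re-verifies the old results.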


Paul



I just got the "green light" to institute such periodic updates that
I have been arguing is in our collective best interest.  In return,
I promised my best effort to provide a means for preserving or
reverting to a working R library configuration.

Please note that the reproducibility I am most eager to provide is
limited to reproducibility within the computing environment of our
institute, which perhaps takes away some of the dragon's nests,
though certainly not all.

There are technical issues of updating package installations on an
NFS mount that might have fi

Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Steve Lianoglou
Hi Paul,

You outline some great suggestions!

I just wanted to point out that in this case:

On Tue, Mar 5, 2013 at 5:34 PM, Paul Gilbert  wrote:
[snip]

> Regarding NFS mounts, it is relatively robust. There can be occasional
> problems, especially for users that have a habit of keeping an R session
> open for days at a time and using site-library-fresh packages. In my
> experience this did not happen often enough to worry about a "blackout
> period".

if users have a habit of working like this, they could also create an
R-library directory under their home directory, and put this library
path at the front of their .libPaths() so the continually updated
"fresh" stuff won't affect them.

I really like the general approach you've outlined; I just wanted to point
out that there's an easy workaround in case someone else tries to institute
such a regime but is getting friction due to that point in particular.

Good stuff, though .. thanks for sharing that!

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
Paul,

I think your balanced and reasoned approach addresses all my current concerns.  
Nice!  I will likely adopt your methods.  Let me ruminate.  Thanks for this.

~ Malcolm

 .-Original Message-
 .From: Paul Gilbert [mailto:pgilbert...@gmail.com]
 .Sent: Tuesday, March 05, 2013 4:34 PM
 .To: Cook, Malcolm
 .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 
'r-discuss...@listserv.stowers.org'
 .Subject: Re: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .(More on the original question further below.)
 .
 .On 13-03-05 09:48 AM, Cook, Malcolm wrote:
 .> All,
 .>
 .> What got me started on this line of inquiry was my attempt at
 .> balancing the advantages of performing a periodic (daily or weekly)
 .> update to the 'release' version of locally installed R/Bioconductor
 .> packages on our institute-wide installation of R with the
 .> disadvantages of potentially changing the result of an analyst's
 .> workflow in mid-project.
 .
 .I have implemented a strategy to try to address this as follows:
 .
 .1/ Install a new version of R when it is released, and packages in the R
 .version's site-library with package versions as available at the time
 .the R version is installed. Only upgrade these package versions in the
 .case they are severely broken.
 .
 .2/ Install the same packages in site-library-fresh and upgrade these
 .package versions on a regular basis (e.g. daily).
 .
 .3/ When a new version of R is released, freeze but do not remove the old
 .R version, at least not for a fairly long time, and freeze
 .site-library-fresh for the old version. Begin with the new version as in
 .1/ and 2/. The old version remains available, so "reverting" is trivial.
 .
 .
 .The analysts are then responsible for choosing the R version they use,
 .and the library they use. This means they do not have to change R and
 .package version mid-project, but they can if they wish. I think the
 .above two libraries will cover most cases, but it is possible that a few
 .projects will need their own special library with a combination of
 .package versions. In this case the user could create their own library,
 .or you might prefer some more official mechanism.
 .
 .The idea of the above strategy is to provide the stability one might
 .want for an ongoing project, and the possibility of an upgraded package
 .if necessary, but not encourage analysts to remain indefinitely with old
 .versions (by say, putting new packages in an old R version library).
 .
 .This strategy has been implemented in a set of make files in the project
 .RoboAdmin available at http://automater.r-forge.r-project.org/. It can
 .be done entirely automatically with a cron job. Constructive comments
 .are always appreciated.
 .
 .(IT departments sometimes think that there should be only one version of
 .everything available, which they test and approve. So the initial
 .reaction to this approach could be negative. I think they have not
 .really thought about the advantages. They usually cannot test/approve an
 .upgrade without user input, and timing is often extremely complicated
 .because of ongoing user needs. This strategy is simply shifting
 .responsibility and timing to the users, or user departments, that can
 .actually do the testing and approving.)
 .
 .Regarding NFS mounts, it is relatively robust. There can be occasional
 .problems, especially for users that have a habit of keeping an R session
 .open for days at a time and using site-library-fresh packages. In my
 .experience this did not happen often enough to worry about a "blackout
 .period".
 .
 .Regarding the original question, I would like to think it could be
 .possible to keep enough information to reproduce the exact environment,
 .but I think for potentially sensitive numerical problems that is
 .optimistic. As others have pointed out, results can depend not only on R
 .and package versions, configuration, OS versions, and library and
 .compiler versions, but also on the underlying hardware. You might have
 .some hope using something like an Amazon core instance. (BTW, this
 .problem is not specific to R.)
 .
 .It is true that restricting to a fixed computing environment at your
 .institution may ease things somewhat, but if you occasionally upgrade
 .hardware or the OS then you will probably lose reproducibility.
 .
 .An alternative that I recommend is that you produce a set of tests that
 .confirm the results of any important project. These can be conveniently
 .put in the tests/ directory of an R package, which is then maintained
 .local, not on CRAN, and built/tested whenever a new R and packages are
 .installed. (Tools for this are also available at the above indicated web
 .site.) This approach means that you continue to reproduce the old
 .results, or if not, discover differences/problems in the old or new
 .version of R and/or packages that may be important to you. I have been
 .successfully using a variant of 

Re: [Rd] crossprod(): g77 versus gfortran

2013-03-05 Thread Benjamin Tyner
Thank you Brian. So it sounds like appendix B.6 of

   http://cran.r-project.org/doc/manuals/R-admin.html

should be updated to reflect that g77 is no longer supported; I will
notify c...@r-project.org of this unless you suggest a different recipient.

In any case, I tried again but using gfortran version 4.4.0, and now the
timings are back to what they were under g77.

As for BLAS, I was using the one that ships with R, as I wanted to keep
things simple for benchmarking purposes.

Thanks again.
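[Editor's note: for this kind of comparison, a self-contained timing along these lines (matrix size arbitrary), run identically under each build, makes the difference concrete.]

```r
## Time crossprod() on a moderately large matrix; run under each R build
## (and each BLAS) to compare elapsed times.
set.seed(1)
m <- matrix(rnorm(2000 * 500), nrow = 2000)
elapsed <- system.time(xtx <- crossprod(m))[["elapsed"]]
cat("crossprod of a 2000 x 500 matrix:", elapsed, "seconds\n")
stopifnot(isTRUE(all.equal(xtx, t(m) %*% m)))   # builds must agree on the answer
```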

> On 05/03/2013 01:45, Benjamin Tyner wrote:
> > Hi
> >
> > I've got two builds of R, one using g77 (version 3.4.6) and the other
> > using gfortran (version 4.1.2). The two builds are otherwise identical
> > as far as I can tell. The one which used g77 performs crossprod()s
> > roughly twice as fast as the gfortran one. I'm wondering if this rings a
> > bell with anyone, and if so, are you aware of any configure settings
> > which will improve the performance when using gfortran. This is on RHEL 5.
>
> Note that recent versions of R do not build with g77, and have 
> performance improvements in linear algebra.  So please follow the 
> posting guide and give us the 'at a minimum' information requested and a 
> reproducible example.
>
> Also, check what BLAS is in use: an optimized BLAS can make a lot of 
> difference on such simple operations.
>
> -- 
> Brian D. Ripley,  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [Rd] crossprod(): g77 versus gfortran

2013-03-05 Thread Simon Urbanek

On Mar 5, 2013, at 10:38 PM, Benjamin Tyner wrote:

> Thank you Brian. So it sounds like appendix B.6 of
> 
>   http://cran.r-project.org/doc/manuals/R-admin.html
> 
> should be updated to reflect that g77 is no longer supported; I will
> notify c...@r-project.org of this unless you suggest a different recipient.
> 

I think you misunderstood: you can use g77, but you will also need an 
F90-capable Fortran compiler (typically gfortran) unless you have another 
source for LAPACK, and that's what B.6 says explicitly.

Cheers,
Simon




> In any case, I tried again but using gfortran version 4.4.0, and now the
> timings are back to what they were under g77.
> 
> As for BLAS, was using the one that ships with R, as wanted to keep
> things simple for benchmarking purposes.
> 
> Thanks again.
> 
>> On 05/03/2013 01:45, Benjamin Tyner wrote:
>>> Hi
>>>
>>> I've got two builds of R, one using g77 (version 3.4.6) and the other
>>> using gfortran (version 4.1.2). The two builds are otherwise identical
>>> as far as I can tell. The one which used g77 performs crossprod()s
>>> roughly twice as fast as the gfortran one. I'm wondering if this rings a
>>> bell with anyone, and if so, are you aware of any configure settings
>>> which will improve the performance when using gfortran. This is on RHEL 5.
>> 
>> Note that recent versions of R do not build with g77, and have 
>> performance improvements in linear algebra.  So please follow the 
>> posting guide and give us the 'at a minimum' information requested and a 
>> reproducible example.
>> 
>> Also, check what BLAS is in use: an optimized BLAS can make a lot of 
>> difference on such simple operations.
>> 
>> -- 
>> Brian D. Ripley,  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel:  +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UKFax:  +44 1865 272595