Re: [Rd] Best practices in developing package: From a single file

2018-01-30 Thread Cook, Malcolm
> >> I am wondering what are the best practices for developing an R
 > >> package. I am aware of Hadley Wickham's best practice
 > >> documentation/book (http://r-pkgs.had.co.nz/).  I recall a couple of
 > >> years ago there were some tools for generating a package out of a
 > >> single file, such as using package.skeleton, but no auto-generated
 > >> documentation. Do you know a way to generate documentation and a
 > >> package out of single R source file, or from an environment?

I think you want to see the approach to generating a skeleton from a single .R 
file presented in:

Simple and sustainable R packaging using inlinedocs 
http://inlinedocs.r-forge.r-project.org/

I have not used it in some time but found it invaluable when I did.
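
As a reminder of its flavor — a sketch from memory of the inlinedocs comment 
conventions (details may have drifted since I last used it):

normalize <- function
### Rescale a numeric vector to the unit interval.
(x ##<< numeric vector to rescale
){
  (x - min(x)) / (max(x) - min(x))
### the rescaled vector
}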

I would be VERY INTERESTED to hear how others feel it has held up.

Joining conversation late,

~malcolm_c...@stowers.org

 > >
 > > Mehmet,
 > >
 > > This list is for development of the R language itself and closely
 > > related tools.  There is a separate list, R-pkg-devel, for development
 > > of packages.
 > >
 > > Since you're here, I'll try to answer your question.
 > >
 > > package.skeleton can create a package from all the R functions in a
 > > specified environment.  So if you load all the functions that you want
 > > in your new package into your R environment, then call
 > > package.skeleton, you'll have a starting point.
 > >
 > > At that point, I would probably recommend moving to RStudio, and using
 > > RStudio to generate markdown comments for roxygen for all your newly
 > > created function files.  Then you could finish off the documentation by
 > > writing it in these roxygen skeletons or copying and pasting from
 > > comments in your original code files.
 > 
 > I'd agree about moving to RStudio, but I think Roxygen is the wrong
 > approach for documentation.  package.skeleton() will have done the
 > boring mechanical part of setting up your .Rd files; all you have to do
 > is edit some content into them.  (Use prompt() to add a new file if you
 > add a new function later, don't run package.skeleton() again.)
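
[For concreteness — a minimal sketch of the package.skeleton()/prompt() workflow 
described above; the function and package names here are illustrative:]

f <- function(x) x + 1                         # a function in the global environment
package.skeleton(name = "mypkg", list = "f")   # writes mypkg/ with R/ and man/ stubs
## later, after writing a new function g() for the package:
g <- function(x) abs(x)
prompt(g, filename = "mypkg/man/g.Rd")         # adds just the one new .Rd skeleton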
 > 
 > This isn't the fashionable point of view, but I think it is easier to
 > get good documentation that way than using Roxygen.  (It's easier to get
 > bad documentation using Roxygen, but who wants that?)
 > 
 > The reason I think this is that good documentation requires work and
 > thought.  You need to think about the markup that will get your point
 > across, you need to think about putting together good examples, etc.
 > This is *harder* in Roxygen than if you are writing Rd files, because
 > Roxygen is a thin front end to produce Rd files from comments in your .R
 > files.  To get good stuff in the help page, you need just as much work
 > as in writing the .Rd file directly, but then you need to add another
 > layer on top to put it in a comment.  Most people don't bother.
 > 
 > I don't know any packages with what I'd consider to be good
 > documentation that use Roxygen.  It's just too easy to write minimal
 > documentation that passes checks, so Roxygen users don't keep refining it.
 > 
 > (There are plenty of examples of packages that write bad documentation
 > directly to .Rd as well.  I just don't know of examples of packages with
 > good documentation that use Roxygen.)
 > 
 > Based on my criticism last week of git and Github, I expect to be called
 > a grumpy old man for holding this point of view.  I'd actually like to
 > be proven wrong.  So to anyone who disagrees with me:  rather than just
 > calling me names, how about some examples of Roxygen-using packages
 > that
 > have good help pages with good explanations, and good examples in them?
 > 
 > Back to Mehmet's question:  I think Hadley's book is pretty good, and
 > I'd recommend most of it, just not the Roxygen part.
 > 
 > Duncan Murdoch
 > 
 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] running R with users home dirs on a shared filesystems

2019-12-13 Thread Cook, Malcolm
Another thing to avoid is having multiple processes simultaneously access a 
single sqlite3 database stored on an NFS mount.

From the SQLite manual: “Your best defense is to not use SQLite for files on a 
network filesystem”

So, if you are configuring RStudio Server, make sure to follow the advice about 
RStudio Package Manager: “This location must exist on local storage”

And any package that uses sqlite “under the hood” will similarly want the db on 
local storage to avoid such issues stemming from multi-process access.

Cheers,
Malcolm

From: R-devel  On Behalf Of Simon Urbanek
Sent: Friday, December 13, 2019 12:52 PM
To: lejeczek 
Cc: r-devel 
Subject: Re: [Rd] running R with users home dirs on a shared filesystems


User home is not used by R directly, so it is really up to whatever 
package/code may be using user home. In our setup we have all machines using 
NFS mounted homes for years. From experience the only thing to watch for are 
packages that use their own cache directories in $HOME instead of tempdir() - 
it is technically against CRAN policies but we have seen it in the wild.

Cheers,
Simon



> On Dec 13, 2019, at 1:36 PM, lejeczek via R-devel <r-devel@r-project.org> wrote:
>
> Hi guys,
>
> I want to ask devel for who knows better - having multiple
> nodes serving users home dirs off the same shared network
> filesystem : are there any precautions or must-dos &
> must-donts in order to assure healthy and efficient parallel
> Rs running simultaneously - and I don't mean obvious stuff,
> I'm rather asking about R's internals & environment.
>
> simple example: three nodes mount a NFS share and users on
> all three nodes run R simultaneously.
>
> many thanks, L.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] using R as SHELL in gnu make

2011-09-19 Thread Cook, Malcolm
I am intrigued by the possibility of using R as the SHELL in a (Gnu) makefile 
(instead of /bin/sh).  (c.f. 
http://www.gnu.org/software/make/manual/make.html#Choosing-the-Shell)

Well, rather, I would like the makefile's SHELL to be a command which 
communicated with an R process.

The makefile targets/prerequisites would, as always, be OS files, which would be 
written/read using standard R file IO.

The makefile's "recipes" would be written in R (instead of the usual shell).

The R process would be able to be initiated by `load`ing one or more R 
datasets, libraries or entire images.

The R process would be able to accumulate state as the makefile progressed.  
The recipes would be able to refer to that state, allowing conditional 
execution.

The R process would optionally be saved as an image on job 
termination/completion.

The R process might be managed using the Rserve package, and would need to be 
initiated once only, when the makefile was first invoked.

I would appreciate learning if anyone had any success, informative failures, or 
other lore that may help in (or dissuade me from) embarking on attempting this.

Thanks,

Malcolm Cook
Stowers Institute for Medical Research

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] using R as SHELL in gnu make

2011-09-22 Thread Cook, Malcolm
OK,

I now have a working version of setting SHELL=R in a GNU make script, allowing 
make’s recipes to be written in R, that works out most of the complications.

I plan to clean it up a little more and blog about it soon here: 
http://malcook-gedanken.blogspot.com/ - I’ll follow up here  when I do.

The approach optionally allows evaluating the recipes using a running Rserve 
(http://rosuda.org/Rserve/), avoiding initialization time and allowing 
pre-loading of R libraries common to multiple recipes.

The approach however does NOT provide any special mechanism to preserve state 
between recipes.  Rather, recipes may create the make rule’s target as a state 
dump by passing `file='$@'` in a call to `save` (or `save.image`, `dump`, as 
desired).  Other make rules may then call `load('$<')` when the previously 
saved dump is a prerequisite to the rule.
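
A minimal sketch of the kind of wiring involved — assuming GNU make >= 3.82 
(for .SHELLFLAGS), Rscript on the PATH, and illustrative file/column names; 
recipe lines must begin with a TAB:

SHELL = Rscript
.SHELLFLAGS = -e

# Each recipe line runs in its own R process; make expands $< and $@
# before R ever sees the line, hence the single quotes.
fit.RData: data.csv
	{ dd <- read.csv('$<'); fit <- lm(y ~ x, dd); save(fit, file='$@') }

report.txt: fit.RData
	{ load('$<'); writeLines(capture.output(summary(fit)), '$@') }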

I’m still not sure whether it is more than an amusement; I am interested in all 
thoughts on this, and welcome any suggestions for example applications that I 
might include when I go to blog it up...

Cheers,

~Malcolm


-Original Message-
From: Paul Gilbert [mailto:pgilb...@bank-banque-canada.ca] 
Sent: Tuesday, September 20, 2011 8:32 AM
To: Cook, Malcolm; 'help-m...@gnu.org'; 'r-devel@r-project.org'
Subject: RE: using R as SHELL in gnu make

Other than the RServe part, I do this all the time. It works well. Perhaps we 
can put together some notes off-line and then bring it back to the list.

Paul

> -Original Message-
> From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
> project.org] On Behalf Of Cook, Malcolm
> Sent: September 19, 2011 6:35 PM
> To: 'help-m...@gnu.org'; 'r-devel@r-project.org'
> Subject: [Rd] using R as SHELL in gnu make
> 
> I am intrigued by the possibility of using R as the SHELL in a (Gnu)
> makefile (instead of /bin/sh).  (c.f.
> http://www.gnu.org/software/make/manual/make.html#Choosing-the-Shell)
> 
> Well, rather, I would like the makefile's SHELL to be a command which
> communicated with an R process.
> 
> The makefile targets/prerequisites would, as always, be OS files, which
> would be written/read using standard R file IO.
> 
> The makefile's "recipes" would be written in R (instead of the usual
> shell).
> 
> The R process would be able to be initiated by `load`ing one or more R
> datasets, libraries or entire images.
> 
> The R process would be able to accumulate state as the makefile
> progressed.  The recipes would be able to refer to that state,
> allowing conditional execution.
> 
> The R process would optionally be saved as an image on job
> termination/completion.
> 
> The R process might be managed using the Rserve package, and would need
> to be initiated once only, when the makefile was first invoked.
> 
> I would appreciate learning if anyone had any success, informative
> failures, or other lore that may help in (or dissuade me from)
> embarking on attempting this.
> 
> Thanks,
> 
> Malcolm Cook
> Stowers Institute for Medical Research
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] BUG: emacs orgmode ob-R.el function org-babel-R-evaluate-session over aggressively performs "; ; cleanup extra prompts left in output" and a possible workaround

2015-10-01 Thread Cook, Malcolm
Hello ,

I am not sure what the best solution is but, in my hands, using Org-mode 
version 8.3.2-elpa org-20150929, the regexp used to "cleanup extra prompts 
left in output" is over-aggressive and will trim session :output at lines 
consisting exclusively of blanks and periods, such as those produced when 
printing a Bioconductor 'Views' object, which should appear as

#+RESULTS:
#+begin_example
  Views on a 23011544-letter DNAString subject
subject: 
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCACTCTCCCATATTATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTGGCAACCCTGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATA...TATCTTTCAAAGATGACACTAGCAATGCGTTAACCCAAATAATGATTTCCCTAAATCCTTCCGTAAATATTAACTGGCTCCACCCAAATTTCGGTCATTAAATAATCAGTTGCCACAACTAATTGTCTGTGGAATGTCATATCTCGATGAGCTCATAATTAAATTTACAATCAAACTGTGTTCGAGAGCTAACATTTGGCATATTTGCAAAGATGAACCTTTCAAA
views:
 start  end width
  [1]   344766   344773 8 [CATGAGGC]
  [2]   563564   563571 8 [CATGAGGC]
  [3]   641027   641034 8 [CATGAGGC]
  [4]   656168   656175 8 [CATGAGGC]
  [5]   709112   709119 8 [CATGAGGC]
  ...  ...  ...   ... ...
[141] 22209984 22209991 8 [CATGAGGC]
[142] 22371543 22371550 8 [CATGAGGC]
[143] 22554991 22554998 8 [CATGAGGC]
[144] 22618578 22618585 8 [CATGAGGC]
[145] 22897728 22897735 8 [CATGAGGC]
#+end_example

But alas it rather appears as:

#+RESULTS:
#+begin_example
  ...  ...  ...   ... ...
[141] 22209984 22209991 8 [CATGAGGC]
[142] 22371543 22371550 8 [CATGAGGC]
[143] 22554991 22554998 8 [CATGAGGC]
[144] 22618578 22618585 8 [CATGAGGC]
[145] 22897728 22897735 8 [CATGAGGC]
#+end_example


I offer as a possible workaround the following:

So far, I have had good success after removing the provision for leading 
whitespace, by changing the regexp in org-babel-R-evaluate-session from
  "^\\([ ]*[>+\\.][ ]?\\)+\\([[0-9]+\\|[ ]\\)"
 to
  "^\\([>+\\.][ ]?\\)+\\([[0-9]+\\|[ ]\\)"

But I don't know all the test cases, so YMMV.

HTH,

~Malcolm (who, FWIW, has never really liked the way ob-R communicated with the 
inferior R session in the first place)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error generated by .Internal(nchar) disappears when debugging

2015-10-07 Thread Cook, Malcolm
What other packages do you have loaded?  Perhaps a Bioconductor one that loads 
S4Vectors that announces upon load:

Creating a generic function for 'nchar' from package 'base' in package 
'S4Vectors'

Maybe a red herring...

~Malcolm

 > -Original Message-
 > From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan
 > Murdoch
 > Sent: Monday, October 05, 2015 6:57 PM
 > To: Matt Dowle ; r-de...@stat.math.ethz.ch
 > Subject: Re: [Rd] Error generated by .Internal(nchar) disappears when
 > debugging
 > 
 > On 05/10/2015 7:24 PM, Matt Dowle wrote:
 > > Joris Meys  gmail.com> writes:
 > >
 > >>
 > >> Hi all,
 > >>
 > >> I have a puzzling problem related to nchar. In R 3.2.1, the internal
 > > nchar
 > >> gained an extra argument (see
 > >> https://stat.ethz.ch/pipermail/r-announce/2015/000586.html)
 > >>
 > >> I've been testing code using the package copula, and at home I'm
 > >> still running R 3.2.0 (I know, I know...). When trying the following
 > >> code, I
 > > got
 > >> an error:
 > >>
 > >>> library(copula)
 > >>> fgmCopula(0.8)
 > >> Error in substr(sc[i], 2, nchar(sc[i]) - 1) :
 > >>   4 arguments passed to .Internal(nchar) which requires 3
 > >>
 > >> Cheers
 > >> Joris
 > >
 > >
 > > I'm seeing a similar problem. IIUC, the Windows binary .zip from CRAN
 > > of any package using base::nchar is affected. Could someone check my
 > > answer here is correct please :
 > > http://stackoverflow.com/a/32959306/403310
 > 
 > Nobody has posted a simple reproducible example here, so it's kind of hard to
 > say.
 > 
 > I would have guessed that a change to the internal signature of the C code
 > underlying nchar() wouldn't have any effect on a package that called the R
 > nchar() function.
 > 
 > When I put together my own example (a tiny package containing a function
 > calling nchar(), built to .zip using R 3.2.2, installed into R 3.2.0), it 
 > confirmed
 > my guess.
 > 
 > On the other hand, if some package is calling the .Internal function 
 > directly, I'd
 > expect that to break.  Packages shouldn't do that.
 > 
 > So I'd say there's been no evidence posted of a problem in R here, though
 > there may be problems in some of the packages involved.  I'd welcome an
 > example that provided some usable evidence.
 > 
 > Duncan Murdoch
 > 
 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Error generated by .Internal(nchar) disappears when debugging

2015-10-07 Thread Cook, Malcolm
OK – definitely a red herring – thanks for following up…

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Wednesday, October 07, 2015 4:09 PM
To: Cook, Malcolm 
Cc: Duncan Murdoch ; Matt Dowle 
; r-de...@stat.math.ethz.ch
Subject: Re: [Rd] Error generated by .Internal(nchar) disappears when debugging

Malcolm,

I tested the code on a clean R 3.2.0 session. Not even in RStudio, just to rule 
that out.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] copula_0.999-13

loaded via a namespace (and not attached):
 [1] Matrix_1.2-0 ADGofTest_0.3tools_3.2.0  pspline_1.0-17
 [5] gsl_1.9-10   mvtnorm_1.0-3grid_3.2.0   stats4_3.2.0
 [9] lattice_0.20-31  stabledist_0.7-0 fortunes_1.5-2


On Wed, Oct 7, 2015 at 9:52 PM, Cook, Malcolm <m...@stowers.org> wrote:
What other packages do you have loaded?  Perhaps a Bioconductor one that loads 
S4Vectors that announces upon load:

Creating a generic function for 'nchar' from package 'base' in package 
'S4Vectors'

Maybe a red herring...

~Malcolm

 > -Original Message-
 > From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Duncan
 > Murdoch
 > Sent: Monday, October 05, 2015 6:57 PM
 > To: Matt Dowle <mattjdo...@gmail.com>; r-de...@stat.math.ethz.ch
 > Subject: Re: [Rd] Error generated by .Internal(nchar) disappears when
 > debugging
 >
 > On 05/10/2015 7:24 PM, Matt Dowle wrote:
 > > Joris Meys  gmail.com> writes:
 > >
 > >>
 > >> Hi all,
 > >>
 > >> I have a puzzling problem related to nchar. In R 3.2.1, the internal
 > > nchar
 > >> gained an extra argument (see
 > >> https://stat.ethz.ch/pipermail/r-announce/2015/000586.html)
 > >>
 > >> I've been testing code using the package copula, and at home I'm
 > >> still running R 3.2.0 (I know, I know...). When trying the following
 > >> code, I
 > > got
 > >> an error:
 > >>
 > >>> library(copula)
 > >>> fgmCopula(0.8)
 > >> Error in substr(sc[i], 2, nchar(sc[i]) - 1) :
 > >>   4 arguments passed to .Internal(nchar) which requires 3
 > >>
 > >> Cheers
 > >> Joris
 > >
 > >
 > > I'm seeing a similar problem. IIUC, the Windows binary .zip from CRAN
 > > of any package using base::nchar is affected. Could someone check my
 > > answer here is correct please :
 > > http://stackoverflow.com/a/32959306/403310
 >
 > Nobody has posted a simple reproducible example here, so it's kind of hard to
 > say.
 >
 > I would have guessed that a change to the internal signature of the C code
 > underlying nchar() wouldn't have any effect on a package that called the R
 > nchar() function.
 >
 > When I put together my own example (a tiny package containing a function
 > calling nchar(), built to .zip using R 3.2.2, installed into R 3.2.0), it 
 > confirmed
 > my guess.
 >
 > On the other hand, if some package is calling the .Internal function 
 > directly, I'd
 > expect that to break.  Packages shouldn't do that.
 >
 > So I'd say there's been no evidence posted of a problem in R here, though
 > there may be problems in some of the packages involved.  I'd welcome an
 > example that provided some usable evidence.
 >
 > Duncan Murdoch
 >
 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] should `data` respect default.stringsAsFactors()?

2016-02-18 Thread Cook, Malcolm
Hiya,

Probably been debated elsewhere

I note that R's `data` function does not respect default.stringsAsFactors 

By my lights, it should, especially as it is documented to call read.table, 
which DOES respect.

Oh, but:  
http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html  

Compelling.  I have to agree.

So, I change my mind.  

By my lights, `data` should then be documented to NOT respect 
default.stringsAsFactors.

Else?

~Malcolm Cook

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] should `data` respect default.stringsAsFactors()?

2016-02-18 Thread Cook, Malcolm
Hi Peter,

Sorry if I was not clear.  Perhaps an example will make my point:

> data(iris)
> class(iris$Species)
[1] "factor"
> write.table(iris,'data/myiris.tab')
> data(myiris)
> class(myiris$Species)
[1] "factor"
> rm(myiris)
> options(stringsAsFactors = FALSE)
> data(myiris)
> class(myiris$Species)
[1] "factor"
> myiris<-read.table("data/myiris.tab",header=TRUE)
> class(myiris$Species)
[1] "character"

I am surprised to find that in the above
	setting the global option stringsAsFactors = FALSE does NOT affect how Species is read in by the `data` function
whereas
	setting the global option stringsAsFactors = FALSE DOES affect how Species is read in by read.table

especially since data is documented as calling read.table.

In my opinion, one or the other should change (the behavior of data, or the 
documentation).


~ Malcolm


 > -Original Message-
 > From: peter dalgaard [mailto:pda...@gmail.com]
 > Sent: Thursday, February 18, 2016 3:32 PM
 > To: Cook, Malcolm 
 > Cc: r-de...@stat.math.ethz.ch
 > Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
 > 
 > What the  are you on about? data() does many things, only some of
 > which call read.table() et al., and the ones that do have no special 
 > treatment
 > of stringsAsFactors.
 > 
 > -pd
 > 
 > > On 18 Feb 2016, at 21:25 , Cook, Malcolm  wrote:
 > >
 > > Hiya,
 > >
 > > Probably been debated elsewhere
 > >
 > > I note that R's `data` function does not respect default.stringsAsFactors
 > >
 > > By my lights, it should, especially as it is documented to call read.table,
 > which DOES respect.
 > >
 > > Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-
 > tp921891p921893.html
 > >
 > > Compelling.  I have to agree.
 > >
 > > So, I change my mind.
 > >
 > > By my lights, `data` should then be documented to NOT respect
 > default.stringsAsFactors.
 > >
 > > Else?
 > >
 > > ~Malcolm Cook
 > >
 > > __
 > > R-devel@r-project.org mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-devel
 > 
 > --
 > Peter Dalgaard, Professor,
 > Center for Statistics, Copenhagen Business School
 > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 > Phone: (+45)38153501
 > Office: A 4.23
 > Email: pd@cbs.dk  Priv: pda...@gmail.com
 > 
 > 
 > 
 > 
 > 
 > 
 > 
 > 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] should `data` respect default.stringsAsFactors()?

2016-02-19 Thread Cook, Malcolm
Joshua,

> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm  wrote:
 > > Hi Peter,
 > >
 > > Sorry if I was not clear.  Perhaps an example will make my point:
 > >
 > >> data(iris)
 > >> class(iris$Species)
 > > [1] "factor"
 > >> write.table(iris,'data/myiris.tab')
 > >> data(myiris)
 > >> class(myiris$Species)
 > > [1] "factor"
 > >> rm(myiris)
 > >> options(stringsAsFactors = FALSE)
 > >> data(myiris)
 > >> class(myiris$Species)
 > > [1] "factor"
 > >> myiris<-read.table("data/myiris.tab",header=TRUE)
 > >> class(myiris$Species)
 > > [1] "character"
 > >
 > > I am surprised to find that in the above
 > >   setting the global option stringsAsFactors = FALSE does NOT 
 > > effect
 > how Species is being read in by the `data` function
 > > whereas
 > > setting the global option stringsAsFactors = FALSE DOES effect how
 > Species is being read in by read.table
 > >
 > > especially since data is documented as calling read.table.
 > >
 > To be explicit, it's documented as calling read.table(..., header =
 > TRUE) in this case, but it actually calls read.table(..., header =
 > TRUE, as.is = FALSE), which results in class(myiris$Species) of
 > "factor".

Aha - makes sense.

 > 
 > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
 > R> class(myiris$Species)
 > [1] "factor"
 > 
 > So it seems like adding as.is = FALSE to the call in the documentation
 > would clear this up.

I agree - thanks for digging into the source - you have unearthed the root 
cause.

~Malcolm

 > > In my opinion, one or the other should change (the behavior of data, or the
 > documentation).
 > >
 > >  ,
 > >
 > > ~ Malcolm
 > >
 > >
 > >  > -Original Message-
 > >  > From: peter dalgaard [mailto:pda...@gmail.com]
 > >  > Sent: Thursday, February 18, 2016 3:32 PM
 > >  > To: Cook, Malcolm 
 > >  > Cc: r-de...@stat.math.ethz.ch
 > >  > Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
 > >  >
 > >  > What the  are you on about? data() does many things, only some
 > of
 > >  > which call read.table() et al., and the ones that do have no special
 > treatment
 > >  > of stringsAsFactors.
 > >  >
 > >  > -pd
 > >  >
 > >  > > On 18 Feb 2016, at 21:25 , Cook, Malcolm  wrote:
 > >  > >
 > >  > > Hiya,
 > >  > >
 > >  > > Probably been debated elsewhere
 > >  > >
 > >  > > I note that R's `data` function does not respect 
 > > default.stringsAsFactors
 > >  > >
 > >  > > By my lights, it should, especially as it is documented to call 
 > > read.table,
 > >  > which DOES respect.
 > >  > >
 > >  > > Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-
 > >  > tp921891p921893.html
 > >  > >
 > >  > > Compelling.  I have to agree.
 > >  > >
 > >  > > So, I change my mind.
 > >  > >
 > >  > > By my lights, `data` should then be documented to NOT respect
 > >  > default.stringsAsFactors.
 > >  > >
 > >  > > Else?
 > >  > >
 > >  > > ~Malcolm Cook
 > >  > >
 > >  > > __
 > >  > > R-devel@r-project.org mailing list
 > >  > > https://stat.ethz.ch/mailman/listinfo/r-devel
 > >  >
 > >  > --
 > >  > Peter Dalgaard, Professor,
 > >  > Center for Statistics, Copenhagen Business School
 > >  > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 > >  > Phone: (+45)38153501
 > >  > Office: A 4.23
 > >  > Email: pd@cbs.dk  Priv: pda...@gmail.com
 > >  >
 > >  >
 > >  >
 > >  >
 > >  >
 > >  >
 > >  >
 > >  >
 > >
 > > __
 > > R-devel@r-project.org mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-devel
 > 
 > 
 > 
 > --
 > Joshua Ulrich  |  about.me/joshuaulrich
 > FOSS Trading  |  www.fosstrading.com
 > R/Finance 2016 | www.rinfinance.com
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] should `data` respect default.stringsAsFactors()?

2016-02-19 Thread Cook, Malcolm
Hi,

 > Aha... Hadn't noticed that stringsAsFactors only works via as.is in 
 > read.table.
 > 
 > Yes, the doc should probably be fixed. The code probably not 

Agreed.  

Is someone on-list authorized and willing to make the documentation change?  I 
suppose I could learn what it takes to be a "player", but for such a trivial 
fix, it probably is overkill.  Dissenting opinions?

> -- packages
 > loading different data sets depending on user options is an even worse idea
 > than having the option in the first place... (I don't mean having the 
 > possibility, I
 > mean the default.stringsAsFactor thing).
 > 
 > In general, read.table() gets many things wrong

I agree with you that "read.table() gets many things wrong" and I too have my 
favorite workarounds - but that was not my concern.  My concern is that data() 
does not work as documented.

~Malcolm

> , if you don't set switches
 > and/or postprocess. E.g., even when you do intend to read factors, the
 > alphabetical level order is often not desired. My favourite workaround for
 > data() is to drop a corresponding foo.R file in the ./data directory. This 
 > will be
 > run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
 > 
 > dd <- read.table("foo.txt", ...)
 > dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
 > 
 > etc.
 > 
 > -pd
 > 
 > 
 > 
 > > On 19 Feb 2016, at 01:39 , Joshua Ulrich  wrote:
 > >
 > > On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm 
 > wrote:
 > >> Hi Peter,
 > >>
 > >> Sorry if I was not clear.  Perhaps an example will make my point:
 > >>
 > >>> data(iris)
 > >>> class(iris$Species)
 > >> [1] "factor"
 > >>> write.table(iris,'data/myiris.tab')
 > >>> data(myiris)
 > >>> class(myiris$Species)
 > >> [1] "factor"
 > >>> rm(myiris)
 > >>> options(stringsAsFactors = FALSE)
 > >>> data(myiris)
 > >>> class(myiris$Species)
 > >> [1] "factor"
 > >>> myiris<-read.table("data/myiris.tab",header=TRUE)
 > >>> class(myiris$Species)
 > >> [1] "character"
 > >>
 > >> I am surprised to find that in the above
 > >>  setting the global option stringsAsFactors = FALSE does NOT 
 > >> effect
 > how Species is being read in by the `data` function
 > >> whereas
 > >>setting the global option stringsAsFactors = FALSE DOES effect how
 > Species is being read in by read.table
 > >>
 > >> especially since data is documented as calling read.table.
 > >>
 > > To be explicit, it's documented as calling read.table(..., header =
 > > TRUE) in this case, but it actually calls read.table(..., header =
 > > TRUE, as.is = FALSE), which results in class(myiris$Species) of
 > > "factor".
 > >
 > > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
 > > R> class(myiris$Species)
 > > [1] "factor"
 > >
 > > So it seems like adding as.is = FALSE to the call in the documentation
 > > would clear this up.
 > >
 > >> In my opinion, one or the other should change (the behavior of data, or 
 > >> the
 > documentation).
 > >>
 > >>  ,
 > >>
 > >> ~ Malcolm
 > >>
 > >>
 > >>> -Original Message-
 > >>> From: peter dalgaard [mailto:pda...@gmail.com]
 > >>> Sent: Thursday, February 18, 2016 3:32 PM
 > >>> To: Cook, Malcolm 
 > >>> Cc: r-de...@stat.math.ethz.ch
 > >>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
 > >>>
 > >>> What the  are you on about? data() does many things, only some
 > of
 > >>> which call read.table() et al., and the ones that do have no special
 > treatment
 > >>> of stringsAsFactors.
 > >>>
 > >>> -pd
 > >>>
 > >>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm  wrote:
 > >>>>
 > >>>> Hiya,
 > >>>>
 > >>>> Probably been debated elsewhere
 > >>>>
 > >>>> I note that R's `data` function does not respect 
 > >>>> default.stringsAsFactors
 > >>>>
 > >>>> By my lights, it should, especially as it is documented to call 
 > >>>> read.table,
 > >>> which DOES respect.
 > >>>>
 > >>>> Oh

Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-25 Thread Cook, Malcolm
Chiming in late on this thread...

> > | Are there any packages which
 > > | would break if a call to 'tempdir' automatically recreated this
 > > | directory? (Or would it be too much of a performance hit to have
 > > | 'tempdir' check and even just issue a warning when the directory is
 > > | found not to exist?)
 > 
 > > | Should we have a timer which periodically updates
 > > | the modification time of tempdir()? What do other long-running
 > > | programs do (e.g. screen, emacs)?
 > 
 > Valid questions, in my view.  Before answering, let's try to see
 > how hard it would be to make the tempdir() function in R more versatile.

Might this combination serve the purpose:
* the R session keeps an open handle on the tempdir it creates,
* whatever tempdir-harvesting cron job the user has is made sensitive 
enough not to delete open files (including open directories)

 > 
 > As I've found it is not at all hard to add an option which
 > checks the existence and if the directory is no longer "valid",
 > tries to recreate it (and if it fails doing that it calls the
 > famous R_Suicide(), as it does when R starts up and tempdir()
 > cannot be initialized correctly).
 > 
 > The proposed entry in NEWS is
 > 
 >• tempdir(check=TRUE) recreates the tmpdir() if it is no longer valid.
 > 
 > and of course the default would be status quo, i.e.,  check = FALSE,
 > and once this is in R-devel, we (those who install R-devel) can
 > experiment with it.
 > 
 > Martin
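
[For the record, a minimal transcript of the behavior under discussion, 
assuming a build of R-devel in which the check argument has landed:]

> file.exists(tempdir())
[1] TRUE
> unlink(tempdir(), recursive = TRUE)  # simulate an over-eager /tmp reaper
> file.exists(tempdir())
[1] FALSE
> file.exists(tempdir(check = TRUE))   # recreates the directory
[1] TRUE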
 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tempdir() may be deleted during long-running R session

2017-04-25 Thread Cook, Malcolm
> Martin,
 > 
 > Thanks for your work on this.
 > 
 > One thing that seems to be missing from the conversation is that recreating
 > the temp directory will prevent future failures when R wants to write a
 > temp file, but the files will, of course, not be there. Any code written
 > assuming the contract is that the temporary directory, and thus temporary
 > files, will not be cleaned up before the R process exits (which was my
 > naive assumption before this thread, and is the behavior AFAICT on all the
 > systems I regularly use) will still break.
 > 

That is the kind of scenario I was hoping to obviate with my suggestion...

 > I'm not saying that's necessarily fixable (though the R keeping a permanent
 > pointer to a file in the dir suggested by Malcom might? fix it.), 

(and, FWIW, that's "Malcolm" with two "l"s.  I think all those missing "l"s are 
flattened out versions of all the extra close parens I typed in the 80s that 
somehow got lost on the nets...)))

> but I
 > would argue if it IS fixable, a fix that includes that would be preferable.

Agreed!

 > 
 > Best,
 > ~G
 > 
 > On Tue, Apr 25, 2017 at 8:53 AM, Martin Maechler wrote:
 > 
 > > > Jeroen Ooms 
 > > > on Tue, 25 Apr 2017 15:05:51 +0200 writes:
 > >
 > > > On Tue, Apr 25, 2017 at 1:00 PM, Martin Maechler wrote:
 > > >> As I've found it is not at all hard to add an option
 > > >> which checks the existence and if the directory is no
 > > >> longer "valid", tries to recreate it (and if it fails
 > > >> doing that it calls the famous R_Suicide(), as it does
 > > >> when R starts up and tempdir() cannot be initialized
 > > >> correctly).
 > >
 > > > Perhaps this can also fix the problem with mcparallel
 > > > deleting the tempdir() when one of its children dies:
 > >
 > >>   file.exists(tempdir()) #TRUE
 > >>   parallel::mcparallel(q('no'))
 > >>   file.exists(tempdir()) # FALSE
 > >
 > > Thank you, Jeroen, for the extra example.
 > >
 > > I now have comitted the new feature... (completely back
 > > compatible: in R's code tempdir() is not yet called with an
 > > argument and the default is  check = FALSE ),
 > > actually in a "suicide-free" way ...  which needed only slightly
 > > more code.
 > >
 > > In the worst case, one could save the R session by
 > >Sys.setenv(TEMPDIR = "")
 > > if for instance /tmp/ suddenly became unwritable for the user.
 > >
 > > What we could consider is making the default of 'check' settable
 > > by an option, and experiment with setting the option to TRUE, so
 > > all such problems would be auto-solved (says the incurable optimist ...).
 > >
 > > Martin
 > >
 > > __
 > > R-devel@r-project.org mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-devel
 > >
 > 
 > 
 > 
 > --
 > Gabriel Becker, PhD
 > Associate Scientist (Bioinformatics)
 > Genentech Research
 > 
 > 
 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem using rJava with parallel::mclapply

2013-11-11 Thread Cook, Malcolm
Karl,

I have the following notes to self that may be pertinent:

options(java.parameters=
 >  ## Must precede `library(XLConnect)` in order to prevent "Java
 ## requested System.exit(130), closing R." which happens when
 ## rJava quits R upon trapping INT (control-c), as is done by
 ## XLConnect (and playwith?), below. (c.f.:
 ## https://www.rforge.net/bugzilla/show_bug.cgi?id=237)
 "-Xrs")


~Malcolm



 >-Original Message-
 >From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
 >Behalf Of Karl Forner
 >Sent: Monday, November 11, 2013 11:41 AM
 >To: r-devel@r-project.org
 >Cc: Martin Studer
 >Subject: [Rd] problem using rJava with parallel::mclapply
 >
 >Dear all,
 >
 >I got an issue trying to parse excel files in parallel using XLConnect, the
 >process hangs forever.
 >Martin Studer, the maintainer of XLConnect kindly investigated the issue,
 >identified rJava as a possible cause of the problem:
 >
 >This does not work (hangs):
 >library(parallel)
 >require(rJava)
 >.jinit()
 >res <- mclapply(1:2, function(i) {
 >  J("java.lang.Runtime")$getRuntime()$gc()
 >  1
 >  }, mc.cores = 2)
 >
 >but this works:
 >library(parallel)
 >res <- mclapply(1:2, function(i) {
 >  require(rJava)
 >  .jinit()
 >  J("java.lang.Runtime")$getRuntime()$gc()
 >  1
 >}, mc.cores = 2)
 >
 >To cite Martin, it seems to work with mclapply when the JVM process is
 >initialized after forking.
 >
 >Is this a bug or a limitation of rJava ?
 >Or is there a good practice for rJava clients to avoid this problem ?
 >
 >Best,
 >Karl
 >
 >P.S.
 >> sessionInfo()
 >R version 3.0.1 (2013-05-16)
 >Platform: x86_64-unknown-linux-gnu (64-bit)
 >
 >locale:
 > [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 > [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 > [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 > [7] LC_PAPER=C LC_NAME=C
 > [9] LC_ADDRESS=C   LC_TELEPHONE=C
 >[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 >
 >attached base packages:
 >[1] stats graphics  grDevices utils datasets  methods   base
 >
 >loaded via a namespace (and not attached):
 >[1] tools_3.0.1
 >
 >
 >__
 >R-devel@r-project.org mailing list
 >https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] ShortRead::FastqStreamer and parallelization

2014-11-18 Thread Cook, Malcolm
Hi,

I understand ShortRead::FastqStreamer will read chunks in parallel depending on 
the value of ShortRead:::.set_omp_threads

I see this discussed here:  
https://stat.ethz.ch/pipermail/bioc-devel/2013-May/004355.html and nowhere else.

It probably should be documented in ShortRead.

Possibly this has already changed, for I am still using R 3.1.0.  I thought I'd 
check.

Oh, and, in my hands/hardware, the value of FastqStreamer's use of srapply 
parallelization is negligible, at least if the consumer of successive yields is 
in the main process.  I see that the new bpiterate appears to take advantage of 
yielding in forked processes, which sounds promising.  Is that the idea?
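
[For concreteness, the pattern I have in mind — a sketch assuming BiocParallel's 
bpiterate() and an illustrative file reads.fastq.gz:]

library(ShortRead)
library(BiocParallel)

strm <- FastqStreamer("reads.fastq.gz", n = 1e6)  # yield 1e6 reads per chunk
ITER <- function() {
  fq <- yield(strm)
  if (length(fq)) fq else NULL                    # NULL signals end of input
}
res <- bpiterate(ITER,
                 function(fq) alphabetFrequency(sread(fq), collapse = TRUE))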

Looking forward

Malcolm Cook

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] loading multiple CSV files into a single data frame

2012-05-03 Thread Cook, Malcolm
Victor,

I understand you as follows:

* The first two columns of the desired combined data frame are the last two 
levels of the pathname to the csv file.
* The columns in all the data.csv files are the same; namely, there is only 
one column, and it is named PERF.

If so, the following should work (on unix):

do.call(rbind, lapply(Sys.glob('results/*/*/data.csv'), function(path)
  within(read.csv(path), {
    SIZE  <- basename(dirname(path))
    ASSOC <- basename(dirname(dirname(path)))
  })))


On 5/3/12 4:40 PM, "victor jimenez"  wrote:

>First of all, thank you for the answers. I did not know about zoo.
>However,
>it seems that none approach can do what I exactly want (please, correct me
>if I am wrong).
>
>Probably, it was not clear in my original question. The CSV files only
>contain the performance values. The other two columns (ASSOC and SIZE) are
>obtained from the existing values in the directory tree. So, in my
>opinion,
>none of the proposed solutions would work, unless every single "data.csv"
>file contained all the three columns (ASSOC, SIZE and PERF).
>
>In my case, my experimentation framework basically outputs a CSV with some
>values read from the processor's performance counters (PMCs). For each
>cache size and associativity I conduct an experiment, creating a CSV file,
>and placing that file into its own directory. I could modify the
>experimentation framework, so that it also outputs the cache size and
>associativity, but that may not be ideal in some circumstances and I also
>have a significant amount of old results and I want keep using them
>without
>manually fixing the CSV files.
>
>Has anyone else faced such a situation? Any good solutions?
>
>Thank you,
>Victor
>
>On Thu, May 3, 2012 at 8:54 PM, Gabor Grothendieck
>wrote:
>
>> On Thu, May 3, 2012 at 2:07 PM, victor jimenez 
>> wrote:
>> > Sometimes I have hundreds of CSV files scattered in a directory tree,
>> > resulting from experiments' executions. For instance, giving an
>>example
>> > from my field, I may want to collect the performance of a processor
>>for
>> > several design parameters such as "cache size" (possible values: 2,
>>4, 8
>> > and 16) and "cache associativity" (possible values: direct-mapped,
>>4-way,
>> > fully-associative). The results of all these experiments will be
>>stored
>> in
>> > a directory tree like:
>> >
>> > results
>> >  |-- direct-mapped
>> >  |   |-- 2 -- data.csv
>> >  |   |-- 4 -- data.csv
>> >  |   |-- 8 -- data.csv
>> >  |   |-- 16 -- data.csv
>> >  |-- 4-way
>> >  |   |-- 2 -- data.csv
>> >  |   |-- 4 -- data.csv
>> > ...
>> >  |-- fully-associative
>> >  |   |-- 2 -- data.csv
>> >  |   |-- 4 -- data.csv
>> > ...
>> >
>> > I am developing a package that would allow me to gather all those CSV
>> into
>> > a single data frame. Currently, I just need to execute the following
>> > statement:
>> >
>> > dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")
>> >
>> > and this command returns a data frame containing the columns ASSOC,
>>SIZE
>> > and all the remaining columns inside the CSV files (in my case the
>> > processor performance), effectively loading all the CSV files into a
>> single
>> > data frame. So, I would get something like:
>> >
>> > ASSOC,  SIZE, PERF
>> > direct-mapped,   2, 1.4
>> > direct-mapped,   4, 1.6
>> > direct-mapped,   8, 1.7
>> > direct-mapped, 16, 1.7
>> > 4-way,   2, 1.4
>> > 4-way,   4, 1.5
>> > ...
>> >
>> > I would like to ask whether there is any similar functionality already
>> > implemented in R. If so, there is no need to reinvent the wheel :)
>> > If it is not implemented and the R community believes that this
>>feature
>> > would be useful, I would be glad to contribute my code.
>> >
>>
>> If your csv files all have the same columns and represent time series
>> then read.zoo in the zoo package can read multiple csv files in at
>> once using a single read.zoo command producing a single zoo object.
>>
>> library(zoo)
>> ?read.zoo
>> vignette("zoo-read")
>>
>> Also see the other zoo vignettes and help files.
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>
>   [[alternative HTML version deleted]]
>
>__
>R-devel@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] test suites for packages

2012-05-18 Thread Cook, Malcolm
svUnit is RUnit-compatible and provides some IDE integration, report 
generation, and an easy syntax for defining tests.

I find it works a treat, and fits very nicely with my R coding/packaging
style (which also uses inlinedocs for easy package creation).
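
From memory, the flavor of defining and running a test (a sketch; check the 
package docs for the current API):

library(svUnit)
square <- function(x) x^2
test(square) <- function() {        # attach a test directly to the function
  checkEquals(9, square(3))
  checkTrue(square(-2) >= 0)
}
clearLog()
runTest(square)
Log()                               # summarize pass/fail results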

--Malcolm Cook


On 5/17/12 9:10 AM, "Whit Armstrong"  wrote:

>Can anyone share some opinions on test suites for R packages?
>
>I'm looking at testthat and RUnit. Does anyone have strong opinions on
>either of those.
>
>Any additional packages I should consider?
>
>Thanks,
>Whit
>
>__
>R-devel@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug with mapply() on an S4 object

2012-11-28 Thread Cook, Malcolm
Yes, yes, excellent and great!  I am tracking this development with great 
interest.

Am I correct that the implication for Bioconductor is the tearing out of the 
Xapply generics and the expectation that List and its descendants would now 
"just work" with {t,mc,mcl,...}apply?  That would be a grand outcome.

~malcolm_c...@stowers.org


> Hervé Pagès 
> on Tue, 27 Nov 2012 17:03:05 -0800 writes:

> Some formatting issues when copy/pasting the patch in the
> body of the email so I've attached the diff file.

Thank you, Hervé.

I have committed (a slightly simplified version of) your patch
to R-devel (to become 2.16.0).
Backporting to '2.15.2 patched' would be a bit of work
-- mapply's C interface is not .Internal() there --
so I've kept it in R-devel.

> On 11/27/2012 04:56 PM, Hervé Pagès wrote:
>> Hi,
>>
>> Here is a patch for this (against current R-devel). The "caching" of
>> the .Primitive for 'length' is taken from seq_along() C code (in
>> R-devel/src/main/seq.c).
>>
>> hpages@thinkpad:~/svn/R$ svn diff R-devel
>> Index: R-devel/src/main/mapply.c
>> ===
>> --- R-devel/src/main/mapply.c(revision 61172)
>> +++ R-devel/src/main/mapply.c(working copy)

[.]

>> lengths = (R_xlen_t *)  R_alloc(m, sizeof(R_xlen_t));
>> for(i = 0; i < m; i++){
>> -lengths[i] = xlength(VECTOR_ELT(varyingArgs, i));
>> +int dispatch_ok = 0;
>> +tmp1 = VECTOR_ELT(varyingArgs, i);
>> +if (isObject(tmp1)) {
>> +/* Looks like DispatchOrEval() needs a pairlist. We reproduce 
what
>> +   pairlist(tmp1) would do i.e. tmp2 <- as.pairlist(list(tmp1)).
>> +   Is there a more direct way to go from tmp1 to tmp2? */

indeed, there is a more direct way:

tmp2 = lang1(tmp1)

and that's what I've used in the commit.

>> +PROTECT(tmp2 = allocVector(VECSXP, 1));
>> +SET_VECTOR_ELT(tmp2, 0, tmp1);
>> +PROTECT(tmp2 = coerceVector(tmp2, LISTSXP));
>> +dispatch_ok = DispatchOrEval(call, length_op, "length",
>> + tmp2, rho, &ans, 0, 1);
>> +UNPROTECT(2);
>> +}
>> +lengths[i] = dispatch_ok ? asInteger(ans) : xlength(tmp1);
>> if(lengths[i] == 0) zero++;
>> if (lengths[i] > longest) longest = lengths[i];
>> }
>>
>> Hopefully the bug can be fixed. Thanks!

Many thanks to you, Hervé!
Martin


>> On 11/14/2012 09:42 PM, Hervé Pagès wrote:
>>> Hi,
>>>
>>> Starting with ordinary vectors, so we know what to expect:
>>>
>>> > mapply(function(x, y) {x * y}, 101:106, rep(1:3, 2))
>>> [1] 101 204 309 104 210 318
>>>
>>> > mapply(function(x, y) {x * y}, 101:106, 1:3)
>>> [1] 101 204 309 104 210 318
>>>
>>> Now with an S4 object:
>>>
>>> setClass("A", representation(aa="integer"))
>>> a <- new("A", aa=101:106)
>>>
>>> > length(a)
>>> [1] 1
>>>
>>> Implementing length():
>>>
>>> setMethod("length", "A", function(x) length(x@aa))
>>>
>>> Testing length():
>>>
>>> > length(a)  # sanity check
>>> [1] 6
>>>
>>> No [[ yet for those objects so the following error is expected:
>>>
>>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
>>> Error in dots[[1L]][[1L]] : this S4 class is not subsettable
>>>
>>> Implementing [[:
>>>
>>> setMethod("[[", "A", function(x, i, j, ...) x@aa[[i]])
>>>
>>> Testing [[:
>>>
>>> > a[[1]]
>>> [1] 101
>>> > a[[5]]
>>> [1] 105
>>>
>>> Trying mapply again:
>>>
>>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
>>> [1] 101 202 303 101 202 303
>>>
>>> Wrong. It looks like internally a[[1]] is always used instead of a[[i]].
>>>
>>> The real problem it seems is that 'a' is treated as if it was of
>>> length 1:
>>>
>>> > mapply(function(x, y) {x * y}, a, 1:3)
>>> [1] 101 202 303
>>> > mapply(function(x, y) {x * y}, a, 5)
>>> [1] 505
>>>
>>> In other words, internal dispatch works for [[ but not for length().
>>>
>>> Thanks,
>>> H.
>>>
>>

> --
> Hervé Pagès

> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024

> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319

> [DELETED ATTACHMENT external: mapply.diff, text/x-patch]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] library(tcltk) v. SIGPIPE BUG (?!?)

2012-12-11 Thread Cook, Malcolm
Hi R-devel, tcltk devel, and sqldf devel,

The transcript below shows how loading the tcl/tk library under R causes 
subprocesses to ignore SIGPIPE.

I am including the developer of the (wonderful) sqldf package since it requires 
tcltk and you might like to make this dependence optional to the user (at least 
until this is fixed in tcltk).

Am I mistaken in calling this a 'bug'?

Any insights appreciated!

Thanks,

Malcolm Cook
Computational Biology - Stowers Institute for Medical Research


> system(intern=TRUE,'yes | head ')
 [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
> library(tcltk)
Loading Tcl/Tk interface ... done
> system(intern=TRUE,'yes | head ')

### this now does not return to the prompt, and looking at 'top' shows that
### 'yes' keeps running until I hit ctrl-c, after which it returns.
C-c C-c
  [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods   base  
   
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] library(tcltk) v. SIGPIPE BUG (?!?)

2012-12-11 Thread Cook, Malcolm
Excellent, thanks for the workaround, that gets _me_ by, for now.

~Malcolm


> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Tuesday, December 11, 2012 2:40 PM
> To: Cook, Malcolm
> Cc: r-discuss...@listserv.stowers.org; r-devel@r-project.org; 
> phgrosj...@sciviews.org; Blanchette, Marco
> Subject: Re: library(tcltk) v. SIGPIPE BUG (?!?)
> 
> On Tue, Dec 11, 2012 at 3:14 PM, Cook, Malcolm  wrote:
> > Hi R-devel, tcltk devel, and sqldf devel,
> >
> > The transcript below shows how loading the tcl/tk library in under R causes 
> > subprocesses to ignore SIGPIPE.
> >
> > I am including the developer of the (wonderful) sqldf package since it 
> > requires tcltk and you might like to make this dependence
> optional to the user (at least until this is fixed in tcltk).
> >
> > Am I mistaken in calling this a 'bug'?
> >
> > Any insights appreciated!
> >
> > Thanks,
> >
> > Malcolm Cook
> > Computational Biology - Stowers Institute for Medical Research
> >
> >
> >> system(intern=TRUE,'yes | head ')
> >  [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
> >> library(tcltk)
> > Loading Tcl/Tk interface ... done
> >> system(intern=TRUE,'yes | head ')
> >
> > ### this now does not return to the prompt and Looking at 'top' shows that 
> > 'yes' is running until I hit ctrl-c, after which it returns.
> > C-c C-c
> >   [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
> >
> >
> >> sessionInfo()
> > R version 2.15.1 (2012-06-22)
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] tcltk stats graphics  grDevices utils datasets  methods   
> > base
> >>
> >
> >
> 
> As a workaround specify the "R" engine instead of the "tcl" engine in
> wihch case gsubfn (which is called by sqldf) won't try to use the
> tcltk package:
> 
> options(gsubfn.engine = "R")
> library(sqldf)
> 
> 
> 
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Error: 'mcMap' is not an exported object from 'namespace:parallel'

2013-02-14 Thread Cook, Malcolm
> library(parallel)
> mcMap(identity,1:5)
Error: could not find function "mcMap"
> parallel:::mcMap(identity,1:5)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

> parallel::mcMap(identity,1:5)
Error: 'mcMap' is not an exported object from 'namespace:parallel'

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8  
  LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8LC_PAPER=C LC_NAME=C 
 LC_ADDRESS=C   LC_TELEPHONE=C 
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods   base  
   


Nuff Said?

Thanks for parallel!

~ malcolm_cook at stowers.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-04 Thread Cook, Malcolm
Hi,

In support of reproducible research at my Institute, I seek an approach to 
re-creating the R environments in which an analysis has been conducted.

By which I mean, the exact version of R and the exact version of all packages 
used in a particular R session.

I am seeking comments/criticism of this as a goal, and of the following outline 
of an approach:

=== When all the steps of a workflow have been finalized ===
* re-run the workflow from beginning to end
* save the results of sessionInfo() into an RDS file named after the current 
date and time.

=== Later, when desirous of exactly recreating this analysis ===
* read the (old) sessionInfo() into an R session
* exit with failure if the running version of R doesn't match
* compare the old sessionInfo to the currently available installed libraries 
(i.e. using packageVersion)
* where there are discrepancies, install the required version of the package 
(without dependencies) into new library (named after the old sessionInfo RDS 
file)

Then the analyst should be able to put the new library at the front of 
.libPaths() and run the analysis confident that the same versions of the 
packages are in use.
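
A minimal sketch of the comparison step (file and library names are 
hypothetical):

si <- readRDS("sessionInfo_2013-03-04.rds")          # hypothetical snapshot
stopifnot(identical(si$R.version$version.string, R.version.string))
wanted  <- vapply(si$otherPkgs, function(d) d$Version, "")
current <- vapply(names(wanted), function(p)
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_), "")
needs <- names(wanted)[is.na(current) | current != wanted]
## install each of `needs` at its recorded version into a new library, then:
## .libPaths(c("path/to/snapshot-lib", .libPaths()))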

I have in the past used install-package-version.R to revert to previous 
versions of R packages successfully (https://gist.github.com/1503736).  And 
there is a similar tool in Hadley Wickham's devtools.

But, I don't know if I need something special for (Bioconductor) packages that 
have been installed using biocLite, and I seek advice here.

I do understand that the R environment is not sufficient to guarantee 
reproducibility.   Some of my colleagues have suggested saving a virtual 
machine with all your software/library/data installed. So, I am also in general 
interested in what other people are doing to this end.  But I am most 
interested in:

* is this a good idea
* is there a worked out solution
* does biocLite introduce special cases
* where do the dragons lurk

... and the like

Any tips?

Thanks,

~ Malcolm Cook
Stowers Institute / Computation Biology / Shilatifard Lab

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
.>>> * where do the dragons lurk
 .>>>
 .>>
 .>> webs of interconnected dynamically loaded libraries, identical versions of
 .>> R compiled with different BLAS/LAPACK options, etc.  Go with the VM if you
 .>> really, truly, want this level of exact reproducibility.
 .>
 .> Sounds like the best bet -- maybe tools like vagrant might be useful here:
 .>
 .> http://www.vagrantup.com
 .>
 .> ... or maybe they're overkill?
 .>
 .> Haven't really checked it out myself too much, my impression is that
 .> these tools (vagrant, chef, puppet) are built to handle such cases.
 .>
 .> I'd imagine you'd probably need a location where you can grab the
 .> precise (versioned) packages for the things you are specifying, but
 .
 .Right...and this is a bit tricky, because we don't keep old versions
 .around in our BioC software repositories.  They are available through
 .Subversion but with the sometimes additional overhead of setting up
 .build-time dependencies.


So, even if I wanted to go where dragons lurked, it would not be possible to 
cobble a version of biocLite that installed specific versions of software.

Thus, I might rather consider an approach that at 'publish' time tarzips up a 
copy of the R package dependencies based on a config file defined from 
sessionInfo and caches it in the project directory.

Then when/if the project is revisited (and found to produce different results 
under the current R enviRonment), I can "simply" install an old R (oops, I guess 
I'd have to build it), and then un-tarzip the dependencies into the project's 
own R/Library, which I would put on .libPaths().
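
Roughly, and assuming every package named in sessionInfo() can simply be archived wholesale from its installed directory (the tarball name is an invented convention):

## at 'publish' time: tar up the package directories the session used
si   <- sessionInfo()
pkgs <- c(names(si$otherPkgs), names(si$loadedOnly))
utils::tar("project-Rlib.tar.gz", files = find.package(pkgs),
           compression = "gzip")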

Or, or?  

(My virtual machine advocating colleagues are snickering now, I am sure..)

Thanks for all your thoughts and advice

--Malcolm

 .
 .
 .> ...
 .>
 .> -steve
 .>
 .> --
 .> Steve Lianoglou
 .> Graduate Student: Computational Systems Biology
 .>  | Memorial Sloan-Kettering Cancer Center
 .>  | Weill Medical College of Cornell University
 .> Contact Info: http://cbio.mskcc.org/~lianos/contact
 .>
 .> __
 .> R-devel@r-project.org mailing list
 .> https://stat.ethz.ch/mailman/listinfo/r-devel
 .
 .___
 .Bioconductor mailing list
 .bioconduc...@r-project.org
 .https://stat.ethz.ch/mailman/listinfo/bioconductor
 .Search the archives: 
http://news.gmane.org/gmane.science.biology.informatics.conductor

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
All,

What got me started on this line of inquiry was my attempt at balancing the 
advantages of performing a periodic (daily or weekly) update to the 'release' 
version of locally installed R/Bioconductor packages on our institute-wide 
installation of R with the disadvantages of potentially changing the result of 
an analyst's workflow in mid-project.

I just got the "green light" to institute such periodic updates that I have 
been arguing is in our collective best interest.  In return,  I promised my 
best effort to provide a means for preserving or reverting to a working R 
library configuration.

Please note that the reproducibility I am most eager to provide is limited to 
reproducibility within the computing environment of our institute, which 
perhaps takes away some of the dragon's nests, though certainly not all.

There are technical issues of updating package installations on an NFS mount 
that might have files/libraries open on it from running R sessions.  I am 
interested in learning of approaches for minimizing/eliminating exposure to 
these issue as well.  The first/best approach seems to be to institute a 'black 
out' period when users should expect the installed library to change.   Perhaps 
there are improvements to this

Best,

Malcolm


 .-Original Message-
 .From: Mike Marchywka [mailto:marchy...@hotmail.com]
 .Sent: Tuesday, March 05, 2013 5:24 AM
 .To: amac...@virginia.edu; Cook, Malcolm
 .Cc: r-devel@r-project.org; bioconduc...@r-project.org; 
r-discuss...@listserv.stowers.org
 .Subject: RE: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .
 .I hate to ask what got this thread started, but it sounds like someone was
 .counting on exact numeric reproducibility, or was there a bug in a specific
 .release? In actual fact, the best way to determine reproducibility is to run
 .the code in a variety of packages. Alternatively, you can do everything in
 .Java and not assume that calculations commute or associate as the code is
 .modified, but that seems pointless. Sensitivity determination would seem to
 .lead to more reproducible results than trying to keep a specific set of code
 .quirks.
 .
 .I also seem to recall that the FPU may produce random lower-order bits in
 .some cases, so the same code/data can give different results. Always assume
 .FP is stochastic and plan on analyzing the "noise."
 .
 .
 .
 .> From: amac...@virginia.edu
 .> Date: Mon, 4 Mar 2013 16:28:48 -0500
 .> To: m...@stowers.org
 .> CC: r-devel@r-project.org; bioconduc...@r-project.org; 
r-discuss...@listserv.stowers.org
 .> Subject: Re: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .>
 .> On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm  wrote:
 .>
 .> > * where do the dragons lurk
 .> >
 .>
 .> webs of interconnected dynamically loaded libraries, identical versions of
 .> R compiled with different BLAS/LAPACK options, etc. Go with the VM if you
 .> really, truly, want this level of exact reproducibility.
 .>
 .> An alternative (and arguably more useful) strategy would be to cache
 .> results of each computational step, and report when results differ upon
 .> re-execution with identical inputs; if you cache sessionInfo along with
 .> each result, you can identify which package(s) changed, and begin to hunt
 .> down why the change occurred (possibly for the better); couple this with
 .> the concept of keeping both code *and* results in version control, then you
 .> can move forward with a (re)analysis without being crippled by out-of-date
 .> software.
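
A minimal sketch of that step-caching idea, using the digest package; the function and the cache layout are invented for illustration:

library(digest)                          # for hashing the step inputs
cache_step <- function(name, inputs, fun, dir = "cache") {
  dir.create(dir, showWarnings = FALSE)
  f   <- file.path(dir, paste0(name, "-", digest(inputs), ".rds"))
  res <- fun(inputs)
  if (file.exists(f)) {
    old <- readRDS(f)                    # prior value plus its sessionInfo
    if (!identical(old$value, res))
      warning("step '", name, "' differs from the cached run; ",
              "compare the sessionInfo stored in ", f)
  } else {
    saveRDS(list(value = res, session = sessionInfo()), f)
  }
  res
  ## usage (names hypothetical):  norm <- cache_step("normalize", counts, scale)
}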
 .>
 .> -Aaron
 .>
 .> --
 .> Aaron J. Mackey, PhD
 .> Assistant Professor
 .> Center for Public Health Genomics
 .> University of Virginia
 .> amac...@virginia.edu
 .> http://www.cphg.virginia.edu/mackey
 .>
 .> [[alternative HTML version deleted]]
 .>
 .> __
 .> R-devel@r-project.org mailing list
 .> https://stat.ethz.ch/mailman/listinfo/r-devel
 .

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
.> So, even if I wanted to go where dragons lurked, it would not be
 .> possible to cobble a version of biocLite that installed specific
 .> versions of software.
 .>
 .> Thus, I might rather consider an approach that at 'publish' time
 .> tarzips up a copy of the R package dependencies based on a config file
 .> defined from sessionInfo and caches it in the project directory.
 .>
 .> Then when/if the project is revisited (and found to produce different
 .> results under the current R enviRonment), I can "simply" install an old R
 .> (oops, I guess I'd have to build it), and then un-tarzip the
 .> dependencies into the project's own R/Library, which I would put on
 .> .libPaths().
 .
 .Sounds a little like this:
 .
 .http://cran.r-project.org/web/packages/rbundler/index.html
 .
 .(which I haven't tested). Best,
 .
 .Greg.

Looks interesting - thanks for the suggestion.

But my use case is one in which an analyst at my site depends upon the 
local library installation and only retrospectively, at some publishable event 
(like handing the results over to the in-house customer/scientist), seeks to 
ensure the ability to return to that exact R library environment later.  This 
tool, on the other hand, commits the user to keeping a project-specific "bundle" 
from the outset.  Another set of trade-offs.  I will have to synthesize the 
options I am learning.

~ Malcolm 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-05 Thread Cook, Malcolm
Paul,

I think your balanced and reasoned approach addresses all my current concerns.  
Nice!  I will likely adopt your methods.  Let me ruminate.  Thanks for this.

~ Malcolm

 .-Original Message-
 .From: Paul Gilbert [mailto:pgilbert...@gmail.com]
 .Sent: Tuesday, March 05, 2013 4:34 PM
 .To: Cook, Malcolm
 .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 
'r-discuss...@listserv.stowers.org'
 .Subject: Re: [Rd] [BioC] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .(More on the original question further below.)
 .
 .On 13-03-05 09:48 AM, Cook, Malcolm wrote:
 .> All,
 .>
 .> What got me started on this line of inquiry was my attempt at
 .> balancing the advantages of performing a periodic (daily or weekly)
 .> update to the 'release' version of locally installed R/Bioconductor
 .> packages on our institute-wide installation of R with the
 .> disadvantages of potentially changing the result of an analyst's
 .> workflow in mid-project.
 .
 .I have implemented a strategy to try to address this as follows:
 .
 .1/ Install a new version of R when it is released, and packages in the R
 .version's site-library with package versions as available at the time
 .the R version is installed. Only upgrade these package versions in the
 .case they are severely broken.
 .
 .2/ Install the same packages in site-library-fresh and upgrade these
 .package versions on a regular basis (e.g. daily).
 .
 .3/ When a new version of R is released, freeze but do not remove the old
 .R version, at least not for a fairly long time, and freeze
 .site-library-fresh for the old version. Begin with the new version as in
 .1/ and 2/. The old version remains available, so "reverting" is trivial.
 .
 .
 .The analysts are then responsible for choosing the R version they use,
 .and the library they use. This means they do not have to change R and
 .package version mid-project, but they can if they wish. I think the
 .above two libraries will cover most cases, but it is possible that a few
 .projects will need their own special library with a combination of
 .package versions. In this case the user could create their own library,
 .or you might prefer some more official mechanism.
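
On the analyst's side, pinning a project to one library or the other might look like this (the paths are assumptions based on the scheme above):

## stay on the frozen library for the life of the project ...
.libPaths("/opt/R-2.15.2/site-library")
## ... or opt in to the regularly upgraded one
.libPaths("/opt/R-2.15.2/site-library-fresh")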
 .
 .The idea of the above strategy is to provide the stability one might
 .want for an ongoing project, and the possibility of an upgraded package
 .if necessary, but not encourage analysts to remain indefinitely with old
 .versions (by say, putting new packages in an old R version library).
 .
 .This strategy has been implemented in a set of make files in the project
 .RoboAdmin available at http://automater.r-forge.r-project.org/. It can
 .be done entirely automatically with a cron job. Constructive comments
 .are always appreciated.
 .
 .(IT departments sometimes think that there should be only one version of
 .everything available, which they test and approve. So the initial
 .reaction to this approach could be negative. I think they have not
 .really thought about the advantages. They usually cannot test/approve an
 .upgrade without user input, and timing is often extremely complicated
 .because of ongoing user needs. This strategy is simply shifting
 .responsibility and timing to the users, or user departments, that can
 .actually do the testing and approving.)
 .
 .Regarding NFS mounts, it is relatively robust. There can be occasional
 .problems, especially for users that have a habit of keeping an R session
 .open for days at a time and using site-library-fresh packages. In my
 .experience this did not happen often enough to worry about a "blackout
 .period".
 .
 .Regarding the original question, I would like to think it could be
 .possible to keep enough information to reproduce the exact environment,
 .but I think for potentially sensitive numerical problems that is
 .optimistic. As others have pointed out, results can depend not only on R
 .and package versions, configuration, OS versions, and library and
 .compiler versions, but also on the underlying hardware. You might have
 .some hope using something like an Amazon core instance. (BTW, this
 .problem is not specific to R.)
 .
 .It is true that restricting to a fixed computing environment at your
 .institution may ease things somewhat, but if you occasionally upgrade
 .hardware or the OS then you will probably lose reproducibility.
 .
 .An alternative that I recommend is that you produce a set of tests that
 .confirm the results of any important project. These can be conveniently
 .put in the tests/ directory of an R package, which is then maintained
 .local, not on CRAN, and built/tested whenever a new R and packages are
 .installed. (Tools for this are also available at the above indicated web
 .site.) This approach means that you continue to reproduce the old
 .results, or if not, discover differences/problems in the old or new

Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

2013-03-06 Thread Cook, Malcolm
Thanks David, I've looked into them both a bit, and I don't think they provide 
an approach for R (or Perl, for that matter) library management, which is the 
wicket I'm trying to get less sticky now.

They could be useful for managing the various installed versions of R and of 
analysis tools (we're talking a lot of NextGenSequencing, so bowtie, tophat, 
and friends) quite nicely, similarly in service of an approach to enabling 
reproducible results.

Thanks for your thoughts, and if you know of other tools similar to 
dotkit/modules I'd be keen to hear of them.

~Malcolm


 .-Original Message-
 .From: Lapointe, David [mailto:david.lapoi...@umassmed.edu]
 .Sent: Wednesday, March 06, 2013 7:46 AM
 .To: Cook, Malcolm; 'Paul Gilbert'
 .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 
'r-discuss...@listserv.stowers.org'
 .Subject: RE: [BioC] [Rd] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .There are utilities (e.g. dotkit and modules) which facilitate version
 .management, basically creating on-the-fly PATH and environment setups, if
 .you are comfortable keeping all that around.
 .
 .David
 .
 .-Original Message-
 .From: bioconductor-boun...@r-project.org 
[mailto:bioconductor-boun...@r-project.org] On Behalf Of Cook, Malcolm
 .Sent: Tuesday, March 05, 2013 6:08 PM
 .To: 'Paul Gilbert'
 .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 
'r-discuss...@listserv.stowers.org'
 .Subject: Re: [BioC] [Rd] enabling reproducible research & R package 
management & install.package.version & BiocLite
 .
 .Paul,
 .
 .I think your balanced and reasoned approach addresses all my current
 .concerns.  Nice!  I will likely adopt your methods.  Let me ruminate.
 .Thanks for this.
 .
 .~ Malcolm