Re: [Rd] [Rocks-Discuss] Two R editions in Unix cluster systems

2013-10-16 Thread Adam Brenner
Paul,

For our HPC cluster we have run into this issue in the past. What we use is
modules[1]. We instruct our users to run a command like

   module load R/2.15.2

This will load up the environment paths in which R/2.15.2 lives. If they want
to switch to R/3.0.1 they simply run

   module unload R/2.15.2
   module load R/3.0.1

For all our software installs, we do *not* install software on each node. The
overhead of creating a compilation script and forking it out to each node
within our cluster (100+) is not worth it. Instead we use modules, as I have
described above. We use a standard NFS server with plenty of NFS processes,
and the share gets mounted on each compute node. This has worked extremely
well for us.

The primary reason is that the Linux kernel does a fairly good job of caching
libraries. In our setup, most, if not all, of the R libraries stay in memory
once they have been loaded from the NFS server. The data/input files R uses
are on our Gluster or FraunhoferFS parallel file system. Of course, keeping
the data local to the compute node would be fastest.


If you still want to install software locally on each compute node, you can
still take advantage of modules. I do suggest you install R (from source,
RPM, etc.) in a non-standard location like /opt/, or make your own /apps,
/data and so on. Then create a module file similar to the following:

#%Module1.0
module load gcc/4.8.1
set  ROOT  /data/apps/R/3.0.1

prepend-path PATH            $ROOT/bin
prepend-path MANPATH         $ROOT/share
prepend-path R_LIBS          $ROOT/lib64/R/library
prepend-path LD_LIBRARY_PATH $ROOT/lib64/R/lib


Replace the Tcl variable ROOT with the path where R lives and you are good to
go. This method of course works with other software besides R :-)
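
Once a module is loaded, a quick sanity check from inside R confirms that the
right build and library tree are being picked up. A minimal sketch (the exact
paths depend on your ROOT; the values shown here are illustrative):

    ## run after `module load R/3.0.1`, interactively or via Rscript
    R.version.string               # should report the version the module provides
    R.home()                       # should point inside $ROOT, e.g. /data/apps/R/3.0.1/lib64/R
    .libPaths()                    # $ROOT/lib64/R/library should appear here (via R_LIBS)
    Sys.getenv("LD_LIBRARY_PATH")  # should start with $ROOT/lib64/R/lib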

[1]: http://modules.sourceforge.net

-Adam


--
Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

Research Computing Support
Office of Information Technology
http://www.oit.uci.edu/rcs/

University of California, Irvine
www.ics.uci.edu/~aebrenne/
aebre...@uci.edu


On Tue, Oct 15, 2013 at 1:15 PM, Paul Johnson  wrote:
> Dear R Devel
>
> Some of our R users are still insisting we run R-2.15.3 because of
> difficulties with a package called OpenMX.  It can't cooperate with new R,
> oh well.
>
> Other users need to run R-3.0.1. I'm looking for the most direct route to
> install both, and allow users to choose at runtime.
>
> In the cluster, things run faster if I install RPMs to each node, rather
> than putting R itself on the NFS share (but I could do that if you think
> it's really better)
>
> In the past, I've used the SRPM packaging from EPEL repository to make a
> few little path changes and build R RPM for our cluster nodes. Now I face
> the problem of building 2 RPMS, one for R-2.15.3 and one for R-newest, and
> somehow keeping them separate.
>
> If you were me, how would you approach this?
>
> Here's my guess
>
> First, The RPM packages need unique names, of course.
>
> Second, leave the RPM packaging for R-newest exactly the same as it always
> was.  R is in the path, the R script and references among all the bits will
> be fine, no need to fight. It will find what it needs in /usr/lib64/R or
> whatnot.
>
> For the legacy R, I'm considering 2 ideas.  I could install R with the same
> prefix, /usr, but be very careful that the R bits are installed into separate
> places. I just made a fresh build of R and on RedHat 6, it appears to me R
> installs these directories:
> bin
> libdir
> share.
>
> So what if the configure line has the magic bindir=/usr/bin-R-2.15.3
> libdir = /usr/lib64/R-2.15.3, and whatnot. If I were doing Debian
> packaging, I suppose I'd be obligated (by the file system standard) to do
> that kind of thing. But it looks like a headache.
>
> The easy road is to set the prefix at some out of the way place, like
> /opt/R-2.15.3, and then use a post-install script to link
> /opt/R-2.15.3/bin/R to /usr/bin/R-2.15.3.  When I tried that, it surprised
> me because R did not complain about lack of access to devel headers. It
> configures and builds fine.
>
> R is now configured for x86_64-unknown-linux-gnu
>
>   Source directory:          .
>   Installation directory:    /tmp/R
>
>   C compiler:                gcc -std=gnu99  -g -O2
>   Fortran 77 compiler:       gfortran  -g -O2
>
>   C++ compiler:              g++  -g -O2
>   Fortran 90/95 compiler:    gfortran -g -O2
>   Obj-C compiler:            gcc -g -O2 -fobjc-exceptions
>
>   Interfaces supported:      X11, tcltk
>   External libraries:        readline, ICU, lzma
>   Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
>   Options enabled:           shared BLAS, R profiling, Java
>
>   Recommended packages:      yes
>
> Should I worry about any runtime complications of this older R finding bits
> of the newer R in the PATH ahead of it? I worry I'm making lazy
> assumptions.

[Rd] randomForest: Numeric deviation between 32/64 Windows builds

2013-10-16 Thread Rosenberger George
Dear R Developers

I'm using the great randomForest package (4.6-7) for many projects and recently 
stumbled upon a problem when I wrote unit tests for one of my projects:

On Windows, there are small numeric deviations when using the 32- / 64-bit 
version of R, which doesn't seem to be a problem on Linux or Mac.

R64 on Windows produces the same results as R64/R32 on Linux or Mac:

> set.seed(131)
> importance(randomForest(Species ~ ., data=iris))
             MeanDecreaseGini
Sepal.Length         9.452470
Sepal.Width          2.037092
Petal.Length        43.603071
Petal.Width         44.116904

R32 on Windows produces the following:

> set.seed(131)
> importance(randomForest(Species ~ ., data=iris))
             MeanDecreaseGini
Sepal.Length         9.433986
Sepal.Width          2.249871
Petal.Length        43.594159
Petal.Width         43.941870

Is there a reason why this is different for the Windows builds? Are the 
compilers on Windows doing different things for 32- / 64-bit builds than the 
ones on Linux or Mac?

Thank you very much for your help.

Best regards,
George



Re: [Rd] randomForest: Numeric deviation between 32/64 Windows builds

2013-10-16 Thread Prof Brian Ripley

On 15/10/2013 14:00, Rosenberger George wrote:

Dear R Developers

I'm using the great randomForest package (4.6-7) for many projects and recently 
stumbled upon a problem when I wrote unit tests for one of my projects:

On Windows, there are small numeric deviations when using the 32- / 64-bit 
version of R, which doesn't seem to be a problem on Linux or Mac.

R64 on Windows produces the same results as R64/R32 on Linux or Mac:


set.seed(131)
importance(randomForest(Species ~ ., data=iris))

             MeanDecreaseGini
Sepal.Length         9.452470
Sepal.Width          2.037092
Petal.Length        43.603071
Petal.Width         44.116904

R32 on Windows produces the following:


set.seed(131)
importance(randomForest(Species ~ ., data=iris))

             MeanDecreaseGini
Sepal.Length         9.433986
Sepal.Width          2.249871
Petal.Length        43.594159
Petal.Width         43.941870

Is there a reason why this is different for the Windows builds? Are the 
compilers on Windows doing different things for 32- / 64-bit builds than the 
ones on Linux or Mac?


Yes, no (but these are not R issues).

There are bigger differences in the OS's equivalent of libm on Windows.
You did not tell us what CPUs your compilers targeted on Linux and OS X
(sic), but generally they assume more than the i386 baseline that Microsoft
assumes for 32-bit Windows.  OTOH, all x86_64 OSes, including Windows, can
assume more, as all such CPUs have post-i686 features.  Remember that
Windows XP is still supported, and it was released in 2001.


Based on much wider experience than you give (e.g. reference results 
from R itself and recommended packages), deviations from x86_64 results 
are increasingly likely on OS X, i686 Linux and then i386 Windows.
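
For unit tests that have to pass on both architectures, comparing against the
reference values with an explicit tolerance rather than exactly is one
practical workaround. A minimal sketch (the tolerance value here is
illustrative, not a recommendation):

library(randomForest)

set.seed(131)
imp <- importance(randomForest(Species ~ ., data = iris))

## reference values from the R64 run quoted above
ref <- c(Sepal.Length = 9.452470, Sepal.Width = 2.037092,
         Petal.Length = 43.603071, Petal.Width = 44.116904)

## compare on a mean-relative-difference scale instead of exact equality
isTRUE(all.equal(imp[, "MeanDecreaseGini"], ref, tolerance = 0.02))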



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax:  +44 1865 272595



Re: [Rd] Replacing the Random Number Generator in Stand Alone Library

2013-10-16 Thread peter dalgaard

On Oct 16, 2013, at 02:41 , Nigel Delaney wrote:

> Okay, so I am guessing everyone had the same response I initially did when
> hearing that this RNG might not be so hot*...  as an alternate question, if
> I submitted a patch to replace the current RNG with the twister, would it be
> accepted?


Quite possibly. 

I think the reason you get no reply could be that nobody really knows to what 
extent the standalone library is being used, and what repercussions an internal 
change might have. (E.g., if a change in the seed format causes applications to 
crash and burn, users might not appreciate the improved RNG...)

The RNG issues really are serious, and affect actual applications. This was 
hashed out about 10 years ago and eventually fixed somewhere around R 1.6--1.7. 

> RNGversion("1.6.0")
Warning message:
In RNGkind("Marsaglia-Multicarry", "Buggy Kinderman-Ramage") :
  buggy version of Kinderman-Ramage generator used
> s <- replicate(1e6, max(rnorm(100)))
> plot(density(s))
> m <- matrix(runif(8e7),2)
> plot(m[1,],m[2,], xlim=c(0,1e-3), pch=".")

(The bug in the K-R generator isn't relevant here, but the pattern of RNG usage 
in K-R is what creates the ripple effect in the first plot. Changing _either_ 
of the two generators removes the effect.)
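
For comparison, here is a sketch of the same two experiments under the current
default generators (Mersenne-Twister with Inversion); run this way, neither
artefact should appear:

> RNGkind("Mersenne-Twister", "Inversion")
> set.seed(1)
> s <- replicate(1e6, max(rnorm(100)))
> plot(density(s))
> m <- matrix(runif(8e7),2)
> plot(m[1,],m[2,], xlim=c(0,1e-3), pch=".")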

-pd


> 
> -N
> 
> -Original Message-
> From: Nigel Delaney [mailto:nigel.dela...@outlook.com] 
> Sent: Thursday, October 10, 2013 5:04 PM
> To: r-devel@r-project.org
> Subject: Replacing the Random Number Generator in Stand Alone Library
> 
> Hi R-Developers,
> 
> I had a question about the random number generator used in the R StandAlone
> Math Library.  The stand-alone library depends on the unif_rand() function
> for most simulated values, and this function is provided in the sunif.c file
> in the relevant directory.  At present, this program implements the
> "Marsaglia-Multicarry" algorithm, which is described throughout the R
> documentation as:
> 
> "A multiply-with-carry RNG is used, as recommended by George Marsaglia in
> his post to the mailing list 'sci.stat.math'. It has a period of more than
> 2^60 and has passed all tests (according to Marsaglia). The seed is two
> integers (all values allowed)."
> 
> However, I do not think this RNG actually passes all tests.  For example,
> the Handbook of Computational Econometrics (illegal web copy at link below)
> shows that it fails the mtuple test and gives an explicit example where it
> leads to problems because it failed this test.  The mtuple test was
> introduced by Marsaglia in 1985, and I gather he wrote his mailing list
> comment that it "passes all tests" sometime after this, so I am not sure
> what explains this distinction (though I am not sure whether the mtuple test
> is included in the diehard tests, which may have been what he was referring
> to).  However, there are clearly some areas where this PRNG runs into
> trouble (although the book's example is better, another problem is that it
> can't seem to simulate a value above (1/2)^1 + (1/4)^4 after simulating a
> value below 1e-6).
> 
> Given that the Mersenne Twister seems to be the standard for simulation
> these days (and is used as the default in R), it seems like it might be useful
> to change the stand-alone library so it also uses this routine.  I gather
> this would be pretty easy to do by pulling this function from the RNG.c file
> and moving it into the sunif.c file, and I have a prototype of this.
> 
> However, I thought I would ask, is there a reason this hasn't been done?  Or
> is it just a historical carry-over (pun intended I suppose).
> 
> Warm wishes,
> Nigel
> 
> 
> Research Fellow
> Massachusetts General Hospital / Broad Institute
> 
> 
> Book link:
> http://thenigerianprofessionalaccountant.files.wordpress.com/2013/04/handboo
> k-of-computational-econometrics-belsley.pdf
> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[Rd] Internally accessing ref class methods with .self$x is different from .self[['x']]

2013-10-16 Thread Winston Chang
When a reference class method is accessed with .self$x, it has
different behavior from .self[['x']]. The former copies the function
to the object's environment (with some attributes attached), and the
latter just returns NULL (unless the method has already been accessed
once with .self$x). Is this how it's supposed to work?

Here's an example that illustrates: https://gist.github.com/wch/7013262

TestClass <- setRefClass(
  'TestClass',
  fields = list(a = 'ANY'),
  methods = list(
initialize = function() {
  e <- environment()
  pe <- parent.env(e)
  ppe <- parent.env(pe)

  # The environment of this object
  print(ls(pe))
  # The environment of the class
  print(ls(ppe))

  # No surprises with fields
  cat(" .self[['a']] \n")
  print(.self[['a']])


  # Getting b these ways isn't quite what we want
  cat(" .self[['b']] \n")
  print(.self[['b']])
  cat(" b \n")
  print(b)


  # Accessing b with $ works, and it changes things from here on in
  cat(" .self$b \n")
  print(.self$b)


  # Now these return the b method with some attributes attached
  cat(" .self[['b']] \n")
  print(.self[['b']])
  cat(" b \n")
  print(b)


  cat("===\n")
  print(ls(parent.env(e)))
  print(ls(parent.env(parent.env(e))))

},

b = function() {
  "Yes, this is b."
}
  )
)

tc <- TestClass$new()


Output:


[1] "a"  "initialize"
[1] "b" "e" "tc""TestClass"
 .self[['a']] 
An object of class "uninitializedField"
Slot "field":
[1] "a"

Slot "className":
[1] "ANY"

 .self[['b']] 
NULL
 b 
function() {
  "Yes, this is b."
}
 .self$b 
function() {
  "Yes, this is b."
}

attr(,"mayCall")
character(0)
attr(,"name")
[1] "b"
attr(,"refClassName")
[1] "TestClass"
attr(,"superClassMethod")
[1] ""
attr(,"class")
[1] "refMethodDef"
attr(,"class")attr(,"package")
[1] "methods"
 .self[['b']] 
function() {
  "Yes, this is b."
}

attr(,"mayCall")
character(0)
attr(,"name")
[1] "b"
attr(,"refClassName")
[1] "TestClass"
attr(,"superClassMethod")
[1] ""
attr(,"class")
[1] "refMethodDef"
attr(,"class")attr(,"package")
[1] "methods"
 b 
function() {
  "Yes, this is b."
}

attr(,"mayCall")
character(0)
attr(,"name")
[1] "b"
attr(,"refClassName")
[1] "TestClass"
attr(,"superClassMethod")
[1] ""
attr(,"class")
[1] "refMethodDef"
attr(,"class")attr(,"package")
[1] "methods"
===
[1] "a"  "b"  "initialize"
[1] "b" "e" "tc""TestClass"



-Winston



Re: [Rd] Internally accessing ref class methods with .self$x is different from .self[['x']]

2013-10-16 Thread Gabriel Becker
Winston,

 (back on list since I found some official info)

Looks like the behavior you are seeing is "documented-ish"

Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method requires a
particular other method, the first method should include a call to
$usingMethods() with the name of the other method as an argument. Declaring
the methods this way is essential if the other method is used indirectly
(e.g., via sapply() or do.call()). If it is called directly, code analysis
will find it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.

Seems like .self$usingMethods() is supposed to allow you to do what you
want, but I wasn't able to get it to work after a few minutes of fiddling,
and the actual usingMethods method doesn't seem to do anything on cursory
inspection in a toy example. But I don't pretend to know the arcane depths
of the refclass machinery.

~G


On Wed, Oct 16, 2013 at 1:14 PM, Winston Chang wrote:

> Hi Gabriel -
>
> Thanks for your reply. The reason that I'm interested in doing it that
> way is because we're working on a project where we call a method and
> pass it the name of another method, and that method is accessed with $
> or [[.
>
> You might be right about [[ treating it as a standard environment.
> Reading the docs a little more closely, I found:
> "... The corresponding programming mechanism is to invoke a method on
> an object. In the R syntax we use "$" for this operation; one invokes
> a method, m1 say, on an object x by the expression x$m1(...)."
>
> This works as expected, but it is a bit clunky:
> `$`(.self, x)
>
> -Winston
>
>
> On Wed, Oct 16, 2013 at 3:08 PM, Gabriel Becker 
> wrote:
> > Winston,
> >
> > Replying off list as I'm not an expert and don't have time to delve into
> > this enough to actually track it down.
> >
> > Is there a reason to expect [[ to work in this way? I've only ever
> > seen/written code that access fields/methods on reference classes via $
> > (which I assume has extra machinery for RefClasses to deal with active
> > binding functions for fields, etc).
> >
> > My guess would be that [[ is treating the reference class obj as a
> standard
> > environment which is bypassing some extra step necessary for refclass
> > objects and that is causing the problems.
> >
> >
> > I'd say it's probably still a bug, but of the "why were you trying to do
> > that?" variety (no offense intended to you at all).
> >
> > All I can really suggest in the interim is to just use $ instead until
> John
> > (Chambers) pops up and responds to your mail.
> >
> > ~G
> >
> >
> > On Wed, Oct 16, 2013 at 12:47 PM, Winston Chang  >
> > wrote:
> >>
> >> When a reference class method is accessed with .self$x, it has
> >> different behavior from .self[['x']]. The former copies the function
> >> to the object's environment (with some attributes attached), and the
> >> latter just returns NULL (unless it has already been accessed once with
> >> .self$x). Is this how it's supposed to work?
> >>
> >> Here's an example that illustrates: https://gist.github.com/wch/7013262
> >>
> >> TestClass <- setRefClass(
> >>   'TestClass',
> >>   fields = list(a = 'ANY'),
> >>   methods = list(
> >> initialize = function() {
> >>   e <- environment()
> >>   pe <- parent.env(e)
> >>   ppe <- parent.env(pe)
> >>
> >>   # The environment of this object
> >>   print(ls(pe))
> >>   # The environment of the class
> >>   print(ls(ppe))
> >>
> >>   # No surprises with fields
> >>   cat(" .self[['a']] \n")
> >>   print(.self[['a']])
> >>
> >>
> >>   # Getting b these ways isn't quite what we want
> >>   cat(" .self[['b']] \n")
> >>   print(.self[['b']])
> >>   cat(" b \n")
> >>   print(b)
> >>
> >>
> >>   # Accessing b with $ works, and it changes things from here on in
> >>   cat(" .self$b \n")
> >>   print(.self$b)
> >>
> >>
> >>   # Now these return the b method with some attributes attached
> >>   cat(" .self[['b']] \n")
> >>   print(.self[['b']])
> >>   cat(" b \n")
> >>   print(b)
> >>
> >>
> >>   cat("===\n")
> >>   print(ls(parent.env(e)))
> >>   print(ls(parent.env(parent.env(e))))
> >>
> >> },
> >>
> >> b = function() {
> >>   "Yes, this is b."
> >> }
> >>   )
> >> )
> >>
> >> tc <- TestClass$new()
> >>
> >>
> >> Output:
> >> 
> >>
> >> [1] "a"  "initialize"
> >> [1] "b" 

Re: [Rd] Internally accessing ref class methods with .self$x is different from .self[['x']]

2013-10-16 Thread Winston Chang
On Wed, Oct 16, 2013 at 3:41 PM, Gabriel Becker  wrote:
> Winston,
>
>  (back on list since I found some official info)
>
> Looks like the behavior you are seeing is "documented-ish"
>
> Only methods actually used will be included in the environment corresponding
> to an individual object. To declare that a method requires a particular
> other method, the first method should include a call to $usingMethods() with
> the name of the other method as an argument. Declaring the methods this way
> is essential if the other method is used indirectly (e.g., via sapply() or
> do.call()). If it is called directly, code analysis will find it. Declaring
> the method is harmless in any case, however, and may aid readability of the
> source code.
>
> Seems like .self$usingMethods() is supposed to allow you to do what you
> want, but I wasn't able to get it to work after a few minutes of fiddling
> and the actual usingMethods method doesn't seem to do anything on cursory
> inspection in a toy example but I don't pretend to know the arcane depths of
> the refclass machinery.
>

I wasn't able to get .self$usingMethods() to work either. I think that
for my case, it still won't do the job - the issue is that I'm calling
a method and passing the name of another method, which is accessed via
[[. Since .self$usingMethods() supposedly analyzes code when the class
is installed (and not at runtime), that wouldn't help in this case.

Previously I said that code like this would work, but I was wrong:
  var <- "someMethod"
  `$`(.self, var)
It doesn't work because $ doesn't evaluate var; it thinks you're
trying to get .self$var, not .self$someMethod.
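
The non-evaluation is the same as with a plain list; a tiny illustration
(names made up):

  lst <- list(someMethod = function() "hi")
  var <- "someMethod"
  lst$var        # NULL -- $ looks for an element literally named "var"
  `$`(lst, var)  # NULL -- same thing; var is not evaluated
  lst[[var]]     # the function -- [[ does evaluate var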

The workaround we're using for now is:
  do.call(`$`, list(.self, var))
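
A self-contained toy version of that workaround (the class and method names
here are made up for illustration):

  Toy <- setRefClass("Toy",
    methods = list(
      hello  = function() "hello from a method",
      byname = function(nm) {
        ## nm holds the method name as a string; going through `$` via
        ## do.call triggers the reference-class dispatch, which installs
        ## the method in the object's environment before returning it
        f <- do.call(`$`, list(.self, nm))
        f()
      }
    )
  )

  obj <- Toy$new()
  obj$byname("hello")   # [1] "hello from a method"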

-Winston



Re: [Rd] Internally accessing ref class methods with .self$x is different from .self[['x']]

2013-10-16 Thread Gabriel Becker
Winston,

Sounds like you might be better off constructing and evaluating an
expression:

> test = setRefClass("test", methods = list(m1 = function() "hi", m2 =
function() "lo", byname = function(nm) {
+ expr = substitute(.self$xXx(), list(xXx = nm))
+ eval(expr)
+ }))
> thing = test$new()
> thing$byname("m1")
[1] "hi"

HTH,
~G



On Wed, Oct 16, 2013 at 2:12 PM, Winston Chang wrote:

> On Wed, Oct 16, 2013 at 3:41 PM, Gabriel Becker 
> wrote:
> > Winston,
> >
> >  (back on list since I found some official info)
> >
> > Looks like the behavior you are seeing is "documented-ish"
> >
> > Only methods actually used will be included in the environment
> corresponding
> > to an individual object. To declare that a method requires a particular
> > other method, the first method should include a call to $usingMethods()
> with
> > the name of the other method as an argument. Declaring the methods this
> way
> > is essential if the other method is used indirectly (e.g., via sapply()
> or
> > do.call()). If it is called directly, code analysis will find it.
> Declaring
> > the method is harmless in any case, however, and may aid readability of
> the
> > source code.
> >
> > Seems like .self$usingMethods() is supposed to allow you to do what you
> > want, but I wasn't able to get it to work after a few minutes of fiddling
> > and the actual usingMethods method doesn't seem to do anything on cursory
> > inspection in a toy example but I don't pretend to know the arcane
> depths of
> > the refclass machinery.
> >
>
> I wasn't able to get .self$usingMethods() to work either. I think that
> for my case, it still won't do the job - the issue is that I'm calling
> a method and passing the name of another method, which is accessed via
> [[. Since .self$usingMethods() supposedly analyzes code when the class
> is installed (and not at runtime), that wouldn't help in this case.
>
> Previously I said that code like this would work, but I was wrong:
>   var <- "someMethod"
>   `$`(.self, var)
> It doesn't work because $ doesn't evaluate var; it thinks you're
> trying to get .self$var, not .self$someMethod.
>
> The workaround we're using for now is:
>   do.call(`$`, list(.self, var))
>
> -Winston
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
