[Rd] requiring NAMESPACE re-installation marked as old.packages?

2011-07-19 Thread Martin Morgan
It would be convenient if, under R-devel r56422, packages that require 
re-installation because they do not have a NAMESPACE were reported by 
old.packages(), so that the loss of functionality could be discovered more easily.
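
For illustration, a rough sketch (not a proposed patch) of how such packages 
could be enumerated from R today; the helper name is made up:

pkgsNeedingReinstall <- function(lib = .libPaths()[1]) {
    ## installed packages in 'lib' that have no NAMESPACE file on disk
    pkgs <- rownames(installed.packages(lib.loc = lib))
    pkgs[!file.exists(file.path(lib, pkgs, "NAMESPACE"))]
}
## e.g. "snow" %in% pkgsNeedingReinstall()   # would be TRUE for 'snow' here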


> "snow" %in% row.names(old.packages())
[1] FALSE
> library(snow)
Error in library(snow) :
  package 'snow' does not have a NAMESPACE and should be re-installed
> install.packages("snow", repos="http://cran.r-project.org")
Installing package(s) into 
'/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/2.14'
(as 'lib' is unspecified)
trying URL 'http://cran.r-project.org/src/contrib/snow_0.3-5.tar.gz'
Content type 'application/x-gzip' length 21059 bytes (20 Kb)
opened URL
==
downloaded 20 Kb

* installing *source* package 'snow' ...
** R
** inst
** Creating default NAMESPACE file
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Warning: running .First.lib() for package 'snow' as .onLoad/.onAttach 
were not found

Error in initDefaultClusterOptions() :
  cannot change value of locked binding for 'defaultClusterOptions'
Error: loading failed
Execution halted
ERROR: loading failed
* removing '/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/2.14/snow'
* restoring previous 
'/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/2.14/snow'


The downloaded packages are in
'/tmp/RtmpoGypnz/downloaded_packages'
Warning message:
In install.packages("snow", repos = "http://cran.r-project.org") :
  installation of package 'snow' had non-zero exit status

Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] (no subject)

2011-07-19 Thread Martin Maechler
> Simon Urbanek 
> on Tue, 28 Jun 2011 11:31:58 -0400 writes:

> On Jun 28, 2011, at 9:58 AM, Michelle.Carey wrote:

>> Hi,
>> 
>> 
>> 
>> I am trying to write code in C for an R package. I need
>> high precision in the form of the mpfr and gmp packages.

> gmp is available as R package
> http://cran.r-project.org/web/packages/gmp/index.html do
> you really need it at a lower level than that?

and so is MPFR:  Package 'Rmpfr'

(both on CRAN and R-forge)

.. and I am asking the same question as Simon:
Are you sure you need to use your own C code, 
instead of simply using the R packages 'gmp' and 'Rmpfr' and
working with R code?
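
For illustration, a minimal sketch of what that looks like at the R level
(assuming the CRAN packages are installed):

library(Rmpfr)                 # loads gmp as well
x <- mpfr(2, precBits = 200)   # the number 2 carried to ~200 bits of precision
sqrt(x)                        # high-precision square root, still an mpfr object
exp(mpfr(1, precBits = 200))   # e to roughly 60 decimal digits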

--
Martin Maechler, ETH Zurich


>> I have installed mpfr and gmp under the instructions of
>> the following website
>> http://pauillac.inria.fr/cdrom_a_graver/prog/pc/mpfr/eng.htm
>> and I get no errors. I have put the header files (mpfr.h
>> and gmp.h) in the folder C:\Program
>> Files\R\R-2.13.0\include;

> That doesn't sound right - the instructions you quote
> install in the default location which is presumably
> /usr/local so you should simply set your flags to use that
> location.


>> allowing my c code to identify the header files by
>> incorporating #include <mpfr.h> and #include <gmp.h>.
>> Unfortunately when I try to include any of the functions
>> listed in the header file I get compile error stating
>> that these functions are undeclared,

> That doesn't sound plausible, either - undeclared functions
> don't cause errors (not in C). Undefined functions do and
> you have to make sure you link the libraries installed
> above.
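
For illustration, the kind of src/Makevars one might use so that the package's 
C code finds the installed headers and links against the installed libraries; 
the /usr/local paths are an assumption about where 'make install' put them, and 
on Windows the same idea goes into src/Makevars.win with the appropriate paths:

PKG_CPPFLAGS = -I/usr/local/include
PKG_LIBS = -L/usr/local/lib -lmpfr -lgmp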


>> even though the source code is in the same directory as
>> my c code that I am trying to compile.

> That bears no meaning - as you said you installed the
> libraries, so you have to use those. Source code of
> gmp/mpfr has no effect on your code and should not be
> anywhere near it.

> Cheers, Simon


>> Any help on this matter would be greatly appreciated.
>> 
>> 
>> Kind Regards,
>> 
>> Michelle Carey

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Manipulating single-precision (float) arrays in .Call functions

2011-07-19 Thread Matthew Dowle

"Prof Brian Ripley"  wrote in message 
news:alpine.lfd.2.02.1107190640280.28...@gannet.stats.ox.ac.uk...
> On Mon, 18 Jul 2011, Alireza Mahani wrote:
>
>> Simon,
>>
>> Thank you for elaborating on the limitations of R in handling float 
>> types. I
>> think I'm pretty much there with you.
>>
>> As for the insufficiency of single-precision math (and hence limitations 
>> of
>> GPU), my personal take so far has been that double-precision becomes 
>> crucial
>> when some sort of error accumulation occurs. For example, in differential
>> equations where boundary values are integrated to arrive at interior 
>> values,
>> etc. On the other hand, in my personal line of work (Hierarchical 
>> Bayesian
>> models for quantitative marketing), we have so much inherent uncertainty 
>> and
>> noise at so many levels in the problem (and no significant error
>> accumulation sources) that single vs double precision issue is often
>> inconsequential for us. So I think it really depends on the field as well 
>> as
>> the nature of the problem.
>
> The main reason to use only double precision in R was that on modern CPUs 
> double precision calculations are as fast as single-precision ones, and 
> with 64-bit CPUs they are a single access.
> So the extra precision comes more-or-less for free.

But, isn't it much more of the 'less free' when large data sets are
considered? If a double matrix takes 3GB, it's 1.5GB in single.
That might alleviate the dreaded out-of-memory error for some
users in some circumstances. On 64bit, 50GB reduces to 25GB
and that might make the difference between getting
something done, or not. If single were appropriate, of course.
For GPU too, i/o often dominates iiuc.
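
As a back-of-envelope check of those numbers (and ignoring R's vector length
limit, which is a separate issue), a numeric (double) vector costs about
8 bytes per element and a single-precision copy about 4:

n <- 50 * 2^30 / 8                 # elements in a ~50GB double matrix
c(double_GB = 8 * n / 2^30,        # 50
  single_GB = 4 * n / 2^30)        # 25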

For space reasons, is there any possibility of R supporting single
precision (and single bit logical to reduce memory for logicals by
32 times)? I guess there might be complaints from users using
single inappropriately (or worse, not realising we have an unstable
result due to single).

Matthew

> You also under-estimate the extent to which stability of commonly used 
> algorithms relies on double precision.  (There are stable single-precision 
> versions, but they are no longer commonly used.  And as Simon said, in 
> some cases stability is ensured by using extra precision where available.)
>
> I disagree slightly with Simon on GPUs: I am told by local experts that 
> the double-precision on the latest GPUs (those from the last year or so) 
> is perfectly usable.  See the performance claims on 
> http://en.wikipedia.org/wiki/Nvidia_Tesla of about 50% of the SP 
> performance in DP.
>
>>
>> Regards,
>> Alireza
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Manipulating-single-precision-float-arrays-in-Call-functions-tp3675684p3677232.html
>> Sent from the R devel mailing list archive at Nabble.com.
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> -- 
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Welcome Uwe Ligges to R-core

2011-07-19 Thread Prof Brian Ripley

Uwe is now a member of R-core.

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Welcome Uwe Ligges to R-core

2011-07-19 Thread Marc Schwartz

On Jul 19, 2011, at 7:29 AM, Prof Brian Ripley wrote:

> Uwe is now a member of R-core.

Congratulations Uwe!

Regards,

Marc Schwartz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Manipulating single-precision (float) arrays in .Call functions

2011-07-19 Thread Simon Urbanek

On Jul 19, 2011, at 7:48 AM, Matthew Dowle wrote:

> 
> "Prof Brian Ripley"  wrote in message 
> news:alpine.lfd.2.02.1107190640280.28...@gannet.stats.ox.ac.uk...
>> On Mon, 18 Jul 2011, Alireza Mahani wrote:
>> 
>>> Simon,
>>> 
>>> Thank you for elaborating on the limitations of R in handling float 
>>> types. I
>>> think I'm pretty much there with you.
>>> 
>>> As for the insufficiency of single-precision math (and hence limitations 
>>> of
>>> GPU), my personal take so far has been that double-precision becomes 
>>> crucial
>>> when some sort of error accumulation occurs. For example, in differential
>>> equations where boundary values are integrated to arrive at interior 
>>> values,
>>> etc. On the other hand, in my personal line of work (Hierarchical 
>>> Bayesian
>>> models for quantitative marketing), we have so much inherent uncertainty 
>>> and
>>> noise at so many levels in the problem (and no significant error
>>> accumulation sources) that single vs double precision issue is often
>>> inconsequential for us. So I think it really depends on the field as well 
>>> as
>>> the nature of the problem.
>> 
>> The main reason to use only double precision in R was that on modern CPUs 
>> double precision calculations are as fast as single-precision ones, and 
>> with 64-bit CPUs they are a single access.
>> So the extra precision comes more-or-less for free.
> 
> But, isn't it much more of the 'less free' when large data sets are 
> considered? If a double matrix takes 3GB, it's 1.5GB in single.
> That might alleviate the dreaded out-of-memory error for some users in some 
> circumstances. On 64bit, 50GB reduces to 25GB

I'd like to see your 50Gb matrix in R ;) - you can't have a float matrix bigger 
than 8Gb, for doubles it is 16Gb so you don't gain anything in scalability. 
IMHO memory is not a strong case these days when hundreds GB of RAM are 
affordable...

Also you would not complain about pointers going from 4 to 8 bytes in 64-bit 
thus doubling your memory use for string vectors...

Cheers,
Simon


> and that might make the difference between getting
> something done, or not. If single were appropriate, of course.
> For GPU too, i/o often dominates iiuc.
> 
> For space reasons, is there any possibility of R supporting single
> precision (and single bit logical to reduce memory for logicals by
> 32 times)? I guess there might be complaints from users using
> single inappropriately (or worse, not realising we have an instable
> result due to single).
> 
> Matthew
> 
>> You also under-estimate the extent to which stability of commonly used 
>> algorithms relies on double precision.  (There are stable single-precision 
>> versions, but they are no longer commonly used.  And as Simon said, in 
>> some cases stability is ensured by using extra precision where available.)
>> 
>> I disagree slightly with Simon on GPUs: I am told by local experts that 
>> the double-precision on the latest GPUs (those from the last year or so) 
>> is perfectly usable.  See the performance claims on 
>> http://en.wikipedia.org/wiki/Nvidia_Tesla of about 50% of the SP 
>> performance in DP.
>> 
>>> 
>>> Regards,
>>> Alireza
>>> 
>>> 
>>> --
>>> View this message in context: 
>>> http://r.789695.n4.nabble.com/Manipulating-single-precision-float-arrays-in-Call-functions-tp3675684p3677232.html
>>> Sent from the R devel mailing list archive at Nabble.com.
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> -- 
>> Brian D. Ripley,  rip...@stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel:  +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UKFax:  +44 1865 272595
>> 
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Manipulating single-precision (float) arrays in .Call functions

2011-07-19 Thread Duncan Murdoch

On 11-07-19 7:48 AM, Matthew Dowle wrote:


"Prof Brian Ripley"  wrote in message
news:alpine.lfd.2.02.1107190640280.28...@gannet.stats.ox.ac.uk...

On Mon, 18 Jul 2011, Alireza Mahani wrote:


Simon,

Thank you for elaborating on the limitations of R in handling float
types. I
think I'm pretty much there with you.

As for the insufficiency of single-precision math (and hence limitations
of
GPU), my personal take so far has been that double-precision becomes
crucial
when some sort of error accumulation occurs. For example, in differential
equations where boundary values are integrated to arrive at interior
values,
etc. On the other hand, in my personal line of work (Hierarchical
Bayesian
models for quantitative marketing), we have so much inherent uncertainty
and
noise at so many levels in the problem (and no significant error
accumulation sources) that single vs double precision issue is often
inconsequential for us. So I think it really depends on the field as well
as
the nature of the problem.


The main reason to use only double precision in R was that on modern CPUs
double precision calculations are as fast as single-precision ones, and
with 64-bit CPUs they are a single access.
So the extra precision comes more-or-less for free.


But, isn't it much more of the 'less free' when large data sets are
considered? If a double matrix takes 3GB, it's 1.5GB in single.
That might alleviate the dreaded out-of-memory error for some
users in some circumstances. On 64bit, 50GB reduces to 25GB
and that might make the difference between getting
something done, or not. If single were appropriate, of course.
For GPU too, i/o often dominates iiuc.

For space reasons, is there any possibility of R supporting single
precision (and single bit logical to reduce memory for logicals by
32 times)? I guess there might be complaints from users using
single inappropriately (or worse, not realising we have an instable
result due to single).


You can do any of this using external pointers now.  That will remind 
you that every single function to operate on such objects needs to be 
rewritten.


It's a huge amount of work, benefiting very few people.  I don't think 
anyone in R Core will do it.


Duncan Murdoch


Matthew


You also under-estimate the extent to which stability of commonly used
algorithms relies on double precision.  (There are stable single-precision
versions, but they are no longer commonly used.  And as Simon said, in
some cases stability is ensured by using extra precision where available.)

I disagree slightly with Simon on GPUs: I am told by local experts that
the double-precision on the latest GPUs (those from the last year or so)
is perfectly usable.  See the performance claims on
http://en.wikipedia.org/wiki/Nvidia_Tesla of about 50% of the SP
performance in DP.



Regards,
Alireza


--
View this message in context:
http://r.789695.n4.nabble.com/Manipulating-single-precision-float-arrays-in-Call-functions-tp3675684p3677232.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Manipulating single-precision (float) arrays in .Call functions

2011-07-19 Thread Matthew Dowle

"Duncan Murdoch"  wrote in message 
news:4e259600.5070...@gmail.com...
> On 11-07-19 7:48 AM, Matthew Dowle wrote:
>>
>> "Prof Brian Ripley"  wrote in message
>> news:alpine.lfd.2.02.1107190640280.28...@gannet.stats.ox.ac.uk...
>>> On Mon, 18 Jul 2011, Alireza Mahani wrote:
>>>
 Simon,

 Thank you for elaborating on the limitations of R in handling float
 types. I
 think I'm pretty much there with you.

 As for the insufficiency of single-precision math (and hence 
 limitations
 of
 GPU), my personal take so far has been that double-precision becomes
 crucial
 when some sort of error accumulation occurs. For example, in 
 differential
 equations where boundary values are integrated to arrive at interior
 values,
 etc. On the other hand, in my personal line of work (Hierarchical
 Bayesian
 models for quantitative marketing), we have so much inherent 
 uncertainty
 and
 noise at so many levels in the problem (and no significant error
 accumulation sources) that single vs double precision issue is often
 inconsequential for us. So I think it really depends on the field as 
 well
 as
 the nature of the problem.
>>>
>>> The main reason to use only double precision in R was that on modern 
>>> CPUs
>>> double precision calculations are as fast as single-precision ones, and
>>> with 64-bit CPUs they are a single access.
>>> So the extra precision comes more-or-less for free.
>>
>> But, isn't it much more of the 'less free' when large data sets are
>> considered? If a double matrix takes 3GB, it's 1.5GB in single.
>> That might alleviate the dreaded out-of-memory error for some
>> users in some circumstances. On 64bit, 50GB reduces to 25GB
>> and that might make the difference between getting
>> something done, or not. If single were appropriate, of course.
>> For GPU too, i/o often dominates iiuc.
>>
>> For space reasons, is there any possibility of R supporting single
>> precision (and single bit logical to reduce memory for logicals by
>> 32 times)? I guess there might be complaints from users using
>> single inappropriately (or worse, not realising we have an instable
>> result due to single).
>
> You can do any of this using external pointers now.  That will remind you 
> that every single function to operate on such objects needs to be 
> rewritten.
>
> It's a huge amount of work, benefiting very few people.  I don't think 
> anyone in R Core will do it.

Ok, thanks for the responses.
Matthew

>
> Duncan Murdoch
>
>> Matthew
>>
>>> You also under-estimate the extent to which stability of commonly used
>>> algorithms relies on double precision.  (There are stable 
>>> single-precision
>>> versions, but they are no longer commonly used.  And as Simon said, in
>>> some cases stability is ensured by using extra precision where 
>>> available.)
>>>
>>> I disagree slightly with Simon on GPUs: I am told by local experts that
>>> the double-precision on the latest GPUs (those from the last year or so)
>>> is perfectly usable.  See the performance claims on
>>> http://en.wikipedia.org/wiki/Nvidia_Tesla of about 50% of the SP
>>> performance in DP.
>>>

 Regards,
 Alireza


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Manipulating-single-precision-float-arrays-in-Call-functions-tp3675684p3677232.html
 Sent from the R devel mailing list archive at Nabble.com.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

>>>
>>> --
>>> Brian D. Ripley,  rip...@stats.ox.ac.uk
>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel:  +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UKFax:  +44 1865 272595
>>>
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Performance of .C and .Call functions vs. native R code

2011-07-19 Thread Douglas Bates
On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani
 wrote:
> (I am using a LINUX machine)
>
> Jeff,
>
> In creating reproducible results, I 'partially' answered my question. I have
> attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both
> files into your chosen directory, then run 'Rscript mvMultiply.r' in that
> directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and
> 'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to
> verify that the two methods produce the same output vector.)
>
> Below are the results that I get, along with discussion (tR and tCall are in
> sec):
>
> INCLUDE_DATAPREP,ROWMAJOR,tR,tCall
> F,F,13.536,13.875
> F,T,13.824,14.299
> T,F,13.688,18.167
> T,T,13.982,30.730
>
> Interpretation: The execution time for the .Call line is nearly identical to
> the call to R operator '%*%'. Two data preparation lines for the .Call
> method create the overhead:
>
> A <- t(A) (~12sec, or 12usec per call)
> dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)
>
> While the first line can be avoided by providing options in c++ function (as
> is done in the BLAS API), I wonder if the second line can be avoided, aside
> from the obvious option of rewriting the R scripts to use vectors instead of
> matrices. But this defies one of the primary advantages of using R, which is
> succinct, high-level coding. In particular, if one has several matrices as
> input into a .Call function, then the overhead from matrix-to-vector
> transformations can add up. To summarize, my questions are:
>
> 1- Do the above results seem reasonable to you? Is there a similar penalty
> in R's '%*%' operator for transforming matrices to vectors before calling
> BLAS functions?
> 2- Are there techniques for reducing the above overhead for developers
> looking to augment their R code with compiled code?
>
> Regards,
> Alireza
>
> ---
> # mvMultiply.r
> # comparing performance of matrix multiplication in R (using '%*%' operator)
> vs. calling compiled code (using .Call function)
> # y [m x 1] = A [m x n] %*% x [n x 1]
>
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
>
> INCLUDE_DATAPREP <- F
> ROWMAJOR <- F #indicates whether the c++ function treats A in a row-major or
> column-major fashion
>
> m <- 100
> n <- 10
> N <- 100
>
> diffVec <- array(0, dim = N)
> tR <- 0.0
> tCall <- 0.0
> for (i in 1:N) {
>
>        A <- runif(m*n); dim(A) <- c(m,n)
>        x <- runif(n)
>
>        t1 <- proc.time()[3]
>        y1 <- A %*% x
>        tR <- tR + proc.time()[3] - t1
>
>        if (INCLUDE_DATAPREP) {
>                t1 <- proc.time()[3]
>        }
>        if (ROWMAJOR) { #since R will convert matrix to vector in a 
> column-major
> fashion, if the c++ function expects a row-major format, we need to
> transpose A before converting it to a vector
>                A <- t(A)
>        }
>        dim(A) <- dim(A)[1] * dim(A)[2]
>        if (!INCLUDE_DATAPREP) {
>                t1 <- proc.time()[3]
>        }
>        y2 <- .Call("matvecMultiply", as.double(A), as.double(x),
> as.logical(c(ROWMAJOR)))
>        tCall <- tCall + proc.time()[3] - t1
>
>        diffVec[i] <- max(abs(y2 - y1))
> }
> cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")
> cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")
> cat("Time - Using '%*%' operator in R: ", tR, "sec\n")
> cat("Time - Using '.Call' function: ", tCall, "sec\n")
> cat("Maximum difference between methods: ", max(diffVec), "\n")
>
> dyn.unload("mvMultiply.so")
> ---
> # mvMultiply.cc
> #include <R.h>
> #include <Rinternals.h>
>
> extern "C" {
>
> SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {
>        double *rA = REAL(A), *rx = REAL(x), *ry;
>        int *rrm = LOGICAL(rowmajor);
>        int n = length(x);
>        int m = length(A) / n;
>        SEXP y;
>        PROTECT(y = allocVector(REALSXP, m));
>        ry = REAL(y);
>        for (int i = 0; i < m; i++) {
>                ry[i] = 0.0;
>                for (int j = 0; j < n; j++) {
>                        if (rrm[0] == 1) {
>                                ry[i] += rA[i * n + j] * rx[j];
>                        } else {
>                                ry[i] += rA[j * m + i] * rx[j];
>                        }
>                }
>        }
>        UNPROTECT(1);
>        return(y);
> }
>
> }
>

I realize that you are just beginning to use the .C and .Call
interfaces and your example is therefore a simple one.  However, if
you plan to continue with such development it is worthwhile learning
of some of the tools available.  I think one of the most important is
the "inline" package that can take a C or C++ code segment as a text
string and go through all the steps of creating and loading a
.Call'able compiled function.
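
For illustration, a minimal sketch of that workflow (assuming the 'inline'
package is installed; the toy routine just doubles a numeric vector):

library(inline)
code <- '
    int n = LENGTH(x);
    SEXP out = PROTECT(allocVector(REALSXP, n));
    for (int i = 0; i < n; i++) REAL(out)[i] = 2 * REAL(x)[i];
    UNPROTECT(1);
    return out;
'
dbl <- cfunction(signature(x = "numeric"), code)  # compiles, links and loads
dbl(c(1, 2, 3))                                   # 2 4 6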

Second, if you are going to use C++ (the code you show could be C code
as it doesn't use any C++ extensions) then you should look at th

[Rd] Measuring and comparing .C and .Call overhead

2011-07-19 Thread Alireza Mahani
Further pursuing my curiosity to measure the efficiency of R/C++ interface, I
conducted a simple matrix-vector multiplication test using .C and .Call
functions in R. In each case, I measured the execution time in R, as well as
inside the C++ function. Subtracting the two, I came up with a measure of
overhead associated with each call. I assume that this overhead would be
non-existent if the entire code were implemented in C++. [Please let me know
if my approach is somehow faulty.] The results of this simple test have
significant implications for my software development approach. I hope others
find this information useful as well. I have attached the code files in the
bottom. Below I provide a summary output, and interpret the results. [All
times in table below are in microseconds]

m/n/time_.Call_R/time_.Call_C++/.Call_overhead/time_.C_R/time_.C_C++/.C_overhead
100/10/3.23/1.33/*1.90*/16.89/0.96/*15.93*
200/15/5.38/3.44/*1.94*/30.77/2.81/*27.96*
500/20/12.48/10.4/*2.08*/78.12/9.37/*68.75*

Interpretation:

1- .Call overhead holds nearly constant, i.e. independent of data size. This
is expected since .Call works by passing pointers. (How can we explain the
slight increase in overhead?)
2- C++ times for .C are somewhat better than .Call. This is likely to be due
to the overhead associated with unpacking the SEXP pointers in a .Call
function.
3- The overhead for .C dominates the execution time. For a 500x20 matrix,
the overhead is ~90% of total time. This means that whenever we need to make
repeated calls to a C/C++ function from R, and when performance is important
to us, .Call is much preferred to .C, even at modest data sizes.
4- Overhead for .C scales sub-linearly with data size. I imagine that this
overhead can be reduced through registering the functions and using the
style field in R_CMethodDef to optimize data transfer (per Section 5.4 of
"Writing R Extensions"), but perhaps not by more than a half.
5- Even using .Call, the overhead is still a significant percentage of total
time, though the contribution decreases with data size (and compute load of
function). However, in the context of parallelization benefits, this
overhead puts an unpleasant cap on achievable gains. For example, even if
the compute time goes down by 100x, if overhead was even 10% of the time,
our effective gain will saturate at 10x. In other words, effective
parallelization can only be achieved by moving big chunks of the code to
C/C++ so as to avoid repeated calls from R to C.

I look forward to your comments.

-Alireza

#code file: mvMultiply.r
#measuring execution time for a matrix-vector multiplication (A%*%x) using
.C and .Call functions
#execution time is measured both in R and inside the C++ functions;
subtracting the two is 
#used as an indicator of overhead associated with each call

rm(list = ls())
system("R CMD SHLIB mvMultiply.cc")
dyn.load("mvMultiply.so")

m <- 100 #number of rows in matrix A
n <- 10 #number of columns in matrix A (= number of rows in vector x)
N <- 100

A <- runif(m*n)
x <- runif(n)

# measuring .Call
tCall_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {
tCall_c <- tCall_c + .Call("matvecMultiply", as.double(A), as.double(x))
}
tCall_R <- proc.time()[3] - t1
cat(".Call - Time measured in R: ", round(tCall_R,2), "sec\n")
cat(".Call - Time measured in C++: ", round(tCall_c,2), "sec\n")
cat(".Call - Implied overhead: ", round(tCall_R,2) - round(tCall_c,2), "sec 
->  per call: ", 100*(round(tCall_R,2) - round(tCall_c,2))/N, "usec\n")

# measuring .C
tC_c <- 0.0
t1 <- proc.time()[3]
for (i in 1:N) {
tC_c <- tC_c + .C("matvecMultiply2", as.double(A), as.double(x),
double(c(m)), as.integer(c(m)), as.integer(c(n)), t = double(1))$t
}
tC_R <- proc.time()[3] - t1
cat(".C - Time measured in R: ", round(tC_R,2), "sec\n")
cat(".C - Time measured in C++: ", round(tC_c,2), "sec\n")
cat(".C - Implied overhead: ", round(tC_R,2) - round(tC_c,2), "sec  ->  per
call: ", 100*(round(tC_R,2) - round(tC_c,2))/N, "usec\n")

dyn.unload("mvMultiply.so")

#code file: mvMultiply.cc
#include <sys/time.h>
#include <R.h>
#include <Rinternals.h>

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x) {
timeval time1, time2;
gettimeofday(&time1, NULL);
SEXP execTime_sxp;
PROTECT(execTime_sxp = allocVector(REALSXP, 1));
double *execTime; execTime = REAL(execTime_sxp);
double *rA = REAL(A), *rx = REAL(x), *ry;
int n = length(x);
int m = length(A) / n;
SEXP y;
PROTECT(y = allocVector(REALSXP, m));
ry = REAL(y);
for (int i = 0; i < m; i++) {
ry[i] = 0.0;
for (int j = 0; j < n; j++) {
ry[i] += rA[j * m + i] * rx[j];
}
}
UNPROTECT(1);
gettimeofday(&time2, NULL);
*execTime = (time2.tv_sec - time1.tv_sec) + (time2.tv_usec -
time1.tv_usec)/100.0;
UNPROTECT(1);
r

Re: [Rd] Performance of .C and .Call functions vs. native R code

2011-07-19 Thread Douglas Bates
I just saw that I left a syntax error in the .R and the first
_Rout.txt files.  Notice that in the second _Rout.txt file the order
of the arguments in the constructors for the MMatrixXd and the
MVectorXd are in a different order than in the .R and the first
_Rout.txt files.  The correct order has the pointer first, then the
dimensions.  For the first _Rout.txt file this part of the code is not
used.

On Tue, Jul 19, 2011 at 10:00 AM, Douglas Bates  wrote:
> On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani
>  wrote:
>> (I am using a LINUX machine)
>>
>> Jeff,
>>
>> In creating reproducible results, I 'partially' answered my question. I have
>> attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both
>> files into your chosen directory, then run 'Rscript mvMultiply.r' in that
>> directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and
>> 'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to
>> verify that the two methods produce the same output vector.)
>>
>> Below are the results that I get, along with discussion (tR and tCall are in
>> sec):
>>
>> INCLUDE_DATAPREP,ROWMAJOR,tR,tCall
>> F,F,13.536,13.875
>> F,T,13.824,14.299
>> T,F,13.688,18.167
>> T,T,13.982,30.730
>>
>> Interpretation: The execution time for the .Call line is nearly identical to
>> the call to R operator '%*%'. Two data preparation lines for the .Call
>> method create the overhead:
>>
>> A <- t(A) (~12sec, or 12usec per call)
>> dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)
>>
>> While the first line can be avoided by providing options in c++ function (as
>> is done in the BLAS API), I wonder if the second line can be avoided, aside
>> from the obvious option of rewriting the R scripts to use vectors instead of
>> matrices. But this defies one of the primary advantages of using R, which is
>> succinct, high-level coding. In particular, if one has several matrices as
>> input into a .Call function, then the overhead from matrix-to-vector
>> transformations can add up. To summarize, my questions are:
>>
>> 1- Do the above results seem reasonable to you? Is there a similar penalty
>> in R's '%*%' operator for transforming matrices to vectors before calling
>> BLAS functions?
>> 2- Are there techniques for reducing the above overhead for developers
>> looking to augment their R code with compiled code?
>>
>> Regards,
>> Alireza
>>
>> ---
>> # mvMultiply.r
>> # comparing performance of matrix multiplication in R (using '%*%' operator)
>> vs. calling compiled code (using .Call function)
>> # y [m x 1] = A [m x n] %*% x [n x 1]
>>
>> rm(list = ls())
>> system("R CMD SHLIB mvMultiply.cc")
>> dyn.load("mvMultiply.so")
>>
>> INCLUDE_DATAPREP <- F
>> ROWMAJOR <- F #indicates whether the c++ function treats A in a row-major or
>> column-major fashion
>>
>> m <- 100
>> n <- 10
>> N <- 100
>>
>> diffVec <- array(0, dim = N)
>> tR <- 0.0
>> tCall <- 0.0
>> for (i in 1:N) {
>>
>>        A <- runif(m*n); dim(A) <- c(m,n)
>>        x <- runif(n)
>>
>>        t1 <- proc.time()[3]
>>        y1 <- A %*% x
>>        tR <- tR + proc.time()[3] - t1
>>
>>        if (INCLUDE_DATAPREP) {
>>                t1 <- proc.time()[3]
>>        }
>>        if (ROWMAJOR) { #since R will convert matrix to vector in a 
>> column-major
>> fashion, if the c++ function expects a row-major format, we need to
>> transpose A before converting it to a vector
>>                A <- t(A)
>>        }
>>        dim(A) <- dim(A)[1] * dim(A)[2]
>>        if (!INCLUDE_DATAPREP) {
>>                t1 <- proc.time()[3]
>>        }
>>        y2 <- .Call("matvecMultiply", as.double(A), as.double(x),
>> as.logical(c(ROWMAJOR)))
>>        tCall <- tCall + proc.time()[3] - t1
>>
>>        diffVec[i] <- max(abs(y2 - y1))
>> }
>> cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")
>> cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")
>> cat("Time - Using '%*%' operator in R: ", tR, "sec\n")
>> cat("Time - Using '.Call' function: ", tCall, "sec\n")
>> cat("Maximum difference between methods: ", max(diffVec), "\n")
>>
>> dyn.unload("mvMultiply.so")
>> ---
>> # mvMultiply.cc
>> #include <R.h>
>> #include <Rinternals.h>
>>
>> extern "C" {
>>
>> SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {
>>        double *rA = REAL(A), *rx = REAL(x), *ry;
>>        int *rrm = LOGICAL(rowmajor);
>>        int n = length(x);
>>        int m = length(A) / n;
>>        SEXP y;
>>        PROTECT(y = allocVector(REALSXP, m));
>>        ry = REAL(y);
>>        for (int i = 0; i < m; i++) {
>>                ry[i] = 0.0;
>>                for (int j = 0; j < n; j++) {
>>                        if (rrm[0] == 1) {
>>                                ry[i] += rA[i * n + j] * rx[j];
>>                        } else {
>>                                ry[i] += rA[j * m + i] * rx[j];
>>                        }
>>                }
>>        }
>>        UNPROTECT(1);
>>       

[Rd] Tesla GPUs [Was: Manipulating single-precision (float) arrays in .Call functions]

2011-07-19 Thread Simon Urbanek

On Jul 19, 2011, at 2:26 AM, Prof Brian Ripley wrote:

> On Mon, 18 Jul 2011, Alireza Mahani wrote:
> 
>> Simon,
>> 
>> Thank you for elaborating on the limitations of R in handling float types. I
>> think I'm pretty much there with you.
>> 
>> As for the insufficiency of single-precision math (and hence limitations of
>> GPU), my personal take so far has been that double-precision becomes crucial
>> when some sort of error accumulation occurs. For example, in differential
>> equations where boundary values are integrated to arrive at interior values,
>> etc. On the other hand, in my personal line of work (Hierarchical Bayesian
>> models for quantitative marketing), we have so much inherent uncertainty and
>> noise at so many levels in the problem (and no significant error
>> accumulation sources) that single vs double precision issue is often
>> inconsequential for us. So I think it really depends on the field as well as
>> the nature of the problem.
> 
> The main reason to use only double precision in R was that on modern CPUs 
> double precision calculations are as fast as single-precision ones, and with 
> 64-bit CPUs they are a single access.  So the extra precision comes 
> more-or-less for free.  You also under-estimate the extent to which stability 
> of commonly used algorithms relies on double precision.  (There are stable 
> single-precision versions, but they are no longer commonly used.  And as 
> Simon said, in some cases stability is ensured by using extra precision where 
> available.)
> 
> I disagree slightly with Simon on GPUs: I am told by local experts that the 
> double-precision on the latest GPUs (those from the last year or so) is 
> perfectly usable.  See the performance claims on 
> http://en.wikipedia.org/wiki/Nvidia_Tesla of about 50% of the SP performance 
> in DP.
> 

That would be good news. Unfortunately those seem to be still targeted at a 
specialized market and are not really graphics cards in the traditional sense. 
Although this is sort of required for the purpose it removes the benefit of 
ubiquity. So, yes, I agree with you that it may be an interesting way forward, 
but I fear it's too much of a niche to be widely supported. I may want to ask 
our GPU specialists here to see if they have any around so I could re-visit our 
OpenCL R benchmarks. Last time we abandoned our OpenCL R plans exactly due to 
the lack of speed in double precision.

Thanks,
Simon

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Measuring and comparing .C and .Call overhead

2011-07-19 Thread Simon Urbanek

On Jul 19, 2011, at 11:07 AM, Alireza Mahani wrote:

> Further pursuing my curiosity to measure the efficiency of R/C++ interface, I
> conducted a simple matrix-vector multiplication test using .C and .Call
> functions in R. In each case, I measured the execution time in R, as well as
> inside the C++ function. Subtracting the two, I came up with a measure of
> overhead associated with each call. I assume that this overhead would be
> non-existent of the entire code was implemented in C++. [Please let me know
> if my approach is somehow faulty.] The results of this simple test have
> significant implications for my software development approach. I hope others
> find this information useful as well. I have attached the code files in the
> bottom. Below I provide a summary output, and interpret the results. [All
> times in table below are in microseconds]
> 
> m/n/time_.Call_R/time_.Call_C++/.Call_overhead/time_.C_R/time_.C_C++/.C_overhead
> 100/10/3.23/1.33/*1.90*/16.89/0.96/*15.93*
> 200/15/5.38/3.44/*1.94*/30.77/2.81/*27.96*
> 500/20/12.48/10.4/*2.08*/78.12/9.37/*68.75*
> 

Your "overhead" calculations are anything but since they include a loop, value 
replacement, arithmetics, unnecessary function calls (you don't need 
as.double() for .Call - it's more efficient to use coerceVector() or throw an 
error depending on what you desire) - that have nothing to do with .Call.

In addition, if you really cared about overhead, you would avoid symbol lookup 
millions of times:

foo.c:
#include <Rinternals.h>
SEXP foo() { return R_NilValue; }

> system.time(for(i in 1:1e6) .Call("foo"))
   user  system elapsed 
  4.943   0.016   4.960 
> foo = getNativeSymbolInfo("foo")$address
> system.time(for(i in 1:1e6) .Call(foo))
   user  system elapsed 
  0.363   0.001   0.365 

This is very close to just calling a constant closure with the same result:

> f=function() NULL
> system.time(for(i in 1:1e6) f())
   user  system elapsed 
  0.238   0.001   0.239 

So .Call overhead is essentially negligible - and not what you were measuring 
;).

There is only a tiny overhead in forcing the argument promise, but it's 
constant:

> x=rnorm(1e6)
> system.time(for(i in 1:1e6) .Call(foo,x))
   user  system elapsed 
  0.455   0.001   0.456 
> x=rnorm(1e9)
> system.time(for(i in 1:1e6) .Call(foo,x))
   user  system elapsed 
  0.454   0.000   0.455 

and it's again no different that for a closure:
> f=function(x) NULL
> system.time(for(i in 1:1e6) f())
   user  system elapsed 
  0.259   0.001   0.259 
> system.time(for(i in 1:1e6) f(x))
   user  system elapsed 
  0.329   0.001   0.329 

so this is inherent to any call. .Call is the fastest you can get, there is 
nothing faster (other than hacking your code into R internals...).

Cheers,
Simon


> Interpretation:
> 
> 1- .Call overhead holds nearly constant, i.e. independent of data size. This
> is expected since .Call works by passing pointers. (How can we explain the
> slight increase in overhead?)
> 2- C++ times for .C are somewhat better than .Call. This is likely to be due
> to the overhead associated with unpacking the SEXP pointers in a .Call
> function.
> 3- The overhead for .C dominates the execution time. For a 500x20 matrix,
> the overhead is ~90% of total time. This means that whenever we need to make
> repeated calls to a C/C++ function from R, and when performance is important
> to us, .Call is much preferred to .C, even at modest data sizes.
> 4- Overhead for .C scales sub-linearly with data size. I imagine that this
> overhead can be reduced through registering the functions and using the
> style field in R_CMethodDef to optimize data transfer (per Section 5.4 of
> "Writing R Extensions"), but perhaps not by more than a half.
> 5- Even using .Call, the overhead is still a significant percentage of total
> time, though the contribution decreases with data size (and compute load of
> function). However, in the context of parallelization benefits, this
> overhead puts an unpleasant cap on achievable gains. For example, even if
> the compute time goes down by 100x, if overhead was even 10% of the time,
> our effective gain will saturate at 10x. In other words, effective
> parallelization can only be achieved by moving big chunks of the code to
> C/C++ so as to avoid repeated calls from R to C.
> 
> I look forward to your comments.
> 
> -Alireza
> 
> #code file: mvMultiply.r
> #measuring execution time for a matrix-vector multiplication (A%*%x) using
> .C and .Call functions
> #execution time is measured both in R and inside the C++ functions;
> subtracting the two is 
> #used as an indicator of overhead associated with each call
> 
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
> 
> m <- 100 #number of rows in matrix A
> n <- 10 #number of columns in matrix A (= number of rows in vector x)
> N <- 100
> 
> A <- runif(m*n)
> x <- runif(n)
> 
> # measuring .Call
> tCall_c <- 0.0
> t1 <- proc.time()[3]
> for (i in 1:N) {

Re: [Rd] Performance of .C and .Call functions vs. native R code

2011-07-19 Thread Alireza Mahani
Prof. Bates,

It looks like you read my mind! I am working on writing an R package for
high-performance MCMC estimation of a class of Hierarchical Bayesian models
most often used in the field of quantitative marketing. This would
essentially be a parallelized version of Peter Rossi's bayesm package. While
I've made great progress in parallelizing the most mathematically difficult
part of the algorithm, namely slice sampling of low-level coefficients, yet
I've realized that putting the entire code together while minimizing bugs is
a big challenge in C/C++/CUDA environments. I have therefore decided to
follow a more logical path of first developing the code logic in R, and then
exporting it function by function to compiled code. 

The tools that you mentioned seem to be exactly the kind of stuff I need in
order to be able to go through this incremental, test-oriented
development process with relatively little pain.

I'm not sure if this is what you had in mind while suggesting the tools to
me, so please let me know if I'm misinterpreting your comments, or if I need
to be aware of other tools beyond what you mentioned.

Many thanks,
Alireza


--
View this message in context: 
http://r.789695.n4.nabble.com/Performance-of-C-and-Call-functions-vs-native-R-code-tp3665017p3679056.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Confusing inheritance problem

2011-07-19 Thread Hervé Pagès

Hi Terry,

You use a NAMESPACE but you don't import Matrix.
So it looks like the "rowSums" method for CsparseMatrix objects
cannot be found (not sure why because you do have Matrix in the
Depends field so the rowSums generic and methods should be in the
search path).

Anyway, just adding

import(Matrix)

to the NAMESPACE file solves the problem for me.
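
For what it's worth, a quick way to check interactively that the method exists
and is visible (purely an illustration):

library(Matrix)
selectMethod("rowSums", "CsparseMatrix")   # errors if no applicable method is found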

Cheers,
H.

On 11-07-18 09:16 AM, Terry Therneau wrote:

  I've packaged the test library up as a tar file at
ftp.mayo.edu
directory therneau, file ktest.tar
login username: mayoftp
  password:  KPlLiFoz
This will disappear in 3 days (Mayo is very fussy about outside access).

In response to Uwe's comments
   1. "2.13.0" is not recent
 It's not the latest, but it is recent.  This is for machines at work,
where upgrades happen more infrequently.
   2. "Matrix not loaded"
The sessionInfo was only to show what version we have.  Forgetting to load 
Matrix
isn't the problem -- when I do that the error is quick and obvious.

  Thanks in advance for any pointers.

Terry T.



On Sat, 2011-07-16 at 19:27 +0200, Uwe Ligges wrote:


On 15.07.2011 23:23, Terry Therneau wrote:

   I have a library in development with a function that works when called
from the top level, but fails under R CMD check.  The particular line of
failure is
rsum<- rowSums(kmat>0)
where kmat is a dsCMatrix object.

I'm currently stumped and looking for some ideas.

I've created a stripped down library "ktest" that has only 3
functions: pedigree.R to create a pedigree or pedigreeList object,
   kinship.R with "kinship" methods for the two objects
   one small compute function called by the others
along with the minimal amount of other information such that a call to
 R --vanilla CMD check ktest
gives no errors until the fatal one.

   There are two test cases.  A 3 line one that creates a dsCMatrix and
call rowSums at the top level works fine, but the same call inside the
kmat.pedigreeList function gives an error
  'x' must be an array of at least two dimensions
Adding a print statement above the rowSums call shows that the argument
is a 14 by 14 dsCMatrix.

   I'm happy to send the library to anyone else to try and duplicate.
  Terry Therneau

tmt% R --vanilla


sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8LC_COLLATE=C
   [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base




Terry,

1. Your R is not recent.
2. You do this without having Matrix loaded (according to
sessionInfo())? This may already be the cause of your problems.
3. You may want to make your package available on some website. I am
sure there are people who will take a look (including me, but not today).

Best wishes,
Uwe










__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Confusing inheritance problem

2011-07-19 Thread Therneau, Terry M., Ph.D.

 Yup, that fixes the problem.  This was also pointed out by Uwe in a personal
email.
  One confusing aspect is that there are two test files in the skeleton
package, one of which calls rowSums() on a dsCMatrix object at top level, and
the other of which does the same thing within a dispatched method.  Only the
second one fails.  An interesting puzzle, but not one I'll pursue with any
vigor now that we have a fix.

Terry T.

-Original Message-
From: Hervé Pagès [mailto:hpa...@fhcrc.org] 
Sent: Tuesday, July 19, 2011 2:54 PM
To: Therneau, Terry M., Ph.D.
Cc: r-devel@r-project.org
Subject: Re: [Rd] Confusing inheritance problem

Hi Terry,

You use a NAMESPACE but you don't import Matrix.
So it looks like the "rowSums" method for CsparseMatrix objects
cannot be found (not sure why because you do have Matrix in the
Depends field so the rowSums generic and methods should be in the
search path).

Anyway, just adding

import(Matrix)

to the NAMESPACE file solves the problem for me.

Cheers,
H.

On 11-07-18 09:16 AM, Terry Therneau wrote:
>   I've packaged the test library up as a tar file at
>   ftp.mayo.edu
>   directory therneau, file ktest.tar
>   login username: mayoftp
>   password:  KPlLiFoz
> This will disappear in 3 days (Mayo is very fussy about outside access).
>
> In response to Uwe's comments
>1. "2.13.0" is not recent
>  It's not the latest, but it is recent.  This is for machines at work 
> where
> where upgrades happen more infrequently
>2. "Matrix not loaded"
> The sessionInfo was only to show what version we have.  Forgetting to load 
> Matrix
> isn't the problem -- when I do that the error is quick and obvious.
>
>   Thanks in advance for any pointers.
>
> Terry T.
>
>
>
> On Sat, 2011-07-16 at 19:27 +0200, Uwe Ligges wrote:
>>
>> On 15.07.2011 23:23, Terry Therneau wrote:
>>>I have library in development with a function that works when called
>>> from the top level, but fails under R CMD check.  The paricular line of
>>> failure is
>>> rsum<- rowSums(kmat>0)
>>> where kmat is a dsCMatrix object.
>>>
>>> I'm currently stumped and looking for some ideas.
>>>
>>> I've created a stripped down library "ktest" that has only 3
>>> functions: pedigree.R to create a pedigree or pedigreeList object,
>>>kinship.R with "kinship" methods for the two objects
>>>one small compute function called by the others
>>> along with the minimal amount of other information such that a call to
>>>  R --vanilla CMD check ktest
>>> gives no errors until the fatal one.
>>>
>>>There are two test cases.  A 3 line one that creates a dsCMatrix and
>>> call rowSums at the top level works fine, but the same call inside the
>>> kmat.pedigreeList function gives an error
>>>   'x' must be an array of at least two dimensions
>>> Adding a print statement above the rowSums call shows that the argument
>>> is a 14 by 14 dsCMatrix.
>>>
>>>I'm happy to send the library to anyone else to try and duplicate.
>>>   Terry Therneau
>>>
>>> tmt% R --vanilla
>>>
 sessionInfo()
>>> R version 2.13.0 (2011-04-13)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>>[3] LC_TIME=en_US.UTF-8LC_COLLATE=C
>>>[5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
>>>[7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>>[9] LC_ADDRESS=C   LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics  grDevices utils datasets  methods
>>> base
>>
>>
>>
>> Terry,
>>
>> 1. Your R is not recent.
>> 2. You do this without having Matrix loaded (according to
>> sessionInfo())? This may already be the cause of your problems.
>> 3. You may want to make your package available on some website. I am
>> sure there are people who will take a look (including me, but not today).
>>
>> Best wishes,
>> Uwe
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] hanging spaces prior to linebreak from cat()

2011-07-19 Thread Tim Triche, Jr.
(re-sending after confirming list subscription; apologies if this ends up
being sent to the list twice)

Is the expected behavior from cat(), as used below, a hanging space before
\n at the end of the emitted line?

 firstheader = gsub("\\s+$", "", paste(c("Hybridization REF", s, s),
collapse = "\t"))
 cat(firstheader, "\n", file = filename)

When I run the above code, which is followed by appending the contents of a
data.frame via write.table(), what I get is a file whose first line ends in
a " \n" (note the space preceding the newline).  I would think that the
gsub() would have trimmed off this hanging space, but it doesn't.  Is this a
strange bug or expected behavior?  I have attached the version of cat.R in
from the SVN checkout I am running as my R build; it doesn't seem awry...
?!?

> sessionInfo()
R version 2.14.0 Under development (unstable) (2011-04-21 r55577)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

other attached packages:
 [1] EGC.tools_1.0
IlluminaHumanMethylation27k.db_1.4.6
 [3] IlluminaHumanMethylation450k.db_1.4.6 org.Hs.eg.db_2.5.0
 [5] RSQLite_0.9-4 DBI_0.2-5
 [7] AnnotationDbi_1.15.7  Biobase_2.13.2
 [9] gtools_2.6.2  reshape_0.8.4
[11] plyr_1.4

loaded via a namespace (and not attached):
[1] annotate_1.31.0   grid_2.14.0   lattice_0.19-17   matrixStats_0.2.2
[5] methylumi_1.9.5   xtable_1.5-6


-- 
If people do not believe that mathematics is simple, it is only because they
do not realize how complicated life is.
John von 
Neumann
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Improved Nelder-Mead algorithm - a potential replacement for optim's Nelder-Mead

2011-07-19 Thread Kellner
Hi,

I have a question concerning the Nelder-Mead algorithm in R. As far as I can
see, the shrink operation is not included in the optim() function. Does
anyone know of an implementation of the Nelder-Mead algorithm in R that
includes this operation? Could someone maybe send me one? I would try to
write the code on my own, but due to my doctoral thesis I am a little under
time pressure and would appreciate any help. Thank you in advance!

Greetings
Ralf

--
View this message in context: 
http://r.789695.n4.nabble.com/Improved-Nelder-Mead-algorithm-a-potential-replacement-for-optim-s-Nelder-Mead-tp1580101p3678732.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Randomness not due to seed

2011-07-19 Thread jeroen00ms
I am working on a reproducible computing platform for which I would like to
be able to _exactly_ reproduce an R object. However, I am experiencing
unexpected randomness in some calculations. I have a hard time finding out
exactly how it occurs. The code below illustrates the issue. 

mylm1 <- lm(dist~speed, data=cars);
mylm2 <- lm(dist~speed, data=cars);
identical(mylm1, mylm2); #TRUE

makelm <- function(){
return(lm(dist~speed, data=cars));
}

mylm1 <- makelm();
mylm2 <- makelm();
identical(mylm1, mylm2); #FALSE

When inspecting both objects there seem to be some rounding differences.
Setting a seed does not make a difference. Is there any way I can remove
this randomness and exactly reproduce the object every time?





--
View this message in context: 
http://r.789695.n4.nabble.com/Randomness-not-due-to-seed-tp3678082p3678082.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Randomness not due to seed

2011-07-19 Thread William Dunlap
Did you actually see some rounding differences?

The lm objects made in the calls to makelm will
differ in the environments attached to the formula
(because you made the formula in the function).  If
I change both copies of that .Environment attribute
to .GlobalEnv (or any other environment), then identical
reports the objects are the same:

  > attr(attr(mylm1$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm1$terms, ".Environment") <- .GlobalEnv
  > attr(attr(mylm2$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm2$terms, ".Environment") <- .GlobalEnv
  > identical(mylm1, mylm2)
  [1] TRUE 
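
For what it's worth, a quick way to convince yourself that there are no
numeric differences between the two fits as originally created (illustration
only):

  > identical(coef(mylm1), coef(mylm2))
  [1] TRUE
  > all.equal(mylm1, mylm2)   # all.equal compares formulas by value, not by environment
  [1] TRUE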

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
> Behalf Of jeroen00ms
> Sent: Tuesday, July 19, 2011 6:13 AM
> To: r-devel@r-project.org
> Subject: [Rd] Randomness not due to seed
> 
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
> 
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
> 
> makelm <- function(){
>   return(lm(dist~speed, data=cars));
> }
> 
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
> 
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Randomness-not-due-to-seed-
> tp3678082p3678082.html
> Sent from the R devel mailing list archive at Nabble.com.
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Improved Nelder-Mead algorithm - a potential replacement for optim's Nelder-Mead

2011-07-19 Thread Ravi Varadhan
Take a look at the `nmk' function in"dfoptim" package.
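
For example, a minimal sketch (assuming dfoptim is installed):

library(dfoptim)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2   # Rosenbrock test function
res <- nmk(par = c(-1.2, 1), fn = fr)   # Nelder-Mead including the shrink step
res$par                                 # close to c(1, 1)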

Ravi.

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on behalf 
of Kellner [ralf.kell...@wiso.uni-erlangen.de]
Sent: Tuesday, July 19, 2011 1:33 PM
To: r-devel@r-project.org
Subject: Re: [Rd] Improved Nelder-Mead algorithm - a potential replacement for 
optim's Nelder-Mead

Hi,

I have a question concerning the Nelder-Mead algorithm in R. As far as I can
see, the shrink operation is not included in the optim() function. Does
anyone know of an implementation of the Nelder-Mead algorithm in R that
includes this operation? Could someone maybe send me one? I would try to
write the code on my own, but due to my doctoral thesis I am a little under
time pressure and would appreciate any help. Thank you in advance!

Greetings
Ralf

--
View this message in context: 
http://r.789695.n4.nabble.com/Improved-Nelder-Mead-algorithm-a-potential-replacement-for-optim-s-Nelder-Mead-tp1580101p3678732.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hanging spaces prior to linebreak from cat()

2011-07-19 Thread Simon Urbanek

On Jul 19, 2011, at 10:44 AM, Tim Triche, Jr. wrote:

> (re-sending after confirming list subscription; apologies if this ends up
> being sent to the list twice)
> 
> Is the expected behavior from cat(), as used below, a hanging space before
> \n at the end of the emitted line?
> 
> firstheader = gsub("\\s+$", "", paste(c("Hybridization REF", s, s),
> collapse = "\t"))
> cat(firstheader, "\n", file = filename)
> 
> When I run the above code, which is followed by appending the contents of a
> data.frame via write.table(), what I get is a file whose first line ends in
> a " \n" (note the space preceding the newline).

Yes, you use the default sep=' ' (space) so, obviously there will be a space 
before the newline as requested. If you don't want that then you probably want 
to use sep=''.
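
For example:

cat("Hybridization REF", "\n")             # default sep=" ": line ends in " \n"
cat("Hybridization REF", "\n", sep = "")   # line ends in "\n"
cat("Hybridization REF\n")                 # or put the newline in the string itself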

Cheers,
Simon


>  I would think that the
> gsub() would have trimmed off this hanging space, but it doesn't.  Is this a
> strange bug or expected behavior?  I have attached the version of cat.R in
> from the SVN checkout I am running as my R build; it doesn't seem awry...
> ?!?
> 
>> sessionInfo()
> R version 2.14.0 Under development (unstable) (2011-04-21 r55577)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C  LC_MESSAGES=C
> [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats graphics  grDevices datasets  utils methods   base
> 
> other attached packages:
> [1] EGC.tools_1.0
> IlluminaHumanMethylation27k.db_1.4.6
> [3] IlluminaHumanMethylation450k.db_1.4.6 org.Hs.eg.db_2.5.0
> [5] RSQLite_0.9-4 DBI_0.2-5
> [7] AnnotationDbi_1.15.7  Biobase_2.13.2
> [9] gtools_2.6.2  reshape_0.8.4
> [11] plyr_1.4
> 
> loaded via a namespace (and not attached):
> [1] annotate_1.31.0   grid_2.14.0   lattice_0.19-17   matrixStats_0.2.2
> [5] methylumi_1.9.5   xtable_1.5-6
> 
> 
> -- 
> If people do not believe that mathematics is simple, it is only because they
> do not realize how complicated life is.
> John von 
> Neumann
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel