[Rd] An example of very slow computation

2011-08-17 Thread John C Nash
This message is about a curious difference in timing between two ways of
computing the same function. One uses expm, so is expected to be a bit
slower, but "a bit" turned out to be a factor of >1000. The code is below.
We would be grateful if anyone can point out any egregious bad practice in
our code, or enlighten us on why one approach is so much slower than the
other. The problem arose in an activity to develop guidelines for
nonlinear modeling in ecology (at NCEAS, Santa Barbara, with worldwide
participants), and we will be trying to include suggestions of how to
prepare problems like this for efficient and effective solution. The code
for nlogL was the "original" from the worker who supplied the problem.

Best,

John Nash

--

cat("mineral-timing.R == benchmark MIN functions in R\n")
#  J C Nash July 31, 2011

require("microbenchmark")
require("numDeriv")
library(Matrix)
library(optimx)
# dat<-read.table('min.dat', skip=3, header=FALSE)
# t<-dat[,1]
t <- c(  0.77,   1.69,   2.69,   3.67,   4.69,   5.71,   7.94,   9.67,
        11.77,  17.77,  23.77,  32.77,  40.73,  47.75,  54.90,  62.81,
        72.88,  98.77, 125.92, 160.19, 191.15, 223.78, 287.70, 340.01,
       340.95, 342.01)

# y<-dat[,2] # ?? tidy up
y <- c( 1.396,  3.784,  5.948,  7.717,  9.077, 10.100, 11.263, 11.856,
       12.251, 12.699, 12.869, 13.048, 13.222, 13.347, 13.507, 13.628,
       13.804, 14.087, 14.185, 14.351, 14.458, 14.756, 15.262, 15.703,
       15.703, 15.703)

# rebuild dat here so the dat[,1]/dat[,2] references below work even
# though the read.table() above is commented out
dat <- data.frame(t, y)

ones<-rep(1,length(t))
theta<-c(-2,-2,-2,-2)


nlogL<-function(theta){
  k<-exp(theta[1:3])
  sigma<-exp(theta[4])
  A<-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0<-c(0,100)
  sol<-function(t)100-sum(expm(A*t)%*%x0)
  pred<-sapply(dat[,1],sol)
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
}

getpred<-function(theta, t){
  k<-exp(theta[1:3])
  A<-rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  x0<-c(0,100)
  sol<-function(tt)100-sum(expm(A*tt)%*%x0)
  pred<-sapply(t,sol)
  pred # return the predictions explicitly (not used in the timings below)
}

Mpred <- function(theta) {
# WARNING: assumes t global
  kvec<-exp(theta[1:3])
  k1<-kvec[1]
  k2<-kvec[2]
  k3<-kvec[3]
#   MIN problem terbuthylazene disappearance
  z<-k1+k2+k3
  y<-z*z-4*k1*k3
  l1<-0.5*(-z+sqrt(y))  # l1, l2 are the eigenvalues of A
  l2<-0.5*(-z-sqrt(y))
  val<-100*(1-((k1+k2+l2)*exp(l2*t)-(k1+k2+l1)*exp(l1*t))/(l2-l1))
  val # a vector if t is a vector
}

negll <- function(theta){
# non expm version JN 110731
  pred<-Mpred(theta)
  sigma<-exp(theta[4])
  -sum(dnorm(dat[,2],mean=pred,sd=sigma, log=TRUE))
}

theta<-rep(-2,4)
fand<-nlogL(theta)
fsim<-negll(theta)
cat("Check fn vals: expm =",fand,"   simple=",fsim,"  diff=",fand-fsim,"\n")

cat("time the function in expm form\n")
tnlogL<-microbenchmark(nlogL(theta), times=100L)
tnlogL

cat("time the function in simpler form\n")
tnegll<-microbenchmark(negll(theta), times=100L)
tnegll

ftimes<-data.frame(texpm=tnlogL$time, tsimp=tnegll$time)
# ftimes


boxplot(log(ftimes))
title("Log times in nanoseconds for matrix exponential and simple MIN fn")



[Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux

Hi,

in R 2.12.1, R CMD check hangs when building a vignette that uses a 
foreach loop with the doMC parallel backend.

This does not happen in R 2.13.1, nor if I use doSEQ instead of doMC.
All versions of multicore, doMC and foreach are the same on both my R 
installations.


Has anybody encountered a similar issue?

Thank you.
Renaud






Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann
I think one difference is that negll() is fully vectorized - no loops, whereas
nlogL calls the function sol() inside sapply, i.e. a loop.
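
A minimal sketch of the cost of that difference (f here is a stand-in
function, not the posted code):

  f <- function(x) exp(-x) * x^2
  x <- runif(1e4)
  system.time(for (i in 1:100) sapply(x, f))  # element-wise R-level loop
  system.time(for (i in 1:100) f(x))          # one vectorized call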

Michael


On 17 Aug 2011, at 10:27AM, John C Nash wrote:

> This message is about a curious difference in timing between two ways of
> computing the same function. One uses expm, so is expected to be a bit
> slower, but "a bit" turned out to be a factor of >1000. [...]


[Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Jonathan Malmaud
Hi,
My R package has files in the 'inst' directory that it needs to reference. How 
can the R scripts in my package find out the full path to the 'inst' directory 
after the package is installed, given that different users may have installed 
the package to different libraries? 

Thanks,
Jon Malmaud


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux
I did notice some strange behaviour once, but things had been working
without a problem for a while now.
foreach loops are a nice concept, which I'd like to carry on using.
Maybe somebody from the foreach/dopar packages could explain why they
have such issues?

Tim, do you have concrete examples of the issues you discussed with
other developers?

Thank you.

Renaud

On 17/08/2011 12:45, Tim Triche, Jr. wrote:
> yes -- doMC (and doSMP) are kind of bogus and I just use 
> multicore::mclapply() to good effect these days.  I checked around 
> with other people doing similar things at the Bioconductor developer 
> day and pretty much everyone confirmed that doWHATEVER seems to have 
> issues, whereas multicore itself... doesn't.
>
> Just my $0.02,
>
> --t
>
>
> On Wed, Aug 17, 2011 at 1:45 AM, Renaud Gaujoux wrote:
>
> Hi,
>
> in R 2.12.1, R CMD check hangs when building a vignette that uses
> a foreach loop with the doMC parallel backend. [...]


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux
Hopefully you'll be able to create a reproducible example, as my hanging
issues seem to come and go without any obvious reason.


On 17/08/2011 13:50, Tim Triche, Jr. wrote:
> I'll see if I can put together a self-contained example. [...]


Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Tim Triche, Jr.
I'll see if I can put together a self-contained example.  Primarily, the
times that I use multicore (and attempted to use doSMP, mostly because one
of my users refuses to ditch Windows) are when I am reading a ton of binary
files, none of which depend on the others.  This is a blindingly obvious
use-case for e.g. doMC and doSMP, yet what typically happens is that the
entire operation wedges.  I'm told that doSMP really only works well with
Revolution R, but per above, I will try to put together a working
self-contained example to show how it fails.
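
A minimal sketch of the read-many-independent-files pattern (file names
and directory are hypothetical; multicore's mclapply() was later folded
into base R's parallel package):

  library(multicore)
  files <- list.files("bindata", pattern = "\\.bin$", full.names = TRUE)
  blobs <- mclapply(files, function(f)
                    readBin(f, what = "raw", n = file.info(f)$size))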


On Wed, Aug 17, 2011 at 4:39 AM, Renaud Gaujoux wrote:

> I did notice some strange behaviour once, but things were working
> without a problem for a while now. [...]



-- 
When you emerge in a few years, you can ask someone what you missed, and
you'll find it can be summed up in a few minutes.

Derek Sivers



Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Kasper Daniel Hansen
On Tue, Aug 16, 2011 at 11:17 PM, Jonathan Malmaud  wrote:
> Hi,
> My R package has files in the 'inst' directory that it needs to reference. 
> How can the R scripts in my package find out the full path to the 'inst' 
> directory after the package is installed, given that different users may have 
> installed the package to different libraries?


The inst directory does not exist after installation (this is
described in R-exts).  Use something like
   system.file("extdata", package = "MyPackage")
to locate a directory etc.
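
For instance, to read a data file shipped under inst/extdata in the
package sources (package and file names here are hypothetical):

  path <- system.file("extdata", "mydata.csv", package = "MyPackage")
  dat <- read.csv(path)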

Kasper





Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Marc Schwartz
On Aug 16, 2011, at 10:17 PM, Jonathan Malmaud wrote:

> Hi,
> My R package has files in the 'inst' directory that it needs to reference. 
> How can the R scripts in my package find out the full path to the 'inst' 
> directory after the package is installed, given that different users may have 
> installed the package to different libraries? 
> 
> Thanks,
> Jon Malmaud


See ?path.package and ?file.path

Example:

> require(WriteXLS)
Loading required package: WriteXLS

# I am on OSX
> path.package("WriteXLS")
[1] "/Library/Frameworks/R.framework/Versions/2.13/Resources/library/WriteXLS"

I have Perl scripts in my package, which are in the /inst/Perl folder in the 
package source, so:

> file.path(path.package("WriteXLS"), "Perl/WriteXLS.pl")
[1] 
"/Library/Frameworks/R.framework/Versions/2.13/Resources/library/WriteXLS/Perl/WriteXLS.pl"


HTH,

Marc Schwartz



Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Dirk Eddelbuettel

On 16 August 2011 at 23:17, Jonathan Malmaud wrote:
| Hi,
| My R package has files in the 'inst' directory that it needs to reference.
| How can the R scripts in my package find out the full path to the 'inst'
| directory after the package is installed, given that different users may
| have installed the package to different libraries?

It is slightly different:  files and directories below the inst/ directory in
the _sources_ will be installed in the top-level directory of the _installed
package_.

You can then use system.file() to get the location at run-time for the
installed package. E.g. to get the example file 'fib.r' from the Fibonacci
directory within the examples of Rcpp of my installed version:

R> system.file(package="Rcpp", "examples", "Fibonacci", "fib.r")
[1] "/usr/local/lib/R/site-library/Rcpp/examples/Fibonacci/fib.r"

The result of that system.file() call could now be fed to source() etc. 

system.file() can be used for other files within the package too. 'Writing R
Extensions' details what other files are installed by default -- but as
stated, everything below inst/ gets copied as is, without the layer of inst/
itself.

Hope this helps,  Dirk

-- 
Two new Rcpp master classes for R and C++ integration scheduled for 
New York (Sep 24) and San Francisco (Oct 8), more details are at
http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php



Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Brian G. Peterson
On Wed, 2011-08-17 at 04:50 -0700, Tim Triche, Jr. wrote:
> I'll see if I can put together a self-contained example.  Primarily,
> the times that I use multicore (and attempted to use doSMP, mostly
> because one of my users refuses to ditch Windows) are when I am
> reading a ton of binary files, none of which depend on the others.
> This is a blindingly obvious use-case for e.g. doMC and doSMP, yet
> what typically happens is that the entire operation wedges.  I'm told
> that doSMP really only works well with Revolution R, but per above, I
> will try to put together a working self-contained example to show
> how. 

Remember that physical I/O can bind up the processes too.  Having a
bunch of processes all trying to read from local disk at the same time
(especially when they are all trying to read the same file(s), a problem
it seems you may not have) is a recipe for I/O locks that can seize up
your processes.

So, if the problem only occurs with physical I/O, the first thing I
would try is to move that storage to a storage device on another machine
that is tuned for that level of disk I/O.

Regards,

   - Brian

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock



Re: [Rd] R cmd check and multicore foreach loop

2011-08-17 Thread Renaud Gaujoux

Uhm... maybe this is the issue.
The issue seems to occur especially when I am building the vignette,
which performs some parallel computations on a reduced example, and
therefore runs faster than in normal usage of the package.
The two processes (on my dual core) output some logging information
using cat, which is normally sent to the console, but I guess that in
the case of a vignette it is written to the tex file. It is very little
data though (a loop counter), so writing should be very quick, even to
a file.

Could it be that an I/O deadlock happens because of this output?
I actually use mutexes at the end of each loop to perform a bigger
writing operation on a common file, but I hadn't thought these would be
required for the logging output, assuming that stdout and stderr were
thread safe. Maybe I should use mutexes at this level too.
Does multicore or doMC provide optional separate logging as doMPI does?
(I guess this might be better posted to R-hpc.)


Thank you.
Renaud



On 17/08/2011 14:56, Brian G. Peterson wrote:

> Remember that physical I/O can bind up the processes too. [...]


Re: [Rd] Referencing 'inst' directory in installed package

2011-08-17 Thread Mehmet Suzen
You can use
   system.file(package = "your_package_name")
which will return the top-level directory of the installed package.




[Rd] getNativeSymbolInfo("user_unif_rand") returns different results on windows and linux

2011-08-17 Thread Renaud Gaujoux

Hi,

When loading a package that provides the user-supplied RNG hook
user_unif_rand, calling getNativeSymbolInfo("user_unif_rand") returns
information about the loaded symbol.

I am using this to identify which package currently provides the RNG
hook. The results are the same on Windows and Linux if only one library
provides the hook.

If one loads a second package that provides this hook, then on Linux
(Ubuntu 10.10), calling getNativeSymbolInfo("user_unif_rand") again
returns the latest symbol information (which I presume is the correct
result).
On Windows (XP and Vista), however, the symbol information does not
change (same pointer address) and the package data is NULL.
See results for both systems below. I tested this with R 2.12.1, 2.13.0
and 2.13.1 on Windows.

Is this normal behaviour for DLLs loaded on Windows?
Thank you.

Renaud


#
# LINUX
#
> library(rlecuyer)
> getNativeSymbolInfo("user_unif_rand")
$name
[1] "user_unif_rand"

$address

attr(,"class")
[1] "NativeSymbol"

$package
DLL name: rlecuyer
Filename: 
/home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rlecuyer/libs/rlecuyer.so

Dynamic lookup: TRUE

attr(,"class")
[1] "NativeSymbolInfo"
> library(rstream)
> getNativeSymbolInfo("user_unif_rand")
$name
[1] "user_unif_rand"

$address

attr(,"class")
[1] "NativeSymbol"

$package
DLL name: rstream
Filename: 
/home/renaud/R/x86_64-pc-linux-gnu-library/2.12/rstream/libs/rstream.so

Dynamic lookup: TRUE

attr(,"class")
[1] "NativeSymbolInfo"

#
# WINDOWS
#
> library(rlecuyer)
> getNativeSymbolInfo("user_unif_rand")
$name
[1] "user_unif_rand"

$address

attr(,"class")
[1] "NativeSymbol"

$package
DLL name: rlecuyer
Filename: C:/Program
Files/R/R-2.13.1/library/rlecuyer/libs/i386/rlecuyer.dll
Dynamic lookup: TRUE

attr(,"class")
[1] "NativeSymbolInfo"
> library(rstream)
> getNativeSymbolInfo("user_unif_rand")
$name
[1] "user_unif_rand"

$address

attr(,"class")
[1] "NativeSymbol"

$package
NULL

attr(,"class")
[1] "NativeSymbolInfo"










Re: [Rd] An example of very slow computation

2011-08-17 Thread cberry
John C Nash  writes:

> This message is about a curious difference in timing between two ways of 
> computing the
> same function. One uses expm, so is expected to be a bit slower, but "a bit" 
> turned out to
> be a factor of >1000. The code is below. We would be grateful if anyone can 
> point out any
> egregious bad practice in our code, or enlighten us on why one approach is so 
> much slower
> than the other. 

Looks like A*t in expm(A*t) is a "matrix".

'getMethod("expm","matrix")' coerces its arg to a "Matrix", then calls
expm(), whose method coerces its arg to a "dMatrix" and calls expm(),
whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
method finally calls '.Call(dgeMatrix_exp, x)' 

Whew!

The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
"dgeMatrix" ))' is a factor of 10 on my box. 

Dunno 'bout the other factor of 100.

Chuck





-- 
Charles C. Berry cbe...@tajo.ucsd.edu



Re: [Rd] An example of very slow computation

2011-08-17 Thread Ravi Varadhan
Yes, the culprit is the evaluation of expm(A*t).  This is a lazy way of solving 
the system of ODEs, where you save analytic effort, but you pay for it dearly 
in terms of computational effort!

Even in this lazy approach, I am sure there ought to be faster ways of
evaluating the exponential of a matrix than the one in the "Matrix" package.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu



Re: [Rd] getNativeSymbolInfo("user_unif_rand") returns different results on windows and linux

2011-08-17 Thread Duncan Temple Lang

Hi Renaud

  I cannot presently step through the code on Windows to verify the cause
of the problem, but looking at the code, I would _presume_ the reason is
that symbol lookup is cached on Windows but not on Linux or OS X (at
least by default). Thus when we perform the second search for
user_unif_rand, we find it in the cache rather than searching all the
DLLs.

This happens because we did not specify a value for the PACKAGE parameter.

When we load a new DLL, the cache should probably be cleared or we
intelligently update it as necessary for the new DLLs on demand, i.e.
for a new search for a symbol we look in the new DLLs and then use the cache.
(This involves some book-keeping that could become convoluted.)

If we had specified a value for PACKAGE, e.g.

  getNativeSymbolInfo("user_unif_rand", "rstream")

we would get the version in that package's DLL.

So this gives a workaround that you can use with just R code

  findNativeSymbolInfo =
  function(sym, dlls = rev(getLoadedDLLs()))  {

  for(d in dlls) {
 # getNativeSymbolInfo() signals an error (rather than returning
 # NULL) when a DLL lacks the symbol, so guard the lookup:
 z = tryCatch(getNativeSymbolInfo(sym, d), error = function(e) NULL)
 if(!is.null(z))
  return(z)
  }

NULL
  }
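
Hypothetical usage (with rlecuyer and rstream loaded as in your
transcript), to see which DLL the search now finds:

  findNativeSymbolInfo("user_unif_rand")$package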

We'll think about whether to change the behaviour on Windows and how to do it
without affecting performance excessively.

 Best,
   D.


On 8/17/11 9:12 AM, Renaud Gaujoux wrote:
> When loading a package that provides the user-supplied RNG hook
> user_unif_rand, calling getNativeSymbolInfo("user_unif_rand") returns
> information about the loaded symbol. [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] An example of very slow computation

2011-08-17 Thread Ravi Varadhan
A principled way to solve this system of ODEs is to use the idea of 
"fundamental matrix", which is the same idea as that of diagonalization of a 
matrix (see Boyce and DiPrima or any ODE text).

Here is the code for that:


nlogL2 <- function(theta){
  k <- exp(theta[1:3])
  sigma <- exp(theta[4])
  A <- rbind(
  c(-k[1], k[2]),
  c( k[1], -(k[2]+k[3]))
  )
  eA <- eigen(A)
  T <- eA$vectors
  r <- eA$values
  x0 <- c(0,100)
  Tx0 <- solve(T, x0)  # T^{-1} x0 -- note the inverse, not T %*% x0

  sol <- function(t) 100 - sum(T %*% diag(exp(r*t)) %*% Tx0)
  pred <- sapply(dat[,1], sol)
  -sum(dnorm(dat[,2], mean=pred, sd=sigma, log=TRUE)) 
}
This is much faster than using expm(A*t), but much slower than the "by
hand" calculation, since we have to repeatedly do the diagonalization.
An obvious advantage of this function is that it applies to *any* linear
system of ODEs for which the eigenvalues are real (and negative).
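
For reference, the identity the code relies on, with T = eA$vectors and
Lambda = diag(r); the T^{-1} factor is why x0 is premultiplied by
solve(T) above:

  dx/dt = A x,   A = T Lambda T^{-1}
  =>  x(t) = e^{A t} x0 = T e^{Lambda t} T^{-1} x0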

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu



Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann

On 17 Aug 2011, at 7:08PM, cbe...@tajo.ucsd.edu wrote:

> John C Nash  writes:
> 
>> This message is about a curious difference in timing between two ways of 
>> computing the
>> same function. One uses expm, so is expected to be a bit slower, but "a bit" 
>> turned out to
>> be a factor of >1000. 
> 
> Looks like A*t in expm(A*t) is a "matrix".
> 
> 'getMethod("expm","matrix")' coerces its arg to a "Matrix", then calls
> expm(), whose method coerces its arg to a "dMatrix" and calls expm(),
> whose method coerces its arg to a 'dgeMatrix' and calls expm(), whose
> method finally calls '.Call(dgeMatrix_exp, x)' 
> 
> Whew!
> 
> The time difference between 'expm( diag(10)+1 )' and 'expm( as( diag(10)+1,
> "dgeMatrix" ))' is a factor of 10 on my box. 
> 

You are right! 

I was testing running nlogL below 100 times. expm() is then called 2500 times.
The total running time on my machine was 13 seconds.

If you replace in nlogL the part:
--
 A<-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 x0<-c(0,100)
 sol<-function(t)100-sum(expm(A*t)%*%x0)
--
with:
--
 A<-rbind(
 c(-k[1], k[2]),
 c( k[1], -(k[2]+k[3]))
 )
 A<-as(A,"dgeMatrix")  # <--- this is the difference
 x0<-c(0,100)          # keep x0 from the block being replaced
 sol<-function(t)100-sum(expm(A*t)%*%x0)
--

this time drops to 1.5 seconds (!).

At that point, expm() takes up much less time than, for example, calculating 
A*t in sol(), and the sum() - I think because conversions have to be done.

Thus, if m is a 2x2 dgeMatrix, then 
> system.time({for(i in 1:2500) m*3.2})
   user  system elapsed 
  0.425   0.004   0.579 
(i.e. 1/3 of the total time for nlogL() above)

whereas if mm=as.matrix(m), then
> system.time({for(i in 1:2500) mm*3.2})
   user  system elapsed 
  0.004   0.000   0.005 

(!!)

and, similarly:
--
> system.time({for(i in 1:2500) sum(m)})
   user  system elapsed 
  0.399   0.002   0.494 
> system.time({for(i in 1:2500) sum(mm)})
   user  system elapsed 
  0.002   0.000   0.028 
--
whereas
> system.time({for(i in 1:2500) expm(m)})
   user  system elapsed 
  0.106   0.001   0.118 

to sum it up, of 13 seconds, 11.5 were spent on conversions to dgeMatrix
0.5 are spent on multiplying a dgeMatrix by a double
0.5 are spent on summing a dgeMatrix
and 0.1 are actually spent in expm.

Michael

P.S. You could have used Rprof() to see these times, only that interpreting 
summaryRprof() is a bit hard. (Is there something that does summaryRprof(), but 
also shows the call graph?)





> Dunno 'bout the other factor of 100.
> 
> Chuck
> [...]

Re: [Rd] An example of very slow computation

2011-08-17 Thread Michael Lachmann
Just a small addition:

If you replace below
> sol<-function(t)100-sum(expm(A*t)%*%x0)
by

sol<-function(t){A@x=A@x*t; 100-sum(expm(A)@x * rep(x0,each=2))}

(ugly! But avoiding the conversions and generics. The rep(x0,each=2) is
needed because expm(A)@x is stored column-major: plain x0 would recycle
row-wise, which matches the %*% version only when expm(A) happens to be
symmetric, as it is at the test theta where k1=k2=k3.)

The time on my machine drops further, to 0.3 seconds (from the original
13 seconds, and from the 1.5 seconds after the change mentioned below).

So, overall a ~40-fold improvement, though on my machine the initial
version was ~3200 times slower than the simple one, so a ~80-fold
slowdown is still present.

Michael

On 17 Aug 2011, at 11:27PM, Michael Lachmann wrote:

> You are right! [...]