On Thu, Dec 8, 2016 at 3:59 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpno...@american.edu> wrote:
>>
>> Dear All,
>>
>> I regularly want to "apply" some function to an array in a way that the 
>> arguments to the user function depend on the index on which the apply is 
>> working.  A simple example is:
>>
>> A <- array( runif(160), dim=c(5,4,8) )
>> x <- matrix( runif(32), nrow=4, ncol=8 )
>> b <- runif(8)
>> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
>> result <- rep(0.0,8)
>> for (i in 1:8) {
>>  result[i] <- f1( A[,,i], x[,i] , b[i] )
>> }
>>
>> This works, but is slow.  I'd like to be able to do something like:
>>    generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), 
>> list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] 
>> and b[i] to FUN in addition to A[,,i].
>>
>> Does such a generalized.apply already exist somewhere?  While I can write a 
>> C function to do a particular case, it would be nice if there was a fast, 
>> general way to do this.
>
> I would have thought that this would achieve the same result:
>
> result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
>
> Or:
>
> result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , 
> b[i] )} )
>
> (I doubt it will be any faster, but if 'i' is large, parallelism might help. 
> The inner function appears to be fairly efficient.)

You're right, it's slower.  Despite how often it's repeated that
"loops in R are slow", they're not *that* slow.  They're often faster
than the *apply functions, especially if they have been "compiled" by
compiler::cmpfun().

You really need to know *why* code is slow before trying to make it
faster.  I profiled an example that would have a loop with 1e6
iterations and 80%+ of the time was still spent inside f1().

set.seed(21)
nc <- 1e6
nr <- 10
A <- array( runif(5*nr*nc), dim=c(5,nr,nc) )
x <- matrix( runif(nr*nc), nrow=nr, ncol=nc )
b <- runif(nc)
f1 <- compiler::cmpfun(function( A, x, b ) { sum( A %*% x ) + b })
f2 <- compiler::cmpfun({
  function(A, x, b, FUN) {
    result <- numeric(length(b))
    for (i in seq_along(b)) {
      result[i] <- FUN( A[,,i], x[,i] , b[i] )
    }
    return(result)
  }
})
Rprof(interval=0.01)
result <- f2(A,x,b,f1)
Rprof(NULL)
summaryRprof()

$by.self
      self.time self.pct total.time total.pct
"FUN"      4.29    84.28       4.76     93.52
"%*%"      0.47     9.23       0.47      9.23
"f2"       0.33     6.48       5.09    100.00

$by.total
      total.time total.pct self.time self.pct
"f2"        5.09    100.00      0.33     6.48
"FUN"       4.76     93.52      4.29    84.28
"%*%"       0.47      9.23      0.47     9.23

$sample.interval
[1] 0.01

$sampling.time
[1] 5.09

In this case, almost all the time is spent evaluating f1() ("FUN"),
even after calling compiler::cmpfun on f1() and on a function
containing the loop.  Making the looping construct faster is not going
to improve the performance of this code by a significant amount.
I.e., dropping to compiled code will only help if you avoid the R
function call, but then that's not a general solution...

> --
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2016 | www.rinfinance.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to