On Thu, Dec 8, 2016 at 3:59 PM, David Winsemius <[email protected]> wrote:
>
>> On Dec 8, 2016, at 12:09 PM, John P. Nolan <[email protected]> wrote:
>>
>> Dear All,
>>
>> I regularly want to "apply" some function to an array in a way that the
>> arguments to the user function depend on the index on which the apply is
>> working. A simple example is:
>>
>> A <- array( runif(160), dim=c(5,4,8) )
>> x <- matrix( runif(32), nrow=4, ncol=8 )
>> b <- runif(8)
>> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
>> result <- rep(0.0,8)
>> for (i in 1:8) {
>>   result[i] <- f1( A[,,i], x[,i] , b[i] )
>> }
>>
>> This works, but is slow. I'd like to be able to do something like:
>> generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2),
>> list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i]
>> and b[i] to FUN in addition to A[,,i].
>>
>> Does such a generalized.apply already exist somewhere? While I can write a
>> C function to do a particular case, it would be nice if there was a fast,
>> general way to do this.
>
> I would have thought that this would achieve the same result:
>
> result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
>
> Or:
>
> result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] ,
> b[i] )} )
>
> (I doubt it will be any faster, but if 'i' is large, parallelism might help.
> The inner function appears to be fairly efficient.)

You're right, it's slower. Despite how often it's repeated that
"loops in R are slow", they're not *that* slow. They're often faster
than the *apply functions, especially if they have been "compiled" by
compiler::cmpfun().

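One way to check this on your own machine is to time both forms on the
same data. This is only a rough sketch: the array sizes are made up to
keep the run short, loop_version and sapply_version are hypothetical
names, and the relative timings will vary with R version and hardware.

```r
# Toy workload; the sizes are an assumption chosen only to keep the run short.
set.seed(1)
n <- 1e4
A <- array(runif(5 * 4 * n), dim = c(5, 4, n))
x <- matrix(runif(4 * n), nrow = 4, ncol = n)
b <- runif(n)
f1 <- function(A, x, b) sum(A %*% x) + b

# Explicit loop with a preallocated result, byte-compiled.
loop_version <- compiler::cmpfun(function(A, x, b) {
  result <- numeric(length(b))
  for (i in seq_along(b)) {
    result[i] <- f1(A[, , i], x[, i], b[i])
  }
  result
})

# Same computation through sapply() over the index.
sapply_version <- function(A, x, b) {
  sapply(seq_along(b), function(i) f1(A[, , i], x[, i], b[i]))
}

t_loop   <- system.time(r_loop   <- loop_version(A, x, b))
t_sapply <- system.time(r_sapply <- sapply_version(A, x, b))

# Elapsed seconds for each form; both must agree on the answer.
print(c(loop = t_loop[["elapsed"]], sapply = t_sapply[["elapsed"]]))
stopifnot(isTRUE(all.equal(r_loop, r_sapply)))
```
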
You really need to know *why* code is slow before trying to make it
faster. I profiled an example that would have a loop with 1e6
iterations and 80%+ of the time was still spent inside f1().

set.seed(21)
nc <- 1e6
nr <- 10
A <- array( runif(5*nr*nc), dim=c(5,nr,nc) )
x <- matrix( runif(nr*nc), nrow=nr, ncol=nc )
b <- runif(nc)
f1 <- compiler::cmpfun(function( A, x, b ) { sum( A %*% x ) + b })
f2 <- compiler::cmpfun({
  function(A, x, b, FUN) {
    result <- numeric(length(b))
    for (i in seq_along(b)) {
      result[i] <- FUN( A[,,i], x[,i] , b[i] )
    }
    return(result)
  }
})

Rprof(interval=0.01)
result <- f2(A,x,b,f1)
Rprof(NULL)
summaryRprof()

$by.self
      self.time self.pct total.time total.pct
"FUN"      4.29    84.28       4.76     93.52
"%*%"      0.47     9.23       0.47      9.23
"f2"       0.33     6.48       5.09    100.00

$by.total
      total.time total.pct self.time self.pct
"f2"        5.09    100.00      0.33     6.48
"FUN"       4.76     93.52      4.29    84.28
"%*%"       0.47      9.23      0.47     9.23

$sample.interval
[1] 0.01

$sampling.time
[1] 5.09

In this case, almost all the time is spent evaluating f1() ("FUN"),
even after calling compiler::cmpfun on f1() and on a function
containing the loop. Making the looping construct faster is not going
to improve the performance of this code by a significant amount.

I.e., dropping to compiled code will only help if you avoid the R
function call, but then that's not a general solution...

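To illustrate what "avoiding the R function call" can look like for this
one f1(), here is a sketch that is specific to this example and not a
general replacement for the loop. It relies on the identity
sum(A[,,i] %*% x[,i]) == sum(colSums(A[,,i]) * x[,i]), so the whole
result can be computed with two colSums() calls. The smaller nc is an
assumption to keep the check quick.

```r
set.seed(21)
nc <- 1000   # smaller than the 1e6 used above, just for a quick check
nr <- 10
A <- array(runif(5 * nr * nc), dim = c(5, nr, nc))
x <- matrix(runif(nr * nc), nrow = nr, ncol = nc)
b <- runif(nc)
f1 <- function(A, x, b) sum(A %*% x) + b

# Reference: one R function call per slice.
loop_result <- numeric(nc)
for (i in seq_len(nc)) {
  loop_result[i] <- f1(A[, , i], x[, i], b[i])
}

# Vectorized: colSums(A) sums over the first dimension, giving an
# nr-by-nc matrix whose column i is colSums(A[,,i]). Multiplying by x
# elementwise and summing each column reproduces sum(A[,,i] %*% x[,i]).
vec_result <- colSums(colSums(A) * x) + b

stopifnot(isTRUE(all.equal(loop_result, vec_result)))
```

This only works because f1() happens to be linear in A and x; an
arbitrary FUN would still need one call per index.
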
> --
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
R/Finance 2016 | www.rinfinance.com