On Sat, Mar 10, 2012 at 12:11 PM, Ben quant <ccqu...@gmail.com> wrote: > Very interesting. You are doing some stuff here that I have never seen.
and that I would not typically do or recommend (e.g., fussing with storage mode or manually setting the dimensions of an object), but that can be faster by sacrificing higher level functions flexibility for lower level, more direct control. > Thank you. I will test it on my real data on Monday and let you know what I > find. That cmpfun function looks very useful! It can reduce the overhead of repeated function calls. I find the biggest speedups when it is used with some sort of loop. Then again, many loops can be avoided entirely, which often yields even larger performance gains. > > Thanks, You're welcome. You might also look at the data table package by Matthew Dowle. It does some *very* fast indexing and subsetting and if those operations are serious slow down for you, you would likely benefit substantially from using it. One final comment, since you are creating the matrix of indices; if you can create it in such a way that it already has the vector position rather than row/column form, you could eliminate the need for my f2() function altogether as you could use it to directly index your data, and then just add dimensions back afterward. Cheers, Josh > Ben > > > On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley <jwiley.ps...@gmail.com> > wrote: >> >> Hi Ben, >> >> It seems likely that there are bigger bottle necks in your overall >> program/use---have you tried Rprof() to find where things really get >> slowed down? In any case, f2() below takes about 70% of the time as >> your function in your test data, and 55-65% of the time for a bigger >> example I constructed. Rui's function benefits substantially from >> byte compiling, but is still slower. As a side benefit, f2() seems to >> use less memory than your current implementation. >> >> Cheers, >> >> Josh >> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% >> ######sample data ############## >> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3, >> dimnames = list(c('row1','row2','row3'), c('col1','col2','col3'))) >> >> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3) >> storage.mode(indx) <- "integer" >> >> >> f <- function(x, i, di = dim(i), dx = dim(x)) { >> out <- x[c(i + matrix(0:(dx[1L] - 1L) * dx[1L], nrow = di[1L], ncol >> = di[2L], TRUE))] >> dim(out) <- di >> return(out) >> } >> >> >> fun <- function(valdata, inxdata){ >> nr <- nrow(inxdata) >> nc <- ncol(inxdata) >> mat <- matrix(NA, nrow=nr*nc, ncol=2) >> i1 <- 1 >> i2 <- nr >> for(j in 1:nc){ >> mat[i1:i2, 1] <- inxdata[, j] >> mat[i1:i2, 2] <- rep(j, nr) >> i1 <- i1 + nr >> i2 <- i2 + nr >> } >> matrix(valdata[mat], ncol=nc) >> } >> >> require(compiler) >> f2 <- cmpfun(f) >> fun2 <- cmpfun(fun) >> >> system.time(for (i in 1:10000) f(vals, indx)) >> system.time(for (i in 1:10000) f2(vals, indx)) >> system.time(for (i in 1:10000) fun(vals, indx)) >> system.time(for (i in 1:10000) fun2(vals, indx)) >> system.time(for (i in 1:10000) >> >> matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx))) >> >> ## now let's make a bigger test set >> set.seed(1) >> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4) >> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE)) >> >> dim(vals2) >> dim(indx2) >> >> ## the best contenders from round 1 >> gold <- >> matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2)) >> test1 <- f2(vals2, indx2) >> all.equal(gold, test1) >> >> system.time(for (i in 1:20) f2(vals2, indx2)) >> system.time(for (i in 1:20) >> >> matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2))) >> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% >> >> On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccqu...@gmail.com> wrote: >> > Thanks for the info. Unfortunately its a little bit slower after one >> > apples >> > to apples test using my big data. Mine: 0.28 seconds. Yours. 0.73 >> > seconds. >> > Not a big deal, but significant when I have to do this 300 to 500 times. >> > >> > regards, >> > >> > ben >> > >> > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1...@sapo.pt> wrote: >> > >> >> Hello, >> >> >> >> I don't know if it's the fastest but it's more natural to have an index >> >> matrix with two columns only, >> >> one for each coordinate. And it's fast. >> >> >> >> fun <- function(valdata, inxdata){ >> >> nr <- nrow(inxdata) >> >> nc <- ncol(inxdata) >> >> mat <- matrix(NA, nrow=nr*nc, ncol=2) >> >> i1 <- 1 >> >> i2 <- nr >> >> for(j in 1:nc){ >> >> mat[i1:i2, 1] <- inxdata[, j] >> >> mat[i1:i2, 2] <- rep(j, nr) >> >> i1 <- i1 + nr >> >> i2 <- i2 + nr >> >> } >> >> matrix(valdata[mat], ncol=nc) >> >> } >> >> >> >> fun(vals, indx) >> >> >> >> Rui Barradas >> >> >> >> >> >> -- >> >> View this message in context: >> >> >> >> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html >> >> Sent from the R help mailing list archive at Nabble.com. >> >> >> >> ______________________________________________ >> >> R-help@r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> Programmer Analyst II, Statistical Consulting Group >> University of California, Los Angeles >> https://joshuawiley.com/ > > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.