Joshua,

Just confirming quickly that your method, using cmpfun with your f function below, was the fastest on my real data. Again, thank you for your help!

Ben

On Sat, Mar 10, 2012 at 1:21 PM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

> On Sat, Mar 10, 2012 at 12:11 PM, Ben quant <ccqu...@gmail.com> wrote:
> > Very interesting. You are doing some stuff here that I have never seen.
>
> and that I would not typically do or recommend (e.g., fussing with
> storage mode or manually setting the dimensions of an object), but
> that can be faster by sacrificing the flexibility of higher-level
> functions for lower-level, more direct control.
>
> > Thank you. I will test it on my real data on Monday and let you know what I
> > find. That cmpfun function looks very useful!
>
> It can reduce the overhead of repeated function calls. I find the
> biggest speedups when it is used with some sort of loop. Then again,
> many loops can be avoided entirely, which often yields even larger
> performance gains.
>
> > Thanks,
>
> You're welcome. You might also look at the data.table package by
> Matthew Dowle. It does some *very* fast indexing and subsetting, and
> if those operations are a serious slowdown for you, you would likely
> benefit substantially from using it. One final comment, since you are
> creating the matrix of indices: if you can create it in such a way
> that it already holds the vector position rather than row/column form,
> you could eliminate the need for my f2() function altogether, as you
> could use it to index your data directly and then just add dimensions
> back afterward.
>
> Cheers,
>
> Josh
>
> > Ben
> >
> > On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley <jwiley.ps...@gmail.com>
> > wrote:
> >>
> >> Hi Ben,
> >>
> >> It seems likely that there are bigger bottlenecks in your overall
> >> program/use. Have you tried Rprof() to find where things really get
> >> slowed down? In any case, f2() below takes about 70% of the time your
> >> function takes on your test data, and 55-65% of the time for a bigger
> >> example I constructed.
> >> Rui's function benefits substantially from
> >> byte compiling, but is still slower. As a side benefit, f2() seems to
> >> use less memory than your current implementation.
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >> ######sample data ##############
> >> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
> >>   dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
> >>
> >> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
> >> storage.mode(indx) <- "integer"
> >>
> >>
> >> ## column j of i holds row numbers within column j of x; adding the
> >> ## column-major offset (j - 1) * nrow(x) turns them into vector positions
> >> f <- function(x, i, di = dim(i), dx = dim(x)) {
> >>   out <- x[c(i + matrix(0:(dx[2L] - 1L) * dx[1L], nrow = di[1L], ncol = di[2L], TRUE))]
> >>   dim(out) <- di
> >>   return(out)
> >> }
> >>
> >>
> >> fun <- function(valdata, inxdata){
> >>   nr <- nrow(inxdata)
> >>   nc <- ncol(inxdata)
> >>   mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >>   i1 <- 1
> >>   i2 <- nr
> >>   for(j in 1:nc){
> >>     mat[i1:i2, 1] <- inxdata[, j]
> >>     mat[i1:i2, 2] <- rep(j, nr)
> >>     i1 <- i1 + nr
> >>     i2 <- i2 + nr
> >>   }
> >>   matrix(valdata[mat], ncol=nc)
> >> }
> >>
> >> require(compiler)
> >> f2 <- cmpfun(f)
> >> fun2 <- cmpfun(fun)
> >>
> >> system.time(for (i in 1:10000) f(vals, indx))
> >> system.time(for (i in 1:10000) f2(vals, indx))
> >> system.time(for (i in 1:10000) fun(vals, indx))
> >> system.time(for (i in 1:10000) fun2(vals, indx))
> >> system.time(for (i in 1:10000)
> >>   matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx)))
> >>
> >> ## now let's make a bigger test set
> >> set.seed(1)
> >> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
> >> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE))
> >>
> >> dim(vals2)
> >> dim(indx2)
> >>
> >> ## the best contenders from round 1
> >> gold <-
> >>   matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2))
> >> test1 <- f2(vals2, indx2)
> >> all.equal(gold, test1)
> >>
> >> system.time(for (i in 1:20) f2(vals2, indx2))
> >> system.time(for (i in 1:20)
> >>   matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2)))
> >>
> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >>
> >> On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccqu...@gmail.com> wrote:
> >> > Thanks for the info. Unfortunately it's a little bit slower after one
> >> > apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73
> >> > seconds. Not a big deal, but significant when I have to do this 300 to
> >> > 500 times.
> >> >
> >> > regards,
> >> >
> >> > ben
> >> >
> >> > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1...@sapo.pt> wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> I don't know if it's the fastest, but it's more natural to have an
> >> >> index matrix with two columns only, one for each coordinate. And
> >> >> it's fast.
> >> >>
> >> >> fun <- function(valdata, inxdata){
> >> >>   nr <- nrow(inxdata)
> >> >>   nc <- ncol(inxdata)
> >> >>   mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >> >>   i1 <- 1
> >> >>   i2 <- nr
> >> >>   for(j in 1:nc){
> >> >>     mat[i1:i2, 1] <- inxdata[, j]
> >> >>     mat[i1:i2, 2] <- rep(j, nr)
> >> >>     i1 <- i1 + nr
> >> >>     i2 <- i2 + nr
> >> >>   }
> >> >>   matrix(valdata[mat], ncol=nc)
> >> >> }
> >> >>
> >> >> fun(vals, indx)
> >> >>
> >> >> Rui Barradas
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html
> >> >> Sent from the R help mailing list archive at Nabble.com.
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Joshua Wiley
> >> Ph.D. Student, Health Psychology
> >> Programmer Analyst II, Statistical Consulting Group
> >> University of California, Los Angeles
> >> https://joshuawiley.com/
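
Josh's final comment in this thread suggests building the index matrix so that it already holds vector positions instead of row numbers. The following is only a minimal sketch of how that might look on the sample data from the thread. The names `pos`, `out`, and `gold` are made up for illustration, and the sketch assumes `indx` has one column per column of `vals`.

```r
## sample data as posted in the thread
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
               dimnames = list(c("row1", "row2", "row3"),
                               c("col1", "col2", "col3")))
indx <- matrix(c(1L, 1L, 3L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 1L),
               nrow = 4, ncol = 3)

## R stores matrices column by column, so element (r, j) of vals sits at
## vector position r + (j - 1) * nrow(vals). col(indx) gives the column
## number of every cell of indx, so this converts row numbers to positions.
## (pos is a hypothetical name; in practice it would be built once, up front.)
pos <- indx + (col(indx) - 1L) * nrow(vals)

## c() drops the dimensions so the lookup is plain vector indexing, even if
## pos happened to have exactly two columns; dimensions are added back after
out <- vals[c(pos)]
dim(out) <- dim(indx)

## reference result using the two-column (row, column) form from the thread
gold <- matrix(vals[cbind(c(indx), rep(1:ncol(indx), each = nrow(indx)))],
               nrow = nrow(indx), ncol = ncol(indx))
identical(out, gold)
```

If the positions can be computed when the indices are first created, the per-call work reduces to the `vals[c(pos)]` lookup and the `dim<-` assignment. Whether that beats f2() on real data would still need to be timed.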