Re: [R] index values of one matrix to another of a different size

Ben quant Mon, 12 Mar 2012 13:01:12 -0700

Joshua,

Just confirming quickly that your method using cmpfun and your f function
below was fastest using my real data.  Again, thank you for your help!


Ben

On Sat, Mar 10, 2012 at 1:21 PM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

> On Sat, Mar 10, 2012 at 12:11 PM, Ben quant <ccqu...@gmail.com> wrote:
> > Very interesting. You are doing some stuff here that I have never seen.
>
> and that I would not typically do or recommend (e.g., fussing with
> storage mode or manually setting the dimensions of an object), but
> that can be faster because it trades the flexibility of higher-level
> functions for lower-level, more direct control.
>
> > Thank you. I will test it on my real data on Monday and let you know
> > what I find. That cmpfun function looks very useful!
>
> It can reduce the overhead of repeated function calls.  I find the
> biggest speedups when it is used with some sort of loop.  Then again,
> many loops can be avoided entirely, which often yields even larger
> performance gains.
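
A minimal sketch of the point about loops, assuming a toy sum-of-squares task. The names loop_sumsq, loop_sumsq_c and vec_sumsq are hypothetical and do not come from the thread. It compares an explicit loop, its byte-compiled version, and a vectorized version that avoids the loop. On R 3.4.0 and later the JIT compiler is enabled by default, so the gap between the first two may be small there.

```r
library(compiler)

# Hypothetical toy function: sum of squares with an explicit loop.
loop_sumsq <- function(x) {
  total <- 0
  for (v in x) total <- total + v * v
  total
}

# Byte-compiled version of the same function.
loop_sumsq_c <- cmpfun(loop_sumsq)

# Vectorized version: no R-level loop at all.
vec_sumsq <- function(x) sum(x * x)

x <- as.numeric(1:1e5)

# All three must agree; only the timings differ.
stopifnot(isTRUE(all.equal(loop_sumsq(x), vec_sumsq(x))),
          isTRUE(all.equal(loop_sumsq_c(x), vec_sumsq(x))))

system.time(for (i in 1:20) loop_sumsq(x))
system.time(for (i in 1:20) loop_sumsq_c(x))
system.time(for (i in 1:20) vec_sumsq(x))
```

Timings depend on the machine and R version, so none are shown here.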
>
> >
> > Thanks,
>
> You're welcome.  You might also look at the data.table package by
> Matthew Dowle.  It does some *very* fast indexing and subsetting, and
> if those operations are a serious slowdown for you, you would likely
> benefit substantially from using it.  One final comment: since you are
> creating the matrix of indices, if you can create it in such a way
> that it already holds the vector position rather than row/column form,
> you could eliminate the need for my f2() function altogether, as you
> could use it to directly index your data and then just add dimensions
> back afterward.
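
A minimal sketch of the vector-position idea, assuming the small sample data from this thread (without dimnames). The name pos is hypothetical. R stores matrices column by column, so element [r, c] of a matrix with n rows sits at vector position r + (c - 1) * n.

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3)
indx <- matrix(c(1L, 1L, 3L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 1L),
               nrow = 4, ncol = 3)

# col(indx) holds the column number of every cell of indx, so this turns
# each row index into a column-major vector position within vals.
pos <- indx + (col(indx) - 1L) * nrow(vals)

# c() drops the dimensions of pos, which forces plain vector indexing
# (a two-column index matrix would otherwise be read as row/column pairs).
out <- vals[c(pos)]

# Add the dimensions back afterward.
dim(out) <- dim(indx)
out
```

If the index matrix can be built as pos from the start, the conversion step disappears and only the last two statements remain.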
>
> Cheers,
>
> Josh
>
> > Ben
> >
> >
> > On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley <jwiley.ps...@gmail.com>
> > wrote:
> >>
> >> Hi Ben,
> >>
> >> It seems likely that there are bigger bottlenecks in your overall
> >> program/use---have you tried Rprof() to find where things really get
> >> slowed down?  In any case, f2() below takes about 70% of the time
> >> your function takes on your test data, and 55-65% of the time on a
> >> bigger example I constructed.  Rui's function benefits substantially
> >> from byte compiling, but is still slower.  As a side benefit, f2()
> >> seems to use less memory than your current implementation.
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >> ######sample data ##############
> >> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
> >>  dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
> >>
> >> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
> >> storage.mode(indx) <- "integer"
> >>
> >>
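> >> f <- function(x, i, di = dim(i), dx = dim(x)) {
> >>  ## column j of i indexes column j of x, so add (j - 1) * nrow(x)
> >>  ## to turn each row index into a vector (column-major) position
> >>  out <- x[c(i + matrix(0:(dx[2L] - 1L) * dx[1L], nrow = di[1L],
> >>    ncol = di[2L], byrow = TRUE))]
> >>  dim(out) <- di
> >>  return(out)
> >> }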
> >>
> >>
> >> fun <- function(valdata, inxdata){
> >>        nr <- nrow(inxdata)
> >>        nc <- ncol(inxdata)
> >>        mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >>        i1 <- 1
> >>        i2 <- nr
> >>        for(j in 1:nc){
> >>                mat[i1:i2, 1] <- inxdata[, j]
> >>                mat[i1:i2, 2] <- rep(j, nr)
> >>                i1 <- i1 + nr
> >>                i2 <- i2 + nr
> >>        }
> >>        matrix(valdata[mat], ncol=nc)
> >> }
> >>
> >> require(compiler)
> >> f2 <- cmpfun(f)
> >> fun2 <- cmpfun(fun)
> >>
> >> system.time(for (i in 1:10000) f(vals, indx))
> >> system.time(for (i in 1:10000) f2(vals, indx))
> >> system.time(for (i in 1:10000) fun(vals, indx))
> >> system.time(for (i in 1:10000) fun2(vals, indx))
> >> system.time(for (i in 1:10000)
> >>   matrix(vals[cbind(c(indx), rep(1:ncol(indx), each = nrow(indx)))],
> >>     nrow = nrow(indx), ncol = ncol(indx)))
> >>
> >> ## now let's make a bigger test set
> >> set.seed(1)
> >> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
> >> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3,
> >>   TRUE))
> >>
> >> dim(vals2)
> >> dim(indx2)
> >>
> >> ## the best contenders from round 1
> >> gold <- matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
> >>   nrow = nrow(indx2), ncol = ncol(indx2))
> >> test1 <- f2(vals2, indx2)
> >> all.equal(gold, test1)
> >>
> >> system.time(for (i in 1:20) f2(vals2, indx2))
> >> system.time(for (i in 1:20)
> >>   matrix(vals2[cbind(c(indx2), rep(1:ncol(indx2), each = nrow(indx2)))],
> >>     nrow = nrow(indx2), ncol = ncol(indx2)))
> >>
> >> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >>
> >> On Sat, Mar 10, 2012 at 7:48 AM, Ben quant <ccqu...@gmail.com> wrote:
> >> > Thanks for the info. Unfortunately it's a little bit slower after one
> >> > apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73
> >> > seconds. Not a big deal, but significant when I have to do this 300 to
> >> > 500 times.
> >> >
> >> > regards,
> >> >
> >> > ben
> >> >
> >> > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas <rui1...@sapo.pt> wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> I don't know if it's the fastest but it's more natural to have an
> >> >> index matrix with two columns only, one for each coordinate. And
> >> >> it's fast.
> >> >>
> >> >> fun <- function(valdata, inxdata){
> >> >>        nr <- nrow(inxdata)
> >> >>        nc <- ncol(inxdata)
> >> >>        mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >> >>        i1 <- 1
> >> >>        i2 <- nr
> >> >>        for(j in 1:nc){
> >> >>                mat[i1:i2, 1] <- inxdata[, j]
> >> >>                mat[i1:i2, 2] <- rep(j, nr)
> >> >>                i1 <- i1 + nr
> >> >>                i2 <- i2 + nr
> >> >>        }
> >> >>        matrix(valdata[mat], ncol=nc)
> >> >> }
> >> >>
> >> >> fun(vals, indx)
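
A minimal sketch of the two-column idea without the loop, assuming the sample data posted in this thread. The names pairs and res are hypothetical. Each cell of indx becomes one (row, column) pair, and indexing a matrix with a two-column numeric matrix picks one element per pair.

```r
vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
               dimnames = list(c("row1", "row2", "row3"),
                               c("col1", "col2", "col3")))
indx <- matrix(c(1, 1, 3, 3, 2, 2, 2, 3, 1, 2, 2, 1), nrow = 4, ncol = 3)

# One (row, column) pair per cell of indx, in column-major order:
# the row comes from indx itself, the column from the cell's position.
pairs <- cbind(row = c(indx), col = c(col(indx)))

# Matrix indexing returns a plain vector; reshape it to match indx.
res <- matrix(vals[pairs], nrow = nrow(indx), ncol = ncol(indx))
res
#      [,1] [,2] [,3]
# [1,] "A"  "E"  "G"
# [2,] "A"  "E"  "H"
# [3,] "C"  "E"  "H"
# [4,] "C"  "F"  "G"
```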
> >> >>
> >> >> Rui Barradas
> >> >>
> >> >>
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> >        [[alternative HTML version deleted]]
> >> >
> >>
> >>
> >>
> >> --
> >> Joshua Wiley
> >> Ph.D. Student, Health Psychology
> >> Programmer Analyst II, Statistical Consulting Group
> >> University of California, Los Angeles
> >> https://joshuawiley.com/
> >
> >
