Thanks Petr and David.
Sorry David if I was not clear enough. The last comment line highlights the end objective (hopefully more clearly this time). Petr: I kind of see your line of thought, but still cannot see how it works on a specific example like this one. set.seed(1) dframe <- matrix(runif(250), 50, 5) ### store sort indexes sort_matrix <- matrix(ncol = ncol(dframe), nrow = nrow(dframe)) for (i in 1:ncol(dframe)) { xtemp <- dframe[, i] sort_matrix[, i] <- sort.list(xtemp, method = "shell") } ### take a bootstrap sample nr_samples <- nrow(dframe) b.ind <- sample(1:nr_samples, nr_samples*0.5, replace = TRUE) b.dframe <- dframe[b.ind, ] ### sort bootstrap sample with respect to an arbitrary variable var1 <- 1 #say var1 for example ### All I need to do is to efficiently re-arrange b.ind according to the order in sort_matrix[,1] (avoiding sorting again) b.dframe[, var1][match(sort_matrix[, var1], b.ind)] #this does not work, and if it did would be slow Thanks again, Axel. On Fri, May 18, 2012 at 9:50 AM, David Winsemius <dwinsem...@comcast.net>wrote: > > On May 18, 2012, at 6:37 AM, Axel Urbiz wrote: > > Would I be able to accomplish the same if x.sample was created from x >> instead of x.sorted. The problem is that in my real problem, I have to >> sort >> with respect to many variables and thus keep the sample indexes consistent >> across variables. So I need to first take the sample and then sort it >> with respect to potentially any variable. >> > > Either of the strategies I suggested should be generalizable to many sort > criteria. It should be possible to work with indices that point back to the > source of the sample. (Feel free to post an example. At the moment your > requirements seem a bit vague.) > > -- > David. > > Thanks again, >> Axel. >> >> On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <savi...@cs.cas.cz> wrote: >> >> On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote: >>> >>>> Dear List, >>>> >>>> Is there a way I can sort a sample based on a sort index constructed >>>> from >>>> the data from which the sample is taken? Basically, I need to take >>>> 'many' >>>> samples from the same source data and sort them. This can be very time >>>> consuming for long vectors. Is there any way I can sort the data only >>>> >>> once >>> >>>> initially, and use that sort order for the samples? >>>> >>>> I believe that idea is what is implemented in tree-based classifiers, so >>>> the data is sorted only once initially and that sort order is used for >>>> >>> the >>> >>>> child nodes. >>>> >>>> >>>> set.seed(12345) >>>> x <- sample(0:100, 10) >>>> x.order <- order(x) >>>> x.sorted <- x[x.order] >>>> >>>> sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size >>>> >>> with replacement >>> >>>> x.sample <- x[sample.ind] >>>> >>>> x.sample.sorted <- #??? (without sorting again) >>>> >>> >>> Hi. >>> >>> Formally, it is possible to avoid sorting using tabulate() and rep(). >>> However, i am not sure, whether this approach is more efficient. >>> >>> set.seed(12345) >>> x <- sample(0:100, 10) >>> x.order <- order(x) >>> x.sorted <- x[x.order] >>> >>> sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size >>> with replacement >>> >>> x.sample <- x.sorted[sample.ind] >>> freq <- tabulate(sample.ind, nbins=length(x)) >>> x.sample.sorted <- rep(x.sorted, times=freq) >>> >>> identical(sort(x.sample), x.sample.sorted) # [1] TRUE >>> >>> Note that x.sample is created from x.sorted in order to make x.sample >>> and x.sample.sorted consistent. Since sample.ind has random order, the >>> distributions of x[sample.ind] and x.sorted[sample.ind] are the same. >>> >>> Computing the frequencies of indices, whose range is known in advance, >>> can be done in linear time, so theoretically more efficiently than >>> sorting. However, only a test may determine, what is more efficient in >>> your situation. >>> >>> Hope this helps. >>> >>> Petr Savicky. >>> >>> ______________________________**________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >>> PLEASE do read the posting guide >>> http://www.R-project.org/**posting-guide.html<http://www.R-project.org/posting-guide.html> >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> [[alternative HTML version deleted]] >> >> >> ______________________________**________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > > David Winsemius, MD > West Hartford, CT > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.