Would I be able to accomplish the same if x.sample were created from x instead of x.sorted? The difficulty is that in my real problem I have to sort with respect to many variables, so the sample indexes must stay consistent across variables. I therefore need to take the sample first and then sort it with respect to potentially any variable.

Thanks again,
Axel.

On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:
> > Dear List,
> >
> > Is there a way I can sort a sample based on a sort index constructed from
> > the data from which the sample is taken? Basically, I need to take 'many'
> > samples from the same source data and sort them. This can be very time
> > consuming for long vectors. Is there any way I can sort the data only once
> > initially, and use that sort order for the samples?
> >
> > I believe that idea is what is implemented in tree-based classifiers, so
> > the data is sorted only once initially and that sort order is used for the
> > child nodes.
> >
> >
> > set.seed(12345)
> > x <- sample(0:100, 10)
> > x.order <- order(x)
> > x.sorted <- x[x.order]
> >
> > sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size with replacement
> > x.sample <- x[sample.ind]
> >
> > x.sample.sorted <- #??? (without sorting again)
>
> Hi.
>
> Formally, it is possible to avoid sorting using tabulate() and rep().
> However, i am not sure, whether this approach is more efficient.
>
> set.seed(12345)
> x <- sample(0:100, 10)
> x.order <- order(x)
> x.sorted <- x[x.order]
>
> sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size with replacement
>
> x.sample <- x.sorted[sample.ind]
> freq <- tabulate(sample.ind, nbins=length(x))
> x.sample.sorted <- rep(x.sorted, times=freq)
>
> identical(sort(x.sample), x.sample.sorted) # [1] TRUE
>
> Note that x.sample is created from x.sorted in order to make x.sample
> and x.sample.sorted consistent. Since sample.ind has random order, the
> distributions of x[sample.ind] and x.sorted[sample.ind] are the same.
>
> Computing the frequencies of indices, whose range is known in advance,
> can be done in linear time, so theoretically more efficiently than
> sorting. However, only a test may determine, what is more efficient in
> your situation.
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
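
On the follow-up question (sampling from x itself so that one set of indexes can be shared by several variables), one possible variant of the tabulate()/rep() idea is to reorder the frequency table by each variable's sort order. The sketch below is not from the thread: the helper name sorted_sample and the data frame dat are made up for illustration, and it assumes there are no missing values (order() keeps NAs, sort() drops them by default).

```r
# Hypothetical helper: returns the sorted values of x[sample.ind]
# without sorting the sample. x.order must be order(x), computed once.
sorted_sample <- function(x, x.order, sample.ind) {
  # Number of times each original index was drawn (linear time).
  freq <- tabulate(sample.ind, nbins = length(x))
  # Walk through x in sorted order and repeat each value by the
  # draw count of the original position it came from.
  rep(x[x.order], times = freq[x.order])
}

set.seed(12345)
# Illustrative data with two variables to sort by (assumed, no NAs).
dat <- data.frame(a = sample(0:100, 10), b = runif(10))

# Sort each variable only once, up front.
orders <- lapply(dat, order)

# One set of sample indexes, drawn from the original row positions
# and shared by all variables.
sample.ind <- sample(seq_len(nrow(dat)), 5, replace = TRUE)

sorted <- Map(function(v, o) sorted_sample(v, o, sample.ind), dat, orders)

# Check against sorting each sample directly.
stopifnot(identical(sorted$a, sort(dat$a[sample.ind])),
          identical(sorted$b, sort(dat$b[sample.ind])))
```

If the sampled row indexes are needed in sorted order (for example to carry a response variable along), rep(x.order, times = freq[x.order]) should give them in the same way. Whether this beats calling sort() on each sample still has to be timed on the real data.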