On Fri, May 18, 2012 at 09:20:59PM -0400, Axel Urbiz wrote:
[...]
> Petr: I kind of see your line of thought, but still cannot see how it works
> on a specific example like this one.

I did not have access to email for the last few days.

The previous suggestion from

  https://stat.ethz.ch/pipermail/r-help/2012-May/313197.html

was meant for the situation where we want to keep the result of sorting
according to several variables, so that a subset can later be sorted by
sorting according to a single variable only. Now I see that all the sorts
here are already according to a single variable, so that suggestion is not
helpful.

Try the following, which uses the example from your code. In particular, it
uses a matrix (not a data frame) and there are no duplicates in the data.

  set.seed(1)
  dframe <- matrix(runif(250), 50, 5)

  ### store sort indexes
  sort_matrix <- matrix(ncol = ncol(dframe), nrow = nrow(dframe))
  for (i in 1:ncol(dframe)) {
    xtemp <- dframe[, i]
    sort_matrix[, i] <- sort.list(xtemp, method = "shell")
  }

  ### take a bootstrap sample
  nr_samples <- nrow(dframe)
  b.ind <- sample(1:nr_samples, nr_samples*0.5, replace = TRUE)
  freq <- tabulate(b.ind, nbins=nr_samples)

  ### create bootstrap sample sorted with respect to an arbitrary variable
  var1 <- 1
  ind <- sort_matrix[, var1]
  DF1 <- dframe[ind, ] # this can be computed in advance (before b.ind)
  NDF1 <- DF1[rep(1:nrow(DF1), times=freq[ind]), ]

  ### compare with a straightforward method
  subDF <- dframe[b.ind, ]
  subDF1 <- subDF[order(subDF[, var1]), ]
  identical(NDF1, subDF1)

  [1] TRUE

The main step is that "ind" is used to reorder both the data and the
frequency table. So they remain consistent, and the reordered frequencies
can be used to replicate the rows of the reordered data.

Hope this helps.

Petr Savicky.
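
The main step can also be traced by hand on a very small example. The sketch below is only an illustration: the 4-row matrix "x" and the fixed draw in "b.ind" are made-up values, not taken from the thread, and the names "sorted_x", "boot_sorted" and "direct" are hypothetical.

```r
# Hypothetical toy data (not from the thread): 4 rows, sorted by column 1.
x <- matrix(c(0.3, 0.1, 0.4, 0.2,
              10,  20,  30,  40), nrow = 4)

ind <- sort.list(x[, 1])                   # 2 4 1 3: row order that sorts column 1
b.ind <- c(3, 1, 3)                        # a fixed "bootstrap" draw of row indices
freq <- tabulate(b.ind, nbins = nrow(x))   # 1 0 2 0: how often each row was drawn

# Reorder the data and the frequencies by the same index, then replicate rows.
sorted_x <- x[ind, ]
boot_sorted <- sorted_x[rep(seq_len(nrow(x)), times = freq[ind]), , drop = FALSE]

# Straightforward method: subset first, then sort the subset.
direct <- x[b.ind, , drop = FALSE]
direct <- direct[order(direct[, 1]), , drop = FALSE]

identical(boot_sorted, direct)
```

Here freq[ind] is 0 0 1 2, so the rows of "sorted_x" with values 0.1 and 0.2 are dropped, the row with 0.3 is kept once and the row with 0.4 is kept twice, which is the sorted bootstrap sample. If the sort column contained tied values, the two methods might order the tied rows differently, which is presumably why an example without duplicates is convenient for the identical() check.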