Would I be able to accomplish the same if x.sample were created from x
instead of x.sorted? In my real problem, I have to sort with respect to
many variables, so the sample indexes must stay consistent across
variables. I therefore need to take the sample first and then sort it
with respect to potentially any variable.

Thanks again,
Axel.
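
For reference, a minimal sketch of one way this might work when the sample is drawn from the unsorted data rather than from x.sorted. The idea is to tabulate the sample indices once and then reorder those counts by each variable's precomputed order. The data frame `dat`, its columns `a` and `b`, and the helper `sorted.sample` are hypothetical names introduced only for illustration.

```r
set.seed(12345)
n <- 10

# Hypothetical data with two variables to sort by
dat <- data.frame(a = sample(0:100, n), b = sample(0:100, n))

# Sort each variable once, up front
ord <- lapply(dat, order)

# One set of sample indices, referring to rows of the unsorted data
sample.ind <- sample(seq_len(n), 5, replace = TRUE)

# How many times each original row appears in the sample
freq <- tabulate(sample.ind, nbins = n)

# Sorted sample of a variable: take its values in presorted order and
# repeat each one as often as its original row was drawn
sorted.sample <- function(v) rep(dat[[v]][ord[[v]]], times = freq[ord[[v]]])

a.sample.sorted <- sorted.sample("a")
b.sample.sorted <- sorted.sample("b")

# Both comparisons check the result against sorting the sample directly
identical(sort(dat$a[sample.ind]), a.sample.sorted)
identical(sort(dat$b[sample.ind]), b.sample.sorted)
```

Because `freq` is indexed by original row, the same counts serve every variable. If the row indices of the sorted sample are needed as well, `rep(ord[[v]], times = freq[ord[[v]]])` should give them in the same order.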

On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:
> > Dear List,
> >
> > Is there a way I can sort a sample based on a sort index constructed from
> > the data from which the sample is taken? Basically, I need to take 'many'
> > samples from the same source data and sort them. This can be very time
> > consuming for long vectors. Is there any way I can sort the data only once
> > initially, and use that sort order for the samples?
> >
> > I believe that idea is what is implemented in tree-based classifiers, so
> > the data is sorted only once initially and that sort order is used for the
> > child nodes.
> >
> >
> > set.seed(12345)
> > x <- sample(0:100, 10)
> > x.order <- order(x)
> > x.sorted <- x[x.order]
> >
> > sample.ind <- sample(1:length(x), 5, replace = TRUE)  # sample 1/2 size with replacement
> > x.sample <- x[sample.ind]
> >
> > x.sample.sorted <-   #??? (without sorting again)
>
> Hi.
>
> Formally, it is possible to avoid sorting using tabulate() and rep().
> However, I am not sure whether this approach is more efficient.
>
>  set.seed(12345)
>  x <- sample(0:100, 10)
>  x.order <- order(x)
>  x.sorted <- x[x.order]
>
>  sample.ind <- sample(1:length(x), 5, replace = TRUE)  # sample 1/2 size with replacement
>
>  x.sample <- x.sorted[sample.ind]
>  freq <- tabulate(sample.ind, nbins=length(x))
>  x.sample.sorted <- rep(x.sorted, times=freq)
>
>  identical(sort(x.sample), x.sample.sorted) # [1] TRUE
>
> Note that x.sample is created from x.sorted in order to make x.sample
> and x.sample.sorted consistent. Since sample.ind has random order, the
> distributions of x[sample.ind] and x.sorted[sample.ind] are the same.
>
> Computing the frequencies of indices whose range is known in advance
> can be done in linear time, so theoretically more efficiently than
> sorting. However, only a test can determine which is more efficient in
> your situation.
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
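
A rough timing sketch of the comparison Petr suggests, following his setup in which the sample is drawn from x.sorted. The vector length of one million and the half-size sample are arbitrary assumptions, and the timings will vary by machine, so this is only a starting point for a test on real data.

```r
set.seed(1)
n <- 1e6
x <- runif(n)
x.order <- order(x)        # sorted once, up front
x.sorted <- x[x.order]

sample.ind <- sample(seq_len(n), n %/% 2, replace = TRUE)
x.sample <- x.sorted[sample.ind]

# Approach 1: sort the sample directly
t.sort <- system.time(s1 <- sort(x.sample))

# Approach 2: count the indices, then expand the presorted values
t.tab <- system.time({
  freq <- tabulate(sample.ind, nbins = n)
  s2 <- rep(x.sorted, times = freq)
})

# Both approaches should produce the same sorted sample
identical(s1, s2)

# Elapsed seconds for each approach
c(sort = t.sort[["elapsed"]], tabulate = t.tab[["elapsed"]])
```

The one-off cost of `order(x)` is deliberately left out of both timings, since it is paid only once however many samples are drawn.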
