Re: [R] Complex sort problem

David Winsemius Fri, 18 May 2012 06:52:29 -0700


On May 18, 2012, at 6:37 AM, Axel Urbiz wrote:

Would I be able to accomplish the same if x.sample was created from x
instead of x.sorted. The problem is that in my real problem, I haveto sortwith respect to many variables and thus keep the sample indexesconsistent
across variables. So I need to first take the sample and then sort it
with respect to potentially any variable.

Either of the strategies I suggested should be generalizable to manysort criteria. It should be possible to work with indices that pointback to the source of the sample. (Feel free to post an example. Atthe moment your requirements seem a bit vague.)


--
David.

Thanks again,
Axel.
On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <[email protected]>wrote:
On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:
Dear List,
Is there a way I can sort a sample based on a sort indexconstructed fromthe data from which the sample is taken? Basically, I need to take'many'samples from the same source data and sort them. This can be verytimeconsuming for long vectors. Is there any way I can sort the dataonly
once
initially, and use that sort order for the samples?
I believe that idea is what is implemented in tree-basedclassifiers, sothe data is sorted only once initially and that sort order is usedfor
the
child nodes.


set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]
sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2size
with replacement
x.sample <- x[sample.ind]

x.sample.sorted <-   #??? (without sorting again)
Hi.

Formally, it is possible to avoid sorting using tabulate() and rep().
However, i am not sure, whether this approach is more efficient.

set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]
sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2size
with replacement

 x.sample <- x.sorted[sample.ind]
freq <- tabulate(sample.ind, nbins=length(x))
x.sample.sorted <- rep(x.sorted, times=freq)

identical(sort(x.sample), x.sample.sorted) # [1] TRUE

Note that x.sample is created from x.sorted in order to make x.sample
and x.sample.sorted consistent. Since sample.ind has random order,the
distributions of x[sample.ind] and x.sorted[sample.ind] are the same.
Computing the frequencies of indices, whose range is known inadvance,
can be done in linear time, so theoretically more efficiently than
sorting. However, only a test may determine, what is more efficientin
your situation.

Hope this helps.

Petr Savicky.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Complex sort problem

Reply via email to