On May 18, 2012, at 6:37 AM, Axel Urbiz wrote:
Would I be able to accomplish the same if x.sample were created from x
instead of x.sorted? In my real problem, I have to sort with respect to
many variables and thus keep the sample indexes consistent across
variables. So I need to first take the sample and then sort it with
respect to potentially any variable.
Either of the strategies I suggested should be generalizable to many
sort criteria. It should be possible to work with indices that point
back to the source of the sample. (Feel free to post an example. At
the moment your requirements seem a bit vague.)
--
David.
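A minimal sketch of how the tabulate()/rep() idea from earlier in the thread might carry over when the sample indices point into the unsorted source and there are several sort variables. The data frame d, its columns a and b, and the helper sorted_sample_rows() are hypothetical names used only for illustration, and the sketch assumes the sample is drawn as row indices.

```r
set.seed(12345)
# Hypothetical source data with two sort variables
d <- data.frame(a = sample(0:100, 10), b = sample(0:100, 10))

# Sort each variable once, up front; each element is a vector of row indices
orders <- lapply(d, order)

# Row indices of a sample, arranged in ascending order of one variable,
# using the precomputed order instead of sorting the sample again
sorted_sample_rows <- function(sample.ind, ord) {
  freq <- tabulate(sample.ind, nbins = length(ord))  # times each row was drawn
  rep(ord, times = freq[ord])                        # repeat rows in sorted order
}

# One sample of row indices into d itself (not into a sorted copy)
sample.ind <- sample(seq_len(nrow(d)), 5, replace = TRUE)

rows.by.a <- sorted_sample_rows(sample.ind, orders$a)
rows.by.b <- sorted_sample_rows(sample.ind, orders$b)

# The values agree with sorting the sample directly
stopifnot(identical(d$a[rows.by.a], sort(d$a[sample.ind])))
stopifnot(identical(d$b[rows.by.b], sort(d$b[sample.ind])))
```

Because rows.by.a and rows.by.b index rows of d, the same sample.ind stays consistent across variables, and any other column can be looked up in either ordering, for example d$b[rows.by.a].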
Thanks again,
Axel.
On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <savi...@cs.cas.cz> wrote:
On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:
Dear List,
Is there a way I can sort a sample based on a sort index constructed from
the data from which the sample is taken? Basically, I need to take 'many'
samples from the same source data and sort them. This can be very time
consuming for long vectors. Is there any way I can sort the data only once
initially, and use that sort order for the samples?
I believe that idea is what is implemented in tree-based classifiers, so
the data is sorted only once initially and that sort order is used for the
child nodes.
set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]
sample.ind <- sample(1:length(x), 5, replace = TRUE) # sample 1/2 size with replacement
x.sample <- x[sample.ind]
# x.sample.sorted <- ???  (wanted: the sorted sample, without sorting again)
Hi.
Formally, it is possible to avoid sorting by using tabulate() and rep().
However, I am not sure whether this approach is more efficient.
set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]
sample.ind <- sample(1:length(x), 5, replace = TRUE) # sample 1/2 size with replacement
x.sample <- x.sorted[sample.ind]
freq <- tabulate(sample.ind, nbins=length(x))
x.sample.sorted <- rep(x.sorted, times=freq)
identical(sort(x.sample), x.sample.sorted) # [1] TRUE
Note that x.sample is created from x.sorted in order to make x.sample
and x.sample.sorted consistent. Since sample.ind is drawn at random and
x.sorted is only a permutation of x, the distributions of x[sample.ind]
and x.sorted[sample.ind] are the same.
Computing the frequencies of indices whose range is known in advance
can be done in linear time, so in theory more efficiently than
sorting. However, only a test can determine which is more efficient in
your situation.
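A rough sketch of such a test, using only base R's system.time(). The vector length n and the use of runif() data are assumptions chosen for illustration; timings depend on the machine, the data, and the sample size, so no expected numbers are given.

```r
set.seed(12345)
n <- 1e6                                  # assumed vector length
x <- runif(n)
x.sorted <- sort(x)                       # one-time sort of the source

sample.ind <- sample(n, n %/% 2, replace = TRUE)
x.sample <- x.sorted[sample.ind]

# Approach 1: sort every sample from scratch
t.sort <- system.time(s1 <- sort(x.sample))

# Approach 2: count index frequencies, then expand the pre-sorted data
t.tab <- system.time({
  freq <- tabulate(sample.ind, nbins = n)
  s2 <- rep(x.sorted, times = freq)
})

stopifnot(identical(s1, s2))              # both give the same sorted sample

# Elapsed seconds for each approach
print(c(sort = t.sort[["elapsed"]], tabulate = t.tab[["elapsed"]]))
```

A single system.time() call is noisy; repeating each approach over many samples would give a steadier comparison.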
Hope this helps.
Petr Savicky.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT