Thanks, Petr and David.

Sorry, David, if I was not clear enough. The last comment lines in the code
below state the end objective (hopefully more clearly this time).

Petr: I kind of see your line of thought, but I still cannot see how it
works on a specific example like this one.

set.seed(1)
dframe <- matrix(runif(250), 50, 5)

### store sort indexes (computed once on the full data)
sort_matrix <- matrix(ncol = ncol(dframe), nrow = nrow(dframe))
for (i in 1:ncol(dframe)) {
  xtemp <- dframe[, i]
  sort_matrix[, i] <- sort.list(xtemp, method = "shell")
}

### take a bootstrap sample (half the size, with replacement)
nr_samples <- nrow(dframe)
b.ind <- sample(1:nr_samples, nr_samples * 0.5, replace = TRUE)
b.dframe <- dframe[b.ind, ]

### sort the bootstrap sample with respect to an arbitrary variable
var1 <- 1  # e.g. the first variable

### All I need to do is to efficiently re-arrange b.ind according to the
### order in sort_matrix[, 1] (avoiding sorting again)
b.dframe[, var1][match(sort_matrix[, var1], b.ind)]
# this does not work (NA for rows never drawn, repeated draws dropped),
# and if it did would be slow
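A minimal sketch of how Petr's tabulate()/rep() idea might carry over to this matrix example, assuming the goal is b.ind rearranged into increasing order of column var1 with repeated draws kept. It rebuilds the setup so it runs on its own; apply(dframe, 2, order) stands in for the sort.list() loop (both give stable orderings). The names freq, sorted_sample_ind, b.ind.sorted and b.dframe.sorted are hypothetical, introduced only for illustration.

```r
set.seed(1)
dframe <- matrix(runif(250), 50, 5)

# Sort indexes computed once on the full data (one column per variable);
# stands in for the sort.list() loop, both orderings being stable
sort_matrix <- apply(dframe, 2, order)

# One bootstrap sample of half the size, drawn with replacement
nr_samples <- nrow(dframe)
b.ind <- sample(1:nr_samples, nr_samples * 0.5, replace = TRUE)
b.dframe <- dframe[b.ind, ]

# Hypothetical helper data: how often each original row was drawn.
# It depends only on the sample, so it is computed once and reused
# for every variable.
freq <- tabulate(b.ind, nbins = nr_samples)

# Hypothetical helper: b.ind rearranged in increasing order of column v.
# It walks the precomputed order and repeats each row index as often as
# it was drawn; there is no sort()/order() call on the sample itself.
sorted_sample_ind <- function(v) {
  ord <- sort_matrix[, v]
  rep(ord, times = freq[ord])
}

var1 <- 1
b.ind.sorted <- sorted_sample_ind(var1)
b.dframe.sorted <- dframe[b.ind.sorted, ]

# Check against sorting the sample directly
identical(b.dframe.sorted[, var1], sort(b.dframe[, var1]))  # [1] TRUE
```

For another variable, only sorted_sample_ind(v) needs to be called again; freq is shared, so the row indexes stay consistent across variables.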


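Petr's remark that only a test can settle the efficiency question can be checked with a rough timing sketch like the following. The vector length n = 1e6 and the half-size sample are arbitrary assumptions, and timings depend on the machine and R version, so no figures are claimed here.

```r
set.seed(1)
n <- 1e6
x <- runif(n)
x.order <- order(x)  # sort once on the full data

# One bootstrap-style sample: half the size, with replacement
ind <- sample.int(n, n / 2, replace = TRUE)

# (a) sort the sample directly
t_sort <- system.time(s1 <- sort(x[ind]))

# (b) reuse the full-data order: count draws per index, then expand
t_count <- system.time({
  freq <- tabulate(ind, nbins = n)
  s2 <- x[rep(x.order, times = freq[x.order])]
})

identical(s1, s2)  # [1] TRUE: both are the sampled values in increasing order
c(sort = t_sort[["elapsed"]], count = t_count[["elapsed"]])
```

Variant (b) includes the tabulate() step; when several variables are sorted for the same sample, that step is shared, so the per-variable cost is only the rep() and the indexing.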
Thanks again,
Axel.

On Fri, May 18, 2012 at 9:50 AM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On May 18, 2012, at 6:37 AM, Axel Urbiz wrote:
>
>> Would I be able to accomplish the same if x.sample was created from x
>> instead of x.sorted? The problem is that in my real problem I have to
>> sort with respect to many variables and thus keep the sample indexes
>> consistent across variables. So I need to first take the sample and
>> then sort it with respect to potentially any variable.
>>
>
> Either of the strategies I suggested should be generalizable to many sort
> criteria. It should be possible to work with indices that point back to the
> source of the sample. (Feel free to post an example. At the moment your
> requirements seem a bit vague.)
>
> --
> David.
>
>> Thanks again,
>> Axel.
>>
>> On Thu, May 17, 2012 at 1:43 PM, Petr Savicky <savi...@cs.cas.cz> wrote:
>>
>>> On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:
>>>
>>>> Dear List,
>>>>
>>>> Is there a way I can sort a sample based on a sort index constructed
>>>> from the data from which the sample is taken? Basically, I need to take
>>>> 'many' samples from the same source data and sort them. This can be very
>>>> time consuming for long vectors. Is there any way I can sort the data
>>>> only once initially, and use that sort order for the samples?
>>>>
>>>> I believe that idea is what is implemented in tree-based classifiers, so
>>>> the data is sorted only once initially and that sort order is used for
>>>> the child nodes.
>>>>
>>>>
>>>> set.seed(12345)
>>>> x <- sample(0:100, 10)
>>>> x.order <- order(x)
>>>> x.sorted <- x[x.order]
>>>>
>>>> # sample 1/2 size with replacement
>>>> sample.ind <- sample(1:length(x), 5, replace = TRUE)
>>>> x.sample <- x[sample.ind]
>>>>
>>>> x.sample.sorted <-   #??? (without sorting again)
>>>>
>>>>
>>>
>>> Hi.
>>>
>>> Formally, it is possible to avoid sorting using tabulate() and rep().
>>> However, I am not sure whether this approach is more efficient.
>>>
>>> set.seed(12345)
>>> x <- sample(0:100, 10)
>>> x.order <- order(x)
>>> x.sorted <- x[x.order]
>>>
>>> # sample 1/2 size with replacement
>>> sample.ind <- sample(1:length(x), 5, replace = TRUE)
>>>
>>> x.sample <- x.sorted[sample.ind]
>>> freq <- tabulate(sample.ind, nbins = length(x))
>>> x.sample.sorted <- rep(x.sorted, times = freq)
>>>
>>> identical(sort(x.sample), x.sample.sorted) # [1] TRUE
>>>
>>> Note that x.sample is created from x.sorted in order to make x.sample
>>> and x.sample.sorted consistent. Since sample.ind has random order, the
>>> distributions of x[sample.ind] and x.sorted[sample.ind] are the same.
>>>
>>> Computing the frequencies of indices whose range is known in advance
>>> can be done in linear time, so theoretically more efficiently than
>>> sorting. However, only a test can determine which is more efficient in
>>> your situation.
>>>
>>> Hope this helps.
>>>
>>> Petr Savicky.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
> David Winsemius, MD
> West Hartford, CT
>
>