"%in%" is a wrapper for match(), which uses hashing I believe (correction welcome!),and so is generally very fast.
See ?Rprof for profiling R code to get timings. (Haven't used it myself, so not sure how useful it would be for this situation). -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Wed, Feb 24, 2016 at 8:45 AM, Axel Urbiz <axel.ur...@gmail.com> wrote: > Hello, > > As decision trees require sorting the variable used for splitting a given > node, I'm trying to avoid having this recurrent sorting by only sorting all > numeric variable first (and only once). > > My attempt in doing this is shown in "Solution 2" below, but although I get > the desired result I think the %in% operation may be a costly one (and may > even offset the benefits of pre-sorting). > > Any alternative solutions would be highly appreciated. > > ### Sample data > > set.seed(1) > df <- data.frame(x1 = rnorm(20), x2 = rnorm(20)) > w <- rep(1L, nrow(df)) > # w == 1L denote observation present in the current node > w[c(1, 8, 10)] <- 0L > > ### The problem: sort x1 within observations present in the current node > > ### Solution 1: slow for repeated sorting > nodeObsInd <- which(w == 1L) > sol1 <- df[nodeObsInd, ] > sol1 <- sol1[order(sol1$x1), ]$x1 > > ### Solution 2: sort all variables initially only. > > sort_fun <- function(x) > { > index <- order(x) > x <- x[index] > data.frame(x, index) # the index gives original position of the obs > } > > s_df <- lapply(df, function(x) sort_fun(x)) > sol2 <- s_df[[1]][s_df$x1$index %in% nodeObsInd, ] > > ### check same result > > all.equal(sol1, sol2$x) > > > > > Regards, > Axel. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.