Hello,

As decision trees require sorting the variable used for splitting a given node, I'm trying to avoid this repeated sorting by sorting all numeric variables once, up front.
My attempt at this is shown in "Solution 2" below. It gives the desired result, but I think the %in% operation may be a costly one (and may even offset the benefits of pre-sorting). Any alternative solutions would be highly appreciated.

### Sample data
set.seed(1)
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
w <- rep(1L, nrow(df))  # w == 1L denotes an observation present in the current node
w[c(1, 8, 10)] <- 0L

### The problem: sort x1 within observations present in the current node

### Solution 1: slow for repeated sorting
nodeObsInd <- which(w == 1L)
sol1 <- df[nodeObsInd, ]
sol1 <- sol1[order(sol1$x1), ]$x1

### Solution 2: sort all variables once, up front
sort_fun <- function(x) {
  index <- order(x)
  x <- x[index]
  data.frame(x, index)  # the index gives the original position of each obs
}
s_df <- lapply(df, function(x) sort_fun(x))
sol2 <- s_df[[1]][s_df$x1$index %in% nodeObsInd, ]

### Check: same result
all.equal(sol1, sol2$x)

Regards,
Axel.

R-help@r-project.org mailing list
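One possible way to avoid %in%, offered only as a sketch and not benchmarked here: since w is already a 0/1 vector aligned with the rows of df, the pre-sorted index can be used to look up w directly, which should be a plain vector subset with no matching step. The names ord, sorted_in_node and sol3 are hypothetical and do not appear in the code above; the sample data is repeated so that the block runs on its own.

```r
# Sample data, repeated from the post so this sketch is self-contained
set.seed(1)
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
w <- rep(1L, nrow(df))  # w == 1L denotes an observation present in the current node
w[c(1, 8, 10)] <- 0L

# Pre-sort once: store only the ordering permutation of each variable
ord <- lapply(df, order)

# Hypothetical helper: sorted values of `var` among observations with w == 1L.
# w[idx] reorders the node indicator into sorted order, so filtering on it
# keeps the in-node observations while preserving the pre-computed order.
sorted_in_node <- function(var, w) {
  idx <- ord[[var]]
  idx <- idx[w[idx] == 1L]
  df[[var]][idx]
}

sol3 <- sorted_in_node("x1", w)

# Check against sorting the node's observations directly
all.equal(sol3, sort(df$x1[w == 1L]))
```

Whether this is actually faster than the %in% version would need to be measured on data of realistic size; the sketch only shows that the same result can be obtained by indexing w with the stored order.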