Dear all, using an existing random forest, I would like to calculate the proximity for a new test object, i.e. the similarity between the new object and the old training objects which were used for building the random forest. I do not want to build a new random forest based on both old and new objects.
Currently, my workaround is to calculate the proximities of a combined data set consisting of training and new objects like this:

    model <- randomForest(Xtrain, Ytrain)              # build random forest
    nnew <- nrow(Xnew)                                 # number of new objects
    Xcombi <- rbind(Xnew, Xtrain)                      # combine new objects and training objects
    predcombi <- predict(model, Xcombi, proximity=TRUE) # calculate proximities
    proxcombi <- predcombi$proximity                   # get proximities of combined dataset
    proxnew <- proxcombi[(1:nnew), -(1:nnew)]          # get proximities of new objects only

But this approach wastes a lot of computation time, as I am not interested in the proximities among the training objects themselves but only in those between the training objects and the new objects. With 1000 training objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix to get the essential 5x1000 matrix of the new objects only.

Am I doing something wrong? I read through the documentation but could not find another solution. Any advice would be highly appreciated.

Thanks in advance!
Kilian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
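
One possible way to avoid the full combined matrix, offered as a minimal sketch rather than a confirmed solution: the proximity of two objects is the fraction of trees in which they fall into the same terminal node, so it may be enough to compare terminal node IDs directly. The sketch assumes that predict() for randomForest objects accepts nodes=TRUE and returns the node IDs in a "nodes" attribute (one row per object, one column per tree). The helper name prox_new_vs_train is hypothetical and not part of the package.

```r
library(randomForest)

# Hypothetical helper (not part of randomForest): proximities between
# new objects and training objects only, without the train-train block.
prox_new_vs_train <- function(model, Xnew, Xtrain) {
  # Terminal node IDs: one row per object, one column per tree
  nodes_new   <- attr(predict(model, Xnew,   nodes = TRUE), "nodes")
  nodes_train <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")
  ntree <- ncol(nodes_train)

  # Count, tree by tree, how often a new object shares a terminal node
  # with a training object
  prox <- matrix(0, nrow(nodes_new), nrow(nodes_train))
  for (t in seq_len(ntree)) {
    prox <- prox + outer(nodes_new[, t], nodes_train[, t], "==")
  }
  prox / ntree  # fraction of trees with a shared terminal node
}
```

With the objects from the post, the call would be proxnew <- prox_new_vs_train(model, Xnew, Xtrain), giving an nnew x nrow(Xtrain) matrix. Whether this matches the values from predict(..., proximity=TRUE) exactly in every package version is an assumption that should be checked on a small example first.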