Dear all,

using an existing random forest, I would like to calculate the proximity
for a new test object, i.e. the similarity between the new object and the
old training objects that were used to build the random forest. I do not
want to build a new random forest based on both old and new objects.

Currently, my workaround is to calculate the proximities of a combined data
set consisting of the new objects and the training objects, like this:

library(randomForest)
model <- randomForest(Xtrain, Ytrain)                 # build random forest
nnew <- nrow(Xnew)                                     # number of new objects
Xcombi <- rbind(Xnew, Xtrain)                          # new objects first, then training objects
predcombi <- predict(model, Xcombi, proximity = TRUE)  # calculate proximities
proxcombi <- predcombi$proximity                       # proximities of the combined data set
proxnew <- proxcombi[1:nnew, -(1:nnew)]                # keep only new-vs-training proximities

But this approach wastes a lot of computation time: I am not interested in
the proximities among the training objects themselves, only in those
between the new objects and the training objects. With 1000 training
objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix
just to extract the 5x1000 block I actually need.
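One possible way around the full matrix, sketched here under two assumptions: (1) the proximity of two objects is the fraction of trees in which both land in the same terminal node, and (2) predict() on a randomForest model with nodes = TRUE returns the terminal node indices as an n x ntree matrix in the "nodes" attribute. The helper new_train_proximity() is hypothetical, not part of the randomForest package. It compares only new-vs-training pairs, so the work is roughly nnew x ntrain x ntree comparisons instead of (nnew + ntrain)^2 x ntree.

```r
# Hypothetical helper: proximity between new objects and training objects,
# computed from terminal-node matrices (rows = objects, columns = trees).
new_train_proximity <- function(nodes_new, nodes_train) {
  stopifnot(ncol(nodes_new) == ncol(nodes_train))  # same forest, same trees
  ntree <- ncol(nodes_new)
  prox <- matrix(0, nrow = nrow(nodes_new), ncol = nrow(nodes_train))
  for (t in seq_len(ntree)) {
    # TRUE where new object i and training object j share a leaf in tree t
    prox <- prox + outer(nodes_new[, t], nodes_train[, t], "==")
  }
  prox / ntree  # fraction of trees in which each pair shares a leaf
}

# Toy example: 2 new objects, 3 training objects, 2 trees
nodes_new   <- matrix(c(1, 2,
                        3, 2), nrow = 2, byrow = TRUE)
nodes_train <- matrix(c(1, 2,
                        3, 4,
                        1, 4), nrow = 3, byrow = TRUE)
prox <- new_train_proximity(nodes_new, nodes_train)
# expected rows: (1, 0, 0.5) and (0.5, 0.5, 0)

# Assumed usage with an existing forest (not run here):
# nodes_tr  <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")
# nodes_nw  <- attr(predict(model, Xnew,   nodes = TRUE), "nodes")
# proxnew   <- new_train_proximity(nodes_nw, nodes_tr)
```

The training node matrix only has to be computed once and can be reused for every new batch. Whether the result matches the block extracted from predict(..., proximity = TRUE) exactly depends on predict() using all trees for proximities, which should be checked on a small data set first.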

Am I doing something wrong? I read through the documentation but could not
find another solution. Any advice would be highly appreciated.

Thanks in advance!
Kilian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help