Hi all,

I've been using the randomForest package on a dataset (described below), and my problem is this: even though I specify proximity=TRUE in the call, I get a NULL proximity matrix. Any thoughts on why that may happen?
Unfortunately I can't post my dataset, which is particularly problematic here since I believe that's where the problem is, so I'll try to give as detailed an account as I can. The outcome is binary and highly skewed, with the positive outcome being 1.5% of the data. The dataset has ~7000 observations and 200 predictors, each either a 2-level factor or a continuous variable, and it is extremely sparse.

Here is my call:

# I pass a balanced sample to each tree, to deal with the skewed outcome.
rf <- randomForest(y ~ ., data = train, ntree = 800, replace = TRUE,
                   sampsize = c(112, 112), proximilty = TRUE)

Any ideas on why I'm getting a NULL proximity matrix, or any solutions?

Thanks!
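A minimal, self-contained sketch that may help narrow this down without the original data. It uses the built-in iris data restricted to two classes as a stand-in (an assumption; it is not the dataset described above), and the object names d, rf_a and rf_b are made up for illustration. It fits the same kind of model twice: once with the argument spelled as in the call above (proximilty) and once spelled proximity. As far as I can tell, randomForest accepts extra arguments through `...`, so an argument name it does not recognise would likely be ignored without a warning rather than raise an error.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)

# Stand-in data (assumption): iris with one class dropped, so the outcome is binary
d <- droplevels(iris[iris$Species != "setosa", ])

# Argument spelled as in the call above
rf_a <- randomForest(Species ~ ., data = d, ntree = 50, replace = TRUE,
                     sampsize = c(20, 20), proximilty = TRUE)

# Argument spelled "proximity"
rf_b <- randomForest(Species ~ ., data = d, ntree = 50, replace = TRUE,
                     sampsize = c(20, 20), proximity = TRUE)

# Compare what each fit stored in its proximity component
print(is.null(rf_a$proximity))
print(dim(rf_b$proximity))
```

If the first fit reports a NULL proximity and the second reports an n-by-n matrix, the spelling of the argument, rather than the data, would be the first thing to check.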