Hi all,
I've been using the randomForest package on a dataset (described below), and
my problem is that even though I specify proximity = TRUE in the call, the
returned proximity matrix is NULL. Any thoughts on why that might happen?
Unfortunately I can't post my dataset, which is particularly problematic
here since I believe that's where the problem is. So I'll try to give as
detailed an account as I can.
The outcome is binary and highly skewed: the positive class makes up only
1.5% of the data.
The dataset has ~7000 observations and 200 predictors. These are either
two-level factors or continuous variables, and the data are extremely sparse.
Here is my call:
# I pass a balanced sample to each tree to deal with the skewed outcome.
rf <- randomForest(y ~ ., data = train, ntree = 800, replace = TRUE,
                   sampsize = c(112, 112), proximilty = TRUE)
Any ideas on why I'm getting a NULL proximity matrix, or how to fix it?
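One thing worth ruling out: the call above spells the argument `proximilty`, not `proximity`. randomForest() accepts `...`, so as far as I can tell a misspelled argument name is not reported as an error. It is simply absorbed and ignored, and proximity presumably falls back to its default (FALSE for classification). The sketch below illustrates that mechanism in base R using a hypothetical stand-in function, `fit_demo()`, which is not part of randomForest. The corrected randomForest() call is left as a comment so the snippet runs without the package installed.

```r
# fit_demo() is a HYPOTHETICAL stand-in, not part of randomForest: it has a
# named `proximity` argument plus `...`, which collects any other arguments.
fit_demo <- function(x, proximity = FALSE, ...) {
  extra <- list(...)
  list(proximity = proximity, ignored = names(extra))
}

# "proximilty" is not a prefix of "proximity", so R cannot partially match it;
# it falls into `...` and `proximity` keeps its default.
misspelled <- fit_demo(1, proximilty = TRUE)
correct    <- fit_demo(1, proximity = TRUE)

print(misspelled$proximity)  # FALSE
print(misspelled$ignored)    # "proximilty"
print(correct$proximity)     # TRUE

# With the argument spelled correctly, the original call would read:
# rf <- randomForest(y ~ ., data = train, ntree = 800, replace = TRUE,
#                    sampsize = c(112, 112), proximity = TRUE)
# dim(rf$proximity)  # should then be nrow(train) by nrow(train)
```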
Thanks!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.