I don't think this is so hard to explain.  If you evaluate AUC using OOB 
predictions or a test set (or something like CV or the bootstrap), this is 
what I would expect for most data.  When you add more variables (that are, 
say, less informative) to a model, the model has to look harder to find the 
informative ones, and thus you pay a penalty.  One exception is when some of 
the "new" variables happen to interact very strongly with some of the "old" 
variables; in that case you may see improved performance.
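
One rough way to put numbers on "has to look harder" is sketched below. It 
assumes a toy setting that is not from the original post: 5 fully informative 
variables, with every added variable carrying no signal at all (the second 
set in the question does have some signal, so this overstates the effect). 
It also assumes each split draws mtry candidate variables without 
replacement, with mtry = floor(sqrt(p)), the usual default for 
classification forests. The function name is made up for illustration.

```python
from math import comb, isqrt

def p_informative_candidate(p_total, k_informative, mtry=None):
    """Chance that one split's candidate set holds at least one informative variable.

    Assumes mtry candidates are drawn without replacement from p_total
    variables, with mtry defaulting to floor(sqrt(p_total)).
    """
    if mtry is None:
        mtry = max(1, isqrt(p_total))
    # comb() returns 0 when mtry exceeds the number of uninformative
    # variables, so the probability is then exactly 1.
    miss = comb(p_total - k_informative, mtry) / comb(p_total, mtry)
    return 1.0 - miss

# 5 informative variables alone, then padded out to 25 and 100 variables.
for p_total in (5, 25, 100):
    print(p_total, round(p_informative_candidate(p_total, k_informative=5), 3))
```

Under these assumptions the chance that a given split even sees an 
informative variable drops from 1 to about 0.71 and then to about 0.42, so 
more splits are spent on variables that cannot help.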

I've said it several times before, but it seems to be worth repeating:  Don't 
use the training set for evaluating models; that almost never makes sense.
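
To see why training-set AUC is misleading, here is a minimal sketch using 
only the Python standard library. It is not a random forest: a 
1-nearest-neighbour lookup stands in for any model flexible enough to 
memorise its training data, and the data are simulated so that the labels 
are unrelated to the predictors. All names (auc, nearest_label, make_data) 
are hypothetical helpers written for this illustration.

```python
import random

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def nearest_label(x, train_x, train_y):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])),
    )
    return train_y[best]

rng = random.Random(0)

def make_data(n, p):
    """Gaussian noise predictors with alternating 0/1 labels: no real signal."""
    x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return x, y

train_x, train_y = make_data(200, 5)
test_x, test_y = make_data(200, 5)

# Scoring the training rows: each row's nearest neighbour is itself.
train_auc = auc([nearest_label(x, train_x, train_y) for x in train_x], train_y)
# Scoring rows the model has never seen.
test_auc = auc([nearest_label(x, train_x, train_y) for x in test_x], test_y)

print("training AUC:", train_auc)
print("held-out AUC:", round(test_auc, 3))
```

The training AUC comes out perfect even though there is nothing to learn, 
while the held-out AUC stays near chance. With a forest fitted by the 
randomForest package in R, calling predict() without newdata returns OOB 
predictions, which play the role of the held-out scores here.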

Andy


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of matt
Sent: Friday, May 11, 2012 3:43 PM
To: r-help@r-project.org
Subject: [R] Random forests prediction

Hi all,

I have a strange problem when applying RF in R. 
I have a set of variables with which I obtain an AUC of 0.67.

I do have a second set of variables that have an AUC of 0.57. 

When I merge the first and second set of variables, the AUC becomes 0.64. 

I would expect the prediction to improve when I add variables that do
have some predictive power.
This is even stranger because the AUC on the training set increased when I
added more variables (while the AUC on the validation set decreased).

Has anyone experienced the same thing, or does anyone know what could be
the reason?

Thanks,

Matthijs


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
