I'm trying to run rfe() for variable selection in the caret package and am getting an error. My data frame includes a categorical variable, State, with 3 levels, which I convert to dummy variables.
x <- chlDescr
y <- chl

# create dummy variables
levels(x$State) <- c("AL", "GA", "FL")
dummy <- model.matrix(~ State, x)
z <- cbind(dummy, x)

# remove the State factor column
w <- z[, c(-4)]

subsets <- c(2:8)
ctrl <- rfeControl(functions = lmFuncs, method = "cv",
                   verbose = FALSE, returnResamp = "final")
lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)

This returns:

Error in `[.data.frame`(x, , retained, drop = FALSE) :
  undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
  prediction from a rank-deficient fit may be misleading

When I remove the dummy variables, the function runs fine:

# remove State variable
Desc <- chlDescr[, -c(1)]
lmProfile <- rfe(Desc, y, sizes = subsets, rfeControl = ctrl)

This returns:

Recursive feature selection

Outer resamping method was 10 iterations of cross-validation.

Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         1 0.2462   0.7454 0.09529    0.17362
         2 0.2408   0.7680 0.07860    0.12543
         3 0.2134   0.8285 0.06649    0.09043
         4 0.2011   0.8609 0.03463    0.05928        *
         5 0.2019   0.8622 0.03421    0.05675
         6 0.2019   0.8622 0.03421    0.05675

Can lmFuncs handle dummy variables? If not, how does it need to be modified so that it can? I'm new at this, so any help would be appreciated. Thanks,

Reni

Data files:
chl.csv: http://r.789695.n4.nabble.com/file/n3487861/chl.csv
chlDescr.csv: http://r.789695.n4.nabble.com/file/n3487861/chlDescr.csv

--
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
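Because the attached CSV files may not stay reachable, here is a minimal, self-contained sketch with made-up data. The columns d1 and d2, the response y and the sample size are hypothetical stand-ins for chlDescr and chl. It rebuilds the dummy columns the same way as the code above and checks one possible culprit, which is an assumption and has not been confirmed inside caret: model.matrix(~ State, x) returns an "(Intercept)" column of 1s alongside StateGA and StateFL. A constant column like that duplicates the intercept that lm() fits on its own, which would fit the "rank-deficient fit" warning. Dropping it with [, -1] gives a full-rank design.

```r
# Made-up stand-in data (hypothetical): chlDescr.csv and chl.csv are not reproduced here.
set.seed(42)
n <- 30
x <- data.frame(
  State = factor(rep(c("AL", "GA", "FL"), length.out = n),
                 levels = c("AL", "GA", "FL")),
  d1 = rnorm(n),   # hypothetical numeric descriptor
  d2 = rnorm(n)    # hypothetical numeric descriptor
)
y <- 2 * x$d1 - x$d2 + rnorm(n, sd = 0.1)

# With the default treatment contrasts, model.matrix() returns an
# "(Intercept)" column of 1s plus one indicator per non-reference level.
dummy_all <- model.matrix(~ State, x)
print(colnames(dummy_all))   # "(Intercept)" "StateGA" "StateFL"

# Design matrices as lm() would see them: its own intercept plus the predictors.
num <- as.matrix(x[, c("d1", "d2")])
X_with_int    <- cbind(1, dummy_all, num)        # two identical constant columns
X_without_int <- cbind(1, dummy_all[, -1], num)  # "(Intercept)" column dropped

# A rank below the column count means some coefficient cannot be estimated,
# which is the situation predict.lm() warns about.
rank_with    <- qr(X_with_int)$rank
rank_without <- qr(X_without_int)$rank
cat("with (Intercept):   ", rank_with, "of", ncol(X_with_int), "\n")
cat("without (Intercept):", rank_without, "of", ncol(X_without_int), "\n")

# Predictor set without the constant column: State is carried by StateGA and
# StateFL, with AL as the implied reference level (both indicators 0).
w <- cbind(dummy_all[, -1], x[, c("d1", "d2")])
fit <- lm(y ~ ., data = data.frame(w, y = y))
print(coef(fit))
```

If this turns out to be the cause, the first thing to try with rfe() would be building dummy with model.matrix(~ State, x)[, -1] before the cbind() step. The index used to drop State then changes from 4 to 3, because z has one fewer leading column. This sketch does not run caret itself.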