Hi, can someone please offer me some guidance?
I imported some data. One of the columns, "JOBTITLE", was imported as a factor with 416 levels. I subset the data so that only 4 levels of "JOBTITLE" actually occur, then tried running randomForest, but it complained that "JOBTITLE" has more than 32 categories. I know that is the limit in randomForest, but I guess I don't understand enough about factors, because I thought that after subsetting this would no longer be an issue. BTW, I can run randomForest on this dataset if I exclude "JOBTITLE".

So I then converted that column to a character vector:

> TRAINSET$JOBTITLE <- as.character(TRAINSET$JOBTITLE)

I ran randomForest and got the error below. Why isn't this working? What do I need to do to get this working?

> library(randomForest)
> FOREST_model <- randomForest(as.factor(TARGET)~., data=trainset, mtry=4, ntree=1000,
+                              importance=TRUE, do.trace=100)
Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion

Your help will be greatly appreciated.

Dan
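
P.S. In case a reproducible example helps, here is a minimal sketch of what I suspect is happening with the factor levels. It uses only base R and a made-up toy data frame (`df`, `sub` and the job titles are hypothetical stand-ins, not my real TRAINSET), and it assumes the unused levels left over after subsetting are what triggers the 32-category complaint. I have not confirmed that this is the whole story for randomForest.

```r
# Hypothetical toy data standing in for TRAINSET; only JOBTITLE matters here.
df <- data.frame(
  JOBTITLE = factor(c("Analyst", "Clerk", "Engineer", "Manager", "Nurse", "Pilot")),
  TARGET   = c(0, 1, 0, 1, 0, 1)
)

# Subsetting rows keeps the full set of levels, including the unused ones.
sub <- df[df$JOBTITLE %in% c("Analyst", "Clerk"), ]
n_before <- nlevels(sub$JOBTITLE)   # level count is unchanged by subsetting

# droplevels() discards levels that no longer occur in the data.
sub <- droplevels(sub)
n_after <- nlevels(sub$JOBTITLE)    # only the levels still present remain

print(c(before = n_before, after = n_after))
```

If that is indeed the cause, then calling droplevels() on the subset (or re-wrapping the column with factor()) rather than converting it to character might be the thing to try, but I would welcome confirmation.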