Hi,

Can someone please offer me some guidance?

I imported some data. One of the columns, "JOBTITLE", was imported as a 
factor with 416 levels.

I subset the data in such a way that only 4 levels have data in "JOBTITLE" and 
tried running randomForest but it complained about "JOBTITLE" having more than 
32 categories. I know that is the limit in randomForest but I guess I don't 
understand enough about factors because I thought by subsetting the data this 
no longer would be an issue. BTW I can run randomForest on this dataset if I 
exclude "JOBTITLE".
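
To illustrate the behaviour described above, here is a minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, not the real data). It shows that subsetting rows leaves a factor's level set unchanged, and that base R's droplevels() discards the levels that no longer occur:

```r
# Hypothetical toy data standing in for the real training set
toy <- data.frame(
  JOBTITLE = factor(c("Analyst", "Clerk", "Manager", "Nurse", "Pilot", "Welder")),
  TARGET   = c(0, 1, 0, 1, 0, 1)
)

# Keep only the rows for two job titles
sub <- toy[toy$JOBTITLE %in% c("Analyst", "Clerk"), ]

nlevels(sub$JOBTITLE)   # counts every original level, used or not
table(sub$JOBTITLE)     # unused levels show up with a count of 0

# droplevels() removes the levels that no longer occur in the data
sub$JOBTITLE <- droplevels(sub$JOBTITLE)
nlevels(sub$JOBTITLE)   # now counts only the levels actually present
```

If this is what is happening with "JOBTITLE", the subset would still carry all 416 levels until they are dropped explicitly.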

So I then converted that column to a character vector:
> TRAINSET$JOBTITLE<-as.character(TRAINSET$JOBTITLE)

I ran Random Forest and got the below error. Why isn't this working? What do I 
need to do to get this working?

> library(randomForest)
> FOREST_model <- randomForest(as.factor(TARGET)~., data=trainset, mtry=4,
+                            ntree=1000, importance=TRUE, do.trace=100)

Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion
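
The warning suggests the character column was coerced to numeric, which would turn every job title into NA. A possible alternative, sketched here on a hypothetical character vector (`jobs_chr` is made up for illustration), is to convert the column back to a factor; factor() builds its levels only from the values actually present:

```r
# Hypothetical character column, like the result of as.character() on a factor
jobs_chr <- c("Analyst", "Clerk", "Analyst", "Nurse")

# factor() derives its levels from the values present in the vector
jobs_fac <- factor(jobs_chr)

levels(jobs_fac)      # only the job titles that occur in jobs_chr
is.factor(jobs_fac)   # the column is a factor again, not character
```

This is only a sketch under the assumption that the model needs "JOBTITLE" as a factor with at most 32 levels; it has not been run against the actual data.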

Your help will be greatly appreciated.

Dan


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
