Hello,

I am attempting to train a random forest model using the randomForest package on more than 500,000 rows and 8 columns (7 predictors, 1 response). The data set is the first block of data from the UCI Machine Learning Repository dataset "Record Linkage Comparison Patterns", with the slight modification that I dropped two columns with many NAs and used kNN imputation to fill in the remaining gaps.

When I load my dataset, R uses no more than 100 MB of RAM. I'm running 64-bit R with ~4 GB of RAM available. When I execute the randomForest() function, however, I get memory complaints. Example:

> summary(mydata1.clean[,3:10])
  cmp_fname_c1     cmp_lname_c1       cmp_sex           cmp_bd
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.2857   1st Qu.:0.1000   1st Qu.:1.0000   1st Qu.:0.0000
 Median :1.0000   Median :0.1818   Median :1.0000   Median :0.0000
 Mean   :0.7127   Mean   :0.3156   Mean   :0.9551   Mean   :0.2247
 3rd Qu.:1.0000   3rd Qu.:0.4286   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
     cmp_bm           cmp_by          cmp_plz         is_match
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   FALSE:572820
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   TRUE :  2093
 Median :0.0000   Median :0.0000   Median :0.00000
 Mean   :0.4886   Mean   :0.2226   Mean   :0.00549
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000

> mydata1.rf.model2 <- randomForest(x = mydata1.clean[,3:9], y = mydata1.clean[,10], ntree = 100)
Error: cannot allocate vector of size 877.2 Mb
In addition: Warning messages:
1: In dim(data) <- dim :
  Reached total allocation of 3992Mb: see help(memory.size)
2: In dim(data) <- dim :
  Reached total allocation of 3992Mb: see help(memory.size)
3: In dim(data) <- dim :
  Reached total allocation of 3992Mb: see help(memory.size)
4: In dim(data) <- dim :
  Reached total allocation of 3992Mb: see help(memory.size)

Other techniques, such as boosted trees, handle the data size just fine. Are there any parameters I can adjust so that I can use a value of 100 or more for ntree?

Thanks,
John
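
A rough sanity check on the size in the error message, offered as a sketch under stated assumptions rather than a confirmed diagnosis: if the forest were preallocated for a worst case of about 2n + 1 nodes per tree, with a two-column table of 4-byte integers per tree, the total for ntree = 100 would land close to the 877.2 Mb reported. The layout (node bound, two columns, integer storage) is an assumption made for illustration and has not been checked against the randomForest source; only the row counts and ntree come from the output above.

```r
# Row count taken from the is_match column of the summary output.
n_rows <- 572820 + 2093
n_tree <- 100

# ASSUMPTION (not verified against the package source): each tree reserves
# space for up to 2 * n_rows + 1 nodes, the worst case with one
# observation per terminal node.
max_nodes_per_tree <- 2 * n_rows + 1

# ASSUMPTION: one two-column table of 4-byte integers per tree.
n_cols <- 2
bytes_per_entry <- 4

est_bytes <- max_nodes_per_tree * n_cols * n_tree * bytes_per_entry
est_mb <- est_bytes / 1024^2

cat(sprintf("Estimated allocation: %.1f Mb\n", est_mb))
```

If that reading is right, anything that lowers the per-tree node bound should shrink the allocation. randomForest() documents nodesize, maxnodes, and sampsize arguments that limit tree size; whether they are enough on this data is untested here.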