Have you tried adjusting:

mtry - the number of variables randomly sampled as split candidates at each node
ntree - the number of trees grown
keep.forest - logical; whether to keep the fitted forest in the returned object
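A minimal sketch of what adjusting these looks like, assuming the randomForest package is installed. The data are simulated stand-ins for random.csv (hypothetical), and the values ntree = 100 and mtry = 1 are illustrative, not tuned recommendations. With keep.forest = FALSE you cannot call predict() on the object afterwards, so the sketch passes the test set through xtest/ytest to get predictions during fitting instead.

```r
# Sketch: trading some accuracy knobs for speed in randomForest regression.
# Assumes the randomForest package is installed; the data are simulated
# (hypothetical) stand-ins for the poster's random.csv.
library(randomForest)

set.seed(42)
n  <- 2000
df <- data.frame(matrix(runif(n * 6), ncol = 6))   # six numeric predictors X1..X6
df$sales <- 10 * df$X1 + 5 * df$X2^2 + rnorm(n)    # made-up numeric target

train <- 1:1500
test  <- 1501:2000
x  <- df[train, 1:6]; y  <- df$sales[train]
x2 <- df[test, 1:6];  y2 <- df$sales[test]

# Baseline: defaults (ntree = 500; mtry = floor(p/3) = 2 for regression here).
# The forest is kept, so predict() can be used afterwards.
t_default <- system.time(
  rf_default <- randomForest(x, y)
)
p_default <- predict(rf_default, x2)

# Faster variant: fewer trees, fewer candidate variables per split, and
# test-set predictions computed during fitting so the forest is not stored.
t_fast <- system.time(
  rf_fast <- randomForest(x, y, xtest = x2, ytest = y2,
                          ntree = 100, mtry = 1, keep.forest = FALSE)
)
p_fast <- rf_fast$test$predicted   # predictions for x2

# Compare wall-clock times (seconds); values depend on the machine.
print(c(default = t_default[["elapsed"]], fast = t_fast[["elapsed"]]))
```

Timings depend on the machine and data, so it is worth comparing the printed elapsed times (and the test-set error in rf_fast$test$mse) on your own data rather than assuming any particular speedup.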
Specifically, I have seen large speed improvements in the past by setting keep.forest to FALSE when I didn't actually need the forest after the analysis.

--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when it's empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-boun...@r-project.org wrote on 01/03/2011 02:59:29 PM:

> [R] randomForest speed improvements
> apresley
> to: r-help
> 01/03/2011 03:03 PM
> Sent by: r-help-boun...@r-project.org
>
> Hi there,
>
> We're trying to use randomForest to do some predictions. The test harness
> for our code is pretty straightforward:
>
> library ('randomForest');
> data202 <- read.csv ("random.csv", header=TRUE);
> x<- data202[1:50000,1:6];
> y<- data202[1:50000,8];
> y<- y[,drop=TRUE];
>
> x2 <- data202[50001:60000,1:6];
> y2 <- data202[50001:60000,8];
> y2 <- y2[,drop=TRUE];
>
> RFobject <- randomForest(x,y,na.action=na.roughfix);
> p <- predict (RFobject, x2);
>
> In this case, the CSV contains 10 columns, of which 1-6 are numeric
> (day of week, week of month, etc.) and column 8 is the target
> (sales, a numeric value).
>
> randomForest does fine with the data; our issue is how long it takes. In
> this case, about 5,000 rows of data seems to take just a few seconds, but
> going to 50,000 rows doesn't take 10x the time, it takes perhaps 30 or 40
> minutes.
>
> We've downloaded and tried RT-Rank, which is a multi-threaded random forest
> implementation, and it seems to produce the same (or slightly better)
> predictions while finishing fairly quickly.
>
> What can we do to improve the speed of this computation? The system
> we're on is a dual quad-core Intel CPU @ 2.33 GHz with 16 GB of RAM;
> we're using the "stock" R RPM for CentOS 5.5.
>
> Thanks!
>
> --
> Anthony
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.