Hi all, I am processing a data set with 60 columns and 286,730 rows. Most columns are numerical and some are categorical.
It turns out that when ntree is left at the default value (500), it fails with "cannot allocate a vector of size 1.1 GB", and when I set ntree to a very small number like 10, it still runs for hours. I use the (x, y) interface rather than the (formula, data) interface. My code:

> sdata <- read.csv("D://zSignal Dump//XXXX//XXXX.csv")
> sdata1 <- subset(sdata, select = -38)
> sdata2 <- subset(sdata, select = 38)
> res <- randomForest(x = sdata1, y = sdata2, ntrees = 10)

Am I doing anything wrong? Do you have other suggestions, or are there other packages that do the same thing? I would appreciate it if anyone could help me out.

Thanks and best regards,

Jia Zou, Ph.D.
IBM Research -- China
Diamond Building, #19 Zhongguancun Software Park,
8 Dongbeiwang West Road, Haidian District,
Beijing 100193, P.R. China
Tel: +86 (10) 58748518
E-mail: jia...@cn.ibm.com
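P.S. In case it helps, here is a small self-contained sketch of what I am trying to do, using simulated data of the same shape as mine (response in column 38, everything else numeric, but only 1,000 rows so it runs quickly). I am not certain that the way I extract the response and spell the tree-count argument here matches what my code above actually does, which may be part of my problem:

library(randomForest)

## Simulated stand-in for my data: 60 columns with the response in
## column 38; only 1,000 rows here so the example runs quickly.
set.seed(1)
n <- 1000
sim <- data.frame(matrix(rnorm(n * 60), nrow = n))
sim[[38]] <- factor(sample(c("yes", "no"), n, replace = TRUE))

x <- sim[, -38]    # predictors: all columns except 38
y <- sim[[38]]     # response as a plain factor, not a one-column data frame

## The argument is spelled 'ntree'; I believe a misspelled name is
## silently ignored and the default of 500 trees is used instead.
res <- randomForest(x = x, y = y, ntree = 10)
print(res)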