Dear Dennis, dear All,

It was probably not my best post. I am running R on a Debian box (amd64 architecture), which is why I was surprised to run into memory issues with a vector larger than 1 GB. The memory is there, but it is probably not contiguous. I will look into the matter and post again, generating an artificial data frame if needed.
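Something along the following lines should mimic the shape of the real training set; the column names match the formula in my earlier post (quoted below), but the sizes and distributions are invented purely for illustration:

## Artificial stand-in for the real training data; only the column names
## are real, everything else is made up.
set.seed(1)
n <- 4e5   # increase towards the real number of rows as needed
trainRF <- data.frame(
  SalePrice        = rlnorm(n, meanlog = 10, sdlog = 0.5),
  ModelID          = factor(sample(5000, n, replace = TRUE)),
  ProductGroup     = factor(sample(LETTERS[1:6], n, replace = TRUE)),
  ProductGroupDesc = factor(sample(paste0("desc_", 1:6), n, replace = TRUE)),
  MfgYear          = sample(1980:2012, n, replace = TRUE),
  saledate3        = sample(3650, n, replace = TRUE),
  saleday          = sample(31, n, replace = TRUE),
  salemonth        = factor(sample(month.abb, n, replace = TRUE))
)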
Many thanks

Lorenzo

On 4 February 2013 00:50, Dennis Murphy <djmu...@gmail.com> wrote:
> Hi Lorenzo:
>
> On Sun, Feb 3, 2013 at 11:47 AM, Lorenzo Isella
> <lorenzo.ise...@gmail.com> wrote:
>> Dear All,
>> For a data mining project, I am relying heavily on the randomForest and
>> party packages. Due to the large size of the data set, I often run into
>> memory problems (in particular with the party package; randomForest
>> seems to use less memory). I really have two questions at this point.
>> 1) Please see how I am using the party and randomForest packages below.
>> Any comment is welcome and useful.
>
> As noted elsewhere, the example is not reproducible, so I can't help you
> there.
>
>> myparty <- cforest(SalePrice ~ ModelID + ProductGroup +
>>                      ProductGroupDesc + MfgYear + saledate3 +
>>                      saleday + salemonth,
>>                    data = trainRF,
>>                    control = cforest_unbiased(mtry = 3, ntree = 300,
>>                                               trace = TRUE))
>>
>> rf_model <- randomForest(SalePrice ~ ModelID + ProductGroup +
>>                            ProductGroupDesc + MfgYear + saledate3 +
>>                            saleday + salemonth,
>>                          data = trainRF, na.action = na.omit,
>>                          importance = TRUE, do.trace = 100,
>>                          mtry = 3, ntree = 300)
>>
>> 2) I have another question: sometimes R crashes after telling me that it
>> is unable to allocate, e.g., an array of 1.5 GB.
>> However, I have 4 GB of RAM on my box, so technically the memory is
>> there, but is there a way to enable R to use more of it?
>
> 4 GB is not a lot of RAM for data mining projects. I have twice that and
> run into memory limits on some fairly simple tasks (e.g., 2D tables) in
> large simulations with 1M or 10M runs. Part of the problem is that data
> is often copied, sometimes more than once: with a 1 GB input data frame,
> three copies and you're out of space. Moreover, copied objects need
> contiguous memory, which becomes very difficult to obtain with large
> objects and limited RAM. With 4 GB of RAM, you need to be more clever:
>
> * eliminate as many other processes that access RAM as possible (e.g.,
>   no active browser)
> * think of ways to process your data in chunks (which is harder to do
>   when the objective is model fitting)
> * type ?"Memory-limits" (including the quotes) at the console for an
>   explanation of the memory limits and a few places to look for
>   potential solutions
> * look into 'big data' packages such as ff or bigmemory, among others
> * if you're at an (American?) academic institution, you can get a free
>   license for Revolution R, which is supposed to be better for big data
>   problems than vanilla R
>
> It's hard to be specific about potential solutions, but the above should
> broaden your perspective on the big data problem and possible avenues
> for solving it.
>
> Dennis
>
>> Many thanks
>>
>> Lorenzo
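For reference, here is a small sketch of the memory bookkeeping Dennis describes. None of this code appears in the thread; the object.size() and gc() calls only report current usage, and the ff call at the end is just one possible starting point for keeping the data on disk instead of in RAM.

## Illustrative only: check how much RAM the objects actually need
## before blaming the model fit itself (trainRF as defined above).
print(object.size(trainRF), units = "Mb")   # size of the training data frame
gc()                                        # force a garbage collection and report usage
## See ?"Memory-limits" for the allocation limits Dennis mentions.

## One way to keep a large data frame on disk rather than fully in RAM,
## via the ff package (assumes the columns are atomic vectors or factors):
library(ff)
trainFF <- as.ffdf(trainRF)                 # file-backed copy of trainRF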