Hello Eric,

If you can do a project like this (one that manages huge datasets) in SAS, I'd recommend just doing it in SAS rather than in R. I've sadly come to the conclusion that R isn't very good at working with large datasets. Until the powers that be do something to help users like us (e.g., help us get around the damn 2^31-1 limit on vector length), R will remain a great language that is very awkward to use with large datasets. I've used bigmemory and ff, and I have the greatest respect and appreciation for the authors of these packages. Ultimately, though, they are awkward to work with compared with doing things natively in R. For example, ff objects are still limited to 2^31-1 elements, and bigmemory was buggy when I tried to use it.

Good luck!
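A rough sizing check makes the 2^31-1 point concrete. The sketch below is in Python purely for illustration. It makes two assumptions that Eric's message does not state outright. First, the panel is in long format with one row per person-wave, giving 8 million times 30 rows. Second, all 15 variables are stored as 8-byte doubles. The names VECTOR_LIMIT and panel_footprint are hypothetical.

```python
# Back-of-the-envelope sizing for the panel described in this thread.
# ASSUMPTIONS (not stated exactly in the original message): long format with
# one row per person-wave, and every variable stored as an 8-byte double.
VECTOR_LIMIT = 2**31 - 1   # the per-vector length limit discussed above
BYTES_PER_DOUBLE = 8


def panel_footprint(persons: int, waves: int, variables: int) -> dict:
    """Compare a long-format panel with the 2^31-1 per-vector limit."""
    rows = persons * waves
    cells = rows * variables
    return {
        "rows": rows,
        "cells": cells,
        # Stored column by column, each column is one vector of `rows` elements.
        "column_fits": rows <= VECTOR_LIMIT,
        # Stored as a single numeric matrix, everything is one vector of `cells` elements.
        "matrix_fits": cells <= VECTOR_LIMIT,
        # Raw size in decimal gigabytes if every cell is a double.
        "gigabytes": cells * BYTES_PER_DOUBLE / 1e9,
    }


if __name__ == "__main__":
    print(panel_footprint(8_000_000, 30, 15))
```

Under those assumptions, each column (240 million elements) stays under the limit. The whole panel as one numeric matrix (3.6 billion elements) does not fit, and it would need roughly 28.8 GB of RAM as doubles, before counting any temporary copies. That is the kind of size at which packages like bigmemory and ff, or plain SAS, come into play.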
JJ

On Mon, Feb 22, 2010 at 3:13 PM, Eric Fail <e...@it.dk> wrote:
> Dear R-list
>
> I'm about to start a new project on a rather big panel, consisting of
> approximately 8 million observations in 30 waves of data and about 15
> variables. I have a similar data set that is approximately 7 gigabytes in
> size.
>
> Until now I have done my data management in SAS and Stata, mostly
> identifying spells, counting events in intervals, and the like. However, I
> would like to do both the data management and the model fitting in R.
>
> The problem is that R can't handle the data the normal R way; it's simply too big.
> So I thought of trying filehash, bigmemory, or some other similar package I
> haven't heard of (yet). The documentation for 'bigmemory' says that
> the package is capable of "basic manipulation" on "manageable subsets of
> the data", but what does that actually mean?
>
> Learning this in R is a rather time-consuming process. I also know SAS
> can do the data management and has PROC MIXED. So I wanted to ask the list
> before I set out on this odyssey.
>
> Does anyone out there have practical experience with data sets (panels) of
> that size? Perhaps you have also fitted a model to such data, presumably
> using lmer (from the lme4 package) or similar, together with filehash or
> bigmemory. If so, would you be willing to share that experience?
>
> Thanks in advance,
> Eric
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.