Dear R-list

I am about to start a new project on a rather large panel: approximately 8 million observations across 30 waves of data, with about 15 variables. A comparable data set I have is approximately 7 gigabytes in size.

Until now I have done my data management in SAS and Stata, mostly identifying spells, counting events in intervals, and the like, but I would like to do the data management (and fit my models) in R.

R can't handle data of that size in the normal, in-memory way; it is simply too big. So I thought of trying filehash, bigmemory, or some other similar package I haven't heard of (yet). The documentation for 'bigmemory' says the package is capable of "basic manipulation" on "manageable subsets of the data", but what does that actually mean in practice?
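To make the question concrete, here is the kind of workflow I imagine after reading the bigmemory documentation. This is untested, and the file name, column name, and options are just placeholders:

library(bigmemory)

## Create a file-backed big.matrix so the data live on disk rather than in RAM.
## Note: a big.matrix holds a single numeric type, so factors or strings would
## have to be recoded as numbers first.
x <- read.big.matrix("panel.csv", header = TRUE, type = "double",
                     backingfile = "panel.bin",
                     descriptorfile = "panel.desc")

## "Manageable subset": pull one wave into an ordinary in-memory matrix
wave1 <- x[x[, "wave"] == 1, ]

Is this roughly what "basic manipulation on manageable subsets" refers to, or does the package support more than row/column extraction of this kind?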

Since learning how to do this in R is a rather time-consuming process, and I know SAS can handle the data management and has PROC MIXED for the modelling, I wanted to ask on the list before I set out on this odyssey.

Does anyone out there have practical experience with data sets (panels) of that size, and perhaps experience fitting a model (presumably with lmer from the lme4 package, or something similar) on top of filehash or bigmemory, that they would be willing to share?
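For what it is worth, the kind of call I have in mind would be something along these lines, assuming the data could somehow be passed as an ordinary data frame, which is exactly what I am unsure about (variable names are purely illustrative):

library(lme4)

## A simple random-intercept model over the panel; y, wave, and id stand in
## for my actual outcome, time, and person identifiers
fit <- lmer(y ~ wave + (1 | id), data = panel)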

Thanks in advance,
Eric
