You have made a good start by keeping your data in a database (reading it in from a text file each time would be even slower).
The first suggestion is not to read in all of the data; bring in only what you need. For the early steps (exploring the data, getting a feel for what you want to do, basic plots, etc.) you may want to work with just a sample of your data that will run quickly and easily. Later you can have a script load the full data and analyze it based on what you learned from the sample.

You can also have the database calculate some of the summary statistics (often quicker) instead of bringing the data into R.

The ff package has tools for storing large datasets on disk with just pointers in memory; it then loads only the pieces that you need, so only parts of the data are in memory at any given time. The biglm package also has tools for working with just parts of the data at a time.

Some of the tools for parallel processing can work well with large datasets. The High Performance Computing Task View would be good for you to skim through to see if any of those tools look useful to you.

On Wed, Jan 8, 2014 at 2:23 PM, James Mahon <james.maho...@gmail.com> wrote:
> Hello,
>
> I'm working with a 22 GB dataset with ~100 million observations and ~40
> variables. It's stored in SQLite and I use the RSQLite package to load it
> into memory. Loading the full population, even for only a few variables,
> can be very slow and I was wondering if there are best practices for how to
> manage large datasets when doing analysis in R. Is there an alternative
> file format / relational database in which I should be storing the data?
>
> Best,
>
> James
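As a rough illustration of the suggestions in the reply above (work from a sample, select only the columns you need, let the database summarize, and read the rest in pieces), here is a minimal sketch using DBI and RSQLite. The table name obs, its columns, and the small in-memory toy data are assumptions made up for the example, not the poster's actual schema.

```r
# Sketch only: a small in-memory SQLite table stands in for the real 22 GB file.
# The table name "obs" and the columns id, grp, x, y are hypothetical.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
set.seed(1)
toy <- data.frame(id  = 1:10000,
                  grp = sample(c("a", "b", "c"), 10000, replace = TRUE),
                  x   = rnorm(10000),
                  y   = rnorm(10000),
                  stringsAsFactors = FALSE)
dbWriteTable(con, "obs", toy)

# 1. Pull only the columns you need, for a random sample of rows.
#    ORDER BY RANDOM() is simple but scans the whole table; on a very large
#    table a filter such as "WHERE id % 1000 = 0" may be cheaper.
samp <- dbGetQuery(con, "SELECT x, y FROM obs ORDER BY RANDOM() LIMIT 500")

# 2. Let the database compute summaries and return only the small result.
by_grp <- dbGetQuery(con,
  "SELECT grp, COUNT(*) AS n, AVG(x) AS mean_x FROM obs GROUP BY grp")

# 3. Walk through the full table in chunks so that only one chunk is in
#    memory at a time; here the chunks just accumulate a running mean.
res   <- dbSendQuery(con, "SELECT x FROM obs")
total <- 0
n     <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 2000)   # at most 2000 rows per fetch
  total <- total + sum(chunk$x)
  n     <- n + nrow(chunk)
}
dbClearResult(res)
chunk_mean <- total / n

dbDisconnect(con)
```

The same kind of loop is one way to hand pieces of the data to a tool that accepts them incrementally, for example fitting a biglm model on the first chunk and passing each later chunk to update() on that fit.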
--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.