Hello,

I'm working with a 22 GB dataset with ~100 million observations and ~40 variables. It's stored in SQLite, and I use the RSQLite package to load it into memory. Loading the full population, even for only a few variables, is very slow, and I was wondering whether there are best practices for managing large datasets when doing analysis in R. Is there an alternative file format or relational database in which I should be storing the data?
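For reference, here is roughly how I am loading the data now (the file, table, and column names below are placeholders for the real ones):

library(DBI)
library(RSQLite)

## Connect to the on-disk SQLite file (path is a placeholder)
con <- dbConnect(RSQLite::SQLite(), "mydata.sqlite")

## Even selecting only a handful of columns is slow at ~100 million rows
dat <- dbGetQuery(con, "SELECT var1, var2, var3 FROM observations")

dbDisconnect(con)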
Best,
James

--
James F. Mahon III
Ph.D. Candidate, Harvard University
Tel: (857) 209-8438
Fax: (270) 813-3498
Web: http://www.people.fas.harvard.edu/~jmahon/