Dear Jim:

Thanks for your reply. It looks to me like you were using batching. I have used batching to digest large data in Matlab before. I am still wondering about the answers to the two specific questions, without resorting to batching.
Thanks.

-Sean

On Sat, Mar 14, 2009 at 10:13 PM, jim holtman <jholt...@gmail.com> wrote:

> Exactly what type of cleaning do you want to do on them? Can you read
> in the data a block at a time (e.g., 1M records), clean them up and
> then write them back out? You would have the choice of putting them
> back as a text file or possibly storing them using 'filehash'. I have
> used that technique to segment a year's worth of data that was
> probably 3GB of text into monthly objects that were about 70MB
> dataframes that I stored using filehash. These I then read back in to
> do processing where I could summarize by month. So it all depends on
> what you want to do.
>
> You could read in the chunks, clean them and then reshape them into
> dataframes that you could process later. You will still probably have
> the problem that all the data still won't fit in memory. Now one
> thing I did was that since the dataframes were stored as binary
> objects in filehash, it was pretty fast to retrieve them, pick out the
> data I needed from each month and create a subset of just the data I
> needed that would now fit in memory.
>
> So it all depends ...........
>
> On Sat, Mar 14, 2009 at 8:46 PM, Sean Zhang <seane...@gmail.com> wrote:
> > Dear R helpers:
> >
> > I am a newbie to R and have a question related to cleaning large data
> > frames in R.
> >
> > So far, I have been using SAS for data cleaning because my data sets are
> > relatively large (handling multiple files, each could be as large as
> > 5-10 G).
> > I am not a fan of SAS at all and am eager to move data cleaning tasks
> > into R completely.
> >
> > Seems to me, there are 3 options: using SQL, ff or filehash. I do not
> > want to learn SQL, so my question is more related to ff and filehash.
> >
> > Specifically,
> >
> > (1) for merging two large data frames, which one is better, ff vs.
> > filehash?
> > (2) for reshaping a large data frame (say from long to wide or the
> > opposite), which one is better, ff vs. filehash?
> >
> > If you can provide examples, that will be even better.
> >
> > Many thanks in advance.
> >
> > -Sean
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
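
For readers following the thread, the block-at-a-time approach Jim describes can be sketched roughly as below. This is only an illustration under stated assumptions, not Jim's actual code: the input file, its columns (`id`, `month`, `value`), the `clean_chunk` helper and the chunk size are all made up for the demo. Cleaned blocks are stored with base R's `saveRDS` rather than filehash, purely so that the sketch runs without extra packages.

```r
# Hypothetical demo input: a tiny CSV standing in for a multi-GB text file.
infile <- tempfile(fileext = ".csv")
demo <- data.frame(
  id    = 1:10,
  month = rep(c("2008-01", "2008-02"), each = 5),
  value = c(1.5, NA, 3.2, 4.1, 5.0, 6.3, 7.7, NA, 9.9, 10.4)
)
write.csv(demo, infile, row.names = FALSE)

# Hypothetical cleaning step: drop rows whose value is missing.
clean_chunk <- function(df) df[!is.na(df$value), , drop = FALSE]

chunk_size <- 4L                 # would be ~1e6 records for real data
outdir <- tempfile("chunks_")    # one binary file per cleaned block
dir.create(outdir)

con <- file(infile, open = "r")
# Read the header line once; scan() strips the quotes around the names.
col_names <- scan(text = readLines(con, n = 1L), what = "", sep = ",",
                  quiet = TRUE)

i <- 0L
repeat {
  lines <- readLines(con, n = chunk_size)   # next block of raw lines
  if (length(lines) == 0L) break            # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names,
                    stringsAsFactors = FALSE)
  i <- i + 1L
  saveRDS(clean_chunk(chunk),
          file.path(outdir, sprintf("chunk_%03d.rds", i)))
}
close(con)

# Later: read each stored block back, keep only the rows needed,
# and combine them into a subset small enough to fit in memory.
files <- list.files(outdir, pattern = "\\.rds$", full.names = TRUE)
jan <- do.call(rbind, lapply(files, function(f) {
  d <- readRDS(f)
  d[d$month == "2008-01", , drop = FALSE]
}))
```

Only one block is held in memory at a time during cleaning, and only the selected rows are held at the end. This does not answer the ff vs. filehash question for merging or reshaping; it just makes the batching idea concrete.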