Exactly how do you want to work with this data? How do you want it organized? What is the structure of the file that you want to read in? What types of analysis are you going to do? Does all the data have to be in memory at once, or can you construct your analysis to do it in pieces and then aggregate the summary data? Some information is missing before anyone can propose a solution.
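To illustrate the "in pieces" idea, here is a minimal sketch, not a recommendation for your actual file. It assumes a plain comma-separated file with a header row, no quoted fields, and a numeric column called x; the file, the column names, and the chunk size are all made up for the example. It reads a fixed number of lines at a time from an open connection and keeps only running totals.

```r
# Build a small demo file so the sketch is self-contained (made-up data).
path <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(id = 1:1000, x = rnorm(1000))
write.csv(demo, path, row.names = FALSE, quote = FALSE)

chunk_size <- 250              # rows per piece; pick what fits in memory
con <- file(path, open = "r")  # an open connection remembers its position

# The first line holds the column names.
col_names <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

n_rows <- 0
total  <- 0
repeat {
  lines <- readLines(con, n = chunk_size)  # next piece, as raw text
  if (length(lines) == 0) break            # nothing left to read
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  # Keep only the running summaries, not the chunk itself.
  n_rows <- n_rows + nrow(chunk)
  total  <- total + sum(chunk$x)
}
close(con)

chunk_mean <- total / n_rows   # mean of x without holding all rows at once
```

Sums, counts, and cross-products can be accumulated this way; statistics that need all the values at once, such as an exact median, cannot, which is why the type of analysis matters.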
For example, do you need all the data in memory at one time? (If it is all doubles, you would need 800MB for a single copy.) Are you running on a 64-bit version of the operating system? If so, I would suggest that you have at least 4GB of real memory for R, so that there is room for the multiple copies that some of the processing will probably create. Why are you considering filehash and not a relational database to store/extract the data? You can always read in a portion of the data and then transfer it to the appropriate storage type. There is no reason for R to "choke" reading in the data if you have structured the input/output files appropriately.

On Sun, Jan 2, 2011 at 2:14 PM, michael curran <michcur...@yahoo.com> wrote:
> Hi all,
>
> I am trying to use the filehash library to analyze a 5M by 20 matrix with both
> double and string data types.
>
> After consulting a few tutorials online, it seems as though one needs to first
> read the data into R; then create an R object; and then assign that object a
> location in my computer via filehash. It seems like the benefit of this is
> minimizing memory allocation when running subsequent analysis (e.g.,
> descriptive statistics, regressions, etc.).
>
> My question is: what happens if R chokes when trying to read in the data
> (i.e., step 1)? Is there another library I can use to get the data read in or,
> alternatively, am I misunderstanding the complete functionality of the
> filehash library and what it can do?
>
> Apologies if this a basic question--usually I work with considerably smaller
> data frames and don't have much experience with memory issues and R.
>
> Thanks in advance for any advice/pointers.
>
> Best, Mike
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?