Cool! Thanks for the sampling and ff tips! I think I've figured it out now using sampling...
I'm getting a quad-core, 4GB RAM computer next week, and will try it again using a 64-bit version :)

Thanks for your time!!!

Maja


tlumley wrote:
>
> On Tue, 10 Nov 2009, maiya wrote:
>
>> OK, it's the simple math that's confusing me :)
>>
>> So you're saying 2.4GB, while Windows sees the data as 700MB. Why is that
>> different?
>
> Your data are stored on disk as a text file (in CSV format, in fact), not
> as numbers. This can take up less space.
>
>> And let's say I could potentially live with e.g. 1/3 of the cases - that
>> would make it 0.8GB, which should be fine? But then my question is if there
>> is any way to sample the rows in read.table? Or what would be the best way
>> of importing a random third of my cases?
>
> A better solution is probably to read a subset of the columns at a time.
> The easiest way to do this is probably to read the data into a SQLite
> database with the 'sqldf' package, but another solution is to use the
> colClasses= argument to read.table() and specify "NULL" for the classes of
> the columns you don't want to read. There are other ways as well.
>
> It might even be faster to do the cross-tabulations in a database and read
> the resulting summaries into R to compute any statistics you need.
>
>> Thanks!
>>
>> M.
>>
>>
>> jholtman wrote:
>>>
>>> A little simple math. You have 3M rows with 100 items on each row.
>>> If read in this would be 300M items. If numeric, 8 bytes/item, this
>>> is 2.4GB. Given that you are probably using a 32-bit version of R,
>>> you are probably out of luck. A rule of thumb is that your largest
>>> object should consume at most 25% of your memory since you will
>>> probably be making copies as part of your processing.
>>>
>>> Given that, if you want to read in 100 variables at a time, I would
>>> say your limit would be about 500K rows to be reasonable.
>>> So you have a choice: read in fewer rows, read in all 3M rows but at
>>> 20 columns per read, or put the data in a database and extract what
>>> you need. Unless you go to a 64-bit version of R you will probably not
>>> be able to have the whole file in memory at one time.
>>>
>>> On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloz...@gmail.com> wrote:
>>>>
>>>> I'm trying to import a table into R; the file is about 700MB. Here's my
>>>> first try:
>>>>
>>>> > DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>>
>>>> Error: cannot allocate vector of size 15.6 Mb
>>>> In addition: Warning messages:
>>>> 1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>>> 2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>>> 3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>>> 4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>>>
>>>> Then I tried
>>>>
>>>> > memory.limit(size=4095)
>>>>
>>>> and got
>>>>
>>>> > DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>> Error: cannot allocate vector of size 11.3 Mb
>>>>
>>>> but no additional errors. Then, optimistically, to clear up the
>>>> workspace:
>>>>
>>>> > rm()
>>>> > DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>> Error: cannot allocate vector of size 15.6 Mb
>>>>
>>>> Can anyone help? I'm confused by the values even: 15.6Mb, 1535Mb,
>>>> 11.3Mb?
>>>> I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
>>>> memory is usually 2Gb. Surely they mean GB?
>>>>
>>>> The file I'm importing has about 3 million cases with 100 variables
>>>> that I want to crosstabulate each with each. Is this completely
>>>> unrealistic?
>>>>
>>>> Thanks!
>>>>
>>>> Maja
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26282348.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlum...@u.washington.edu        University of Washington, Seattle
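
For anyone following the thread, the size estimate in Jim Holtman's reply and the colClasses= suggestion in Thomas Lumley's reply can be sketched roughly as below. This is only an illustration under stated assumptions: the temporary file and its three columns (a, b, c) are made-up stand-ins for the real 100-variable data file, so that the snippet runs on its own.

```r
# Back-of-envelope estimate using the figures quoted in the thread:
# 3M rows x 100 numeric columns x 8 bytes per double.
rows <- 3e6
cols <- 100
bytes_per_double <- 8
full_gb <- rows * cols * bytes_per_double / 1e9
full_gb  # size in GB if every value is stored as a double

# Hypothetical stand-in file (NOT the poster's data) so the example is runnable.
tmp <- tempfile(fileext = ".dat")
demo <- data.frame(a = 1:5, b = letters[1:5], c = (1:5) / 2)
write.table(demo, tmp, row.names = FALSE, quote = FALSE)

# One colClasses entry per column in the file:
#   "NULL" means skip the column entirely,
#   NA means let read.table() work out the class itself.
keep <- c(NA, "NULL", NA)
sub <- read.table(tmp, header = TRUE, colClasses = keep)

names(sub)  # only the columns that were not skipped
```

With the real file one would presumably build `keep` as a length-100 vector and read a different block of columns on each pass, which matches the "20 columns per read" option mentioned above.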
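
The thread does not show how the row sampling was eventually done, so the following is only one possible sketch of "importing a random third of the cases" without holding the whole file in memory. The function name `sample_rows` and its arguments `frac` and `chunk_size` are hypothetical, and the demo file is made up. Each data line is kept independently with probability `frac`, so the result is approximately, not exactly, that fraction of the rows.

```r
# Read a text file in chunks through a connection, keep each data line with
# probability `frac`, and parse only the kept lines with read.table().
# Assumes one record per line and (by default) a single header line.
sample_rows <- function(path, frac, chunk_size = 10000L, header = TRUE) {
  con <- file(path, open = "r")
  on.exit(close(con))
  head_line <- if (header) readLines(con, n = 1L) else character(0)
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break          # end of file reached
    kept[[length(kept) + 1L]] <- lines[runif(length(lines)) < frac]
  }
  read.table(text = c(head_line, unlist(kept)), header = header)
}

# Hypothetical demo data (NOT the poster's file) so the sketch is runnable.
set.seed(1)
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(id = 1:1000, x = rnorm(1000)), tmp,
            row.names = FALSE, quote = FALSE)

third    <- sample_rows(tmp, frac = 1 / 3, chunk_size = 100L)
all_rows <- sample_rows(tmp, frac = 1, chunk_size = 100L)
nrow(third)     # roughly a third of the 1000 rows
nrow(all_rows)  # every row, since runif() is always below 1
```

Only the sampled lines are ever parsed into a data frame, so peak memory should scale with the sample size plus one chunk of raw text rather than with the full file.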