Check out: http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg79590.html
for sampling a large file.

On Tue, Nov 10, 2009 at 8:32 AM, maiya <maja.zaloz...@gmail.com> wrote:
>
> OK, it's the simple math that's confusing me :)
>
> So you're saying 2.4GB, while Windows sees the data as 700MB. Why is that
> different?
>
> And let's say I could potentially live with e.g. 1/3 of the cases; that
> would make it 0.8GB, which should be fine? But then my question is whether
> there is any way to sample the rows in read.table. Or what would be the
> best way of importing a random third of my cases?
>
> Thanks!
>
> M.
>
>
> jholtman wrote:
>>
>> A little simple math. You have 3M rows with 100 items on each row.
>> If read in, this would be 300M items. If numeric, at 8 bytes/item, this
>> is 2.4GB. Given that you are probably using a 32-bit version of R,
>> you are probably out of luck. A rule of thumb is that your largest
>> object should consume at most 25% of your memory, since you will
>> probably be making copies as part of your processing.
>>
>> Given that, if you want to read in 100 variables at a time, I would
>> say your limit would be about 500K rows to be reasonable. So you have
>> a choice: read in fewer rows, read in all 3M rows but at 20 columns
>> per read, or put the data in a database and extract what you need.
>> Unless you go to a 64-bit version of R, you will probably not be able
>> to have the whole file in memory at one time.
>>
>> On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloz...@gmail.com> wrote:
>>>
>>> I'm trying to import a table into R; the file is about 700MB.
>>> Here's my first try:
>>>
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>>
>>> Error: cannot allocate vector of size 15.6 Mb
>>> In addition: Warning messages:
>>> 1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>> 2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>> 3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>> 4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>   Reached total allocation of 1535Mb: see help(memory.size)
>>>
>>> Then I tried
>>>
>>>> memory.limit(size=4095)
>>> and got
>>>
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>> Error: cannot allocate vector of size 11.3 Mb
>>>
>>> but no additional errors. Then, optimistically, to clear up the workspace:
>>>
>>>> rm()
>>>> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
>>> Error: cannot allocate vector of size 15.6 Mb
>>>
>>> Can anyone help? I'm confused even by the values: 15.6Mb, 1535Mb, 11.3Mb?
>>> I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
>>> memory is usually 2Gb. Surely they mean GB?
>>>
>>> The file I'm importing has about 3 million cases with 100 variables that
>>> I want to cross-tabulate, each with each. Is this completely unrealistic?
>>>
>>> Thanks!
>>>
>>> Maja
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26282348.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
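Jim's back-of-the-envelope numbers can be written out step by step. The text file is much smaller than the in-memory estimate because a short value such as `3` plus its separating space takes about 2 bytes on disk, while R stores every numeric value as an 8-byte double (700MB spread over 3×10^8 values averages roughly 2.3 bytes per value). The last two lines apply the 25% rule of thumb to the 1535Mb allocation limit reported in the warnings, treating 1MB as 10^6 bytes for simplicity:

```latex
\begin{aligned}
3\times10^{6}\ \text{rows} \times 100\ \text{values/row} &= 3\times10^{8}\ \text{values}\\
3\times10^{8}\ \text{values} \times 8\ \text{bytes/value} &= 2.4\times10^{9}\ \text{bytes} \approx 2.4\ \text{GB}\\
\tfrac{1}{3} \times 2.4\ \text{GB} &= 0.8\ \text{GB}\\
0.25 \times 1535\ \text{MB} &\approx 384\ \text{MB}\\
\frac{384\times10^{6}\ \text{bytes}}{100 \times 8\ \text{bytes/row}} &= 4.8\times10^{5}\ \text{rows}
\end{aligned}
```

The final figure is close to Jim's "about 500K rows", and it also shows that a random third of the cases (0.8GB) would still be roughly twice that budget.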
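Jim points to an older thread on sampling a large file. As one possible approach (not necessarily the method described at that link), a random third of the cases can be imported without ever holding the whole file by streaming it through a connection in chunks and keeping each data line with probability 1/3. `sample_rows()` is a hypothetical helper written for illustration; it assumes a whitespace-delimited file with a single header line, no row names, no quoted fields spanning several lines, and an R version recent enough to support the `text =` argument of `read.table()` and `scan()`.

```r
# Hypothetical helper: import a random fraction `p` of the data rows of a
# whitespace-delimited file with one header line, reading `chunk_size`
# lines at a time so the full file is never held in memory.
# Assumes no row names and no quoted fields that span several lines.
sample_rows <- function(path, p = 1/3, chunk_size = 100000L, seed = NULL, ...) {
  if (!is.null(seed)) set.seed(seed)            # optional reproducibility
  con <- file(path, open = "r")
  on.exit(close(con))

  # Column names come from the header line only
  col_names <- scan(text = readLines(con, n = 1L), what = "", quiet = TRUE)

  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break               # end of file
    lines <- lines[nzchar(lines)]                # drop blank lines
    keep <- runif(length(lines)) < p             # keep each line with probability p
    if (any(keep)) {
      # Parse only the kept lines of this chunk; extra arguments such as
      # colClasses are passed through to read.table()
      chunks[[length(chunks) + 1L]] <-
        read.table(text = lines[keep], header = FALSE,
                   col.names = col_names, ...)
    }
  }
  if (length(chunks) == 0L) return(NULL)         # nothing was sampled
  do.call(rbind, chunks)                         # rows stay in file order
}
```

For the file in this thread the call would look like `sample_rows("01uklicsam-20070301.dat", p = 1/3, seed = 1)`. Keeping each line independently yields approximately, not exactly, a third of the rows; by the arithmetic above the result may still be too large for a 32-bit session, so a smaller `p` might be needed.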
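Jim's second option, reading all 3M rows but only about 20 columns per pass, can be done with `read.table()` itself: a `colClasses` entry of `"NULL"` skips that column while parsing, and `NA` lets R choose the column's type. `read_columns()` is a hypothetical helper sketch that assumes the same layout as above (whitespace-delimited, one header line, no row names).

```r
# Hypothetical helper: read every row but only the columns whose 1-based
# positions are in `cols`; all other columns are skipped while parsing.
read_columns <- function(path, cols, ...) {
  # Count the fields in the header line to learn how many columns exist
  header <- readLines(path, n = 1L)
  n_fields <- length(scan(text = header, what = "", quiet = TRUE))
  stopifnot(all(cols >= 1L & cols <= n_fields))

  classes <- rep("NULL", n_fields)   # "NULL" = skip this column
  classes[cols] <- NA                # NA = let read.table() choose the type
  read.table(path, header = TRUE, colClasses = classes, ...)
}
```

The 100 variables could then be processed in five passes, e.g. `for (first in seq(1, 100, by = 20)) block <- read_columns("01uklicsam-20070301.dat", first:(first + 19))`, summarizing each block before reading the next. Extra arguments go straight to `read.table()`; its help page notes that supplying `nrows`, even as a mild over-estimate, helps memory usage.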