On Tue, 10 Nov 2009, maiya wrote:
OK, it's the simple math that's confusing me :)
So you're saying 2.4GB, while Windows sees the data as 700MB. Why is that
different?
Your data are stored on disk as a text file (in CSV format, in fact), not
as 8-byte binary numbers; text can take up much less space.
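You can check this for yourself; a quick sketch (the vector here is just
made-up data for illustration):

x <- as.numeric(sample(0:9, 1e6, replace = TRUE))  # 1e6 one-digit values
print(object.size(x))             # ~8MB in memory: 8 bytes per double
tf <- tempfile()
write.table(x, tf, row.names = FALSE, col.names = FALSE)
file.info(tf)$size                # ~2MB on disk: one digit plus a newline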
And let's say I could potentially live with, e.g., 1/3 of the cases - that
would make it 0.8GB, which should be fine? But then my question is whether
there is any way to sample the rows in read.table(). Or what would be the
best way of importing a random third of my cases?
A better solution is probably to read a subset of the columns at a time.
The easiest way to do this is to read the data into an SQLite database
with the 'sqldf' package, but another solution is to use the colClasses=
argument to read.table() and specify "NULL" for the classes of the
columns you don't want to read. There are other ways as well.
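For example, something along these lines (untested; I'm assuming 100
columns, of which you want the first 20):

classes <- rep("NULL", 100)   # "NULL" tells read.table to skip the column
classes[1:20] <- NA           # NA lets read.table guess the class
DD <- read.table("01uklicsam-20070301.dat", header = TRUE,
                 colClasses = classes)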
It might even be faster to do the cross-tabulations in a database and read the
resulting summaries into R to compute any statistics you need.
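With sqldf that might look something like this (just a sketch; the column
names v1 and v2 are assumptions):

library(sqldf)
## read.csv.sql loads the file into a temporary SQLite database, runs the
## query there, and returns only the (small) result to R
tab <- read.csv.sql("01uklicsam-20070301.dat",
                    sql = "select v1, v2, count(*) as n
                           from file group by v1, v2")
xtabs(n ~ v1 + v2, data = tab)   # rebuild the two-way table in R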
Thanks!
M.
jholtman wrote:
A little simple math. You have 3M rows with 100 items on each row.
If read in, this would be 300M items. If numeric, at 8 bytes/item, that
is 2.4GB. Given that you are probably using a 32-bit version of R,
you are probably out of luck. A rule of thumb is that your largest
object should consume at most 25% of your memory, since you will
probably be making copies as part of your processing.
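That estimate as a one-liner:

3e6 * 100 * 8 / 1e9    # 3M rows x 100 items x 8 bytes = 2.4 GB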
Given that, if you want to read in 100 variables at a time, I would
say your limit is about 500K rows to stay reasonable. So you have a
choice: read in fewer rows; read in all 3M rows, but only 20 columns
per read (see the sketch below); or put the data in a database and
extract what you need.
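A sketch of the 20-columns-per-read approach (untested, and it assumes
exactly 100 columns in the file):

for (i in seq(1, 100, by = 20)) {
  cl <- rep("NULL", 100)            # "NULL" = skip the column
  cl[i:(i + 19)] <- NA              # NA = read it, guessing the class
  block <- read.table("01uklicsam-20070301.dat", header = TRUE,
                      colClasses = cl)
  # ... cross-tabulate or summarise 'block' here, then move on,
  # so only 20 columns are ever in memory at once
}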
Unless you go to a 64-bit version of R you will probably not be able
to have the whole file in memory at one time.
On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloz...@gmail.com> wrote:
I'm trying to import a table into R; the file is about 700MB. Here's my
first try:
DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
2-4: (the same warning, repeated three more times)
Then I tried
memory.limit(size=4095)
and got
DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 11.3 Mb
but no additional warnings. Then, optimistically, to clear up the workspace:
rm()   # note: rm() with no arguments removes nothing; rm(list = ls()) would
DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
Can anyone help? I'm even confused by the values: 15.6Mb, 1535Mb, 11.3Mb?
I'm working on WinXP with 2GB of RAM. The help page says the maximum
obtainable memory is usually 2Gb. Surely they mean GB?
The file I'm importing has about 3 million cases with 100 variables that
I want to cross-tabulate, each against each. Is this completely unrealistic?
Thanks!
Maja
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
Thomas Lumley Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle