Hello,
I do not know whether my package "colbycol" might help you. It can help
you read files that would not otherwise fit into memory. Internally, as
the name indicates, data is read into R in a column-by-column fashion.
IO times increase, but you need only a fraction of the "intermediate
memory".
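If you only need some of the columns, you can get a similar effect in
base R by marking the columns you want to skip as "NULL" in colClasses.
This is only a rough sketch of the column-wise idea, not the colbycol
interface; the file name is the one from this thread:

# read only the second column of the 3-column file; "NULL" columns are skipped
col2 <- read.table("~/20090708.tab",
                   colClasses = c("NULL", "double", "NULL"))[[1]]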
On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson wrote:
As already suggested, you're (much) better off if you specify colClasses, e.g.
tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
Otherwise, R has to load all the data, make a best guess of the column
classes, and then coerce (which requires a copy).
/Henrik
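A quick way to see what that guess-and-coerce pass costs on a file like
this one (the path and column types are the ones from this thread;
timings will of course depend on the machine):

system.time(read.table("~/20090708.tab"))    # R guesses the column classes
system.time(read.table("~/20090708.tab",
                       colClasses = c("factor", "double", "double")))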
> its 32-bit representation. This seems like it might be too
> conservative for me, since it implies that R allocated exactly as much
> memory for the lists as there were numbers in the list (e.g. typically
> in an interpreter like this you'd be allocating on order-of-two
> boundaries, i.e. sizeof(
> I think this is just because you picked short strings. If the factor
> is mapping the string to a native integer type, the strings would have
> to be larger for you to notice:
>
>> object.size(sample(c("a pretty long string", "another pretty long string"),
>> 1000, replace=TRUE))
> 8184 bytes
On Mon, Sep 14, 2009 at 8:58 PM, Eduardo Leoni wrote:
> And, by the way, factors take up _more_ memory than character vectors.
>
>> object.size(sample(c("a","b"), 1000, replace=TRUE))
> 4088 bytes
>> object.size(factor(sample(c("a","b"), 1000, replace=TRUE)))
> 4296 bytes
I think this is just because you picked short strings. If the factor
is mapping the string to a native integer type, the strings would have
to be larger for you to notice:

> object.size(sample(c("a pretty long string", "another pretty long string"),
+ 1000, replace=TRUE))
8184 bytes
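To finish that comparison with the factor version (exact byte counts
vary by platform and R version, but on a 64-bit build the factor should
come out noticeably smaller here):

x <- sample(c("a pretty long string", "another pretty long string"),
            1000, replace = TRUE)
object.size(x)          # character vector: an 8-byte pointer per element plus the strings
object.size(factor(x))  # factor: a 4-byte integer code per element plus the two level strings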
On Mon, Sep 14, 2009 at 8:35 PM, jim holtman wrote:
> When you read your file into R, show the structure of the object:
...
Here's the data I get:
> tab <- read.table("~/20090708.tab")
> str(tab)
'data.frame': 1797601 obs. of 3 variables:
$ V1: Factor w/ 6 levels "biz_details",..: 4 4 4 4 4
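For a data.frame of that shape, a back-of-the-envelope estimate of the
column data alone, assuming one integer-backed factor column and two
double columns as discussed in this thread (the real object.size() will
be somewhat higher because of attributes and the factor levels):

1797601 * (4 + 8 + 8) / 2^20   # ~34 MB: 4 bytes per factor code, 8 per double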
And, by the way, factors take up _more_ memory than character vectors.
> object.size(sample(c("a","b"), 1000, replace=TRUE))
4088 bytes
> object.size(factor(sample(c("a","b"), 1000, replace=TRUE)))
4296 bytes
On Mon, Sep 14, 2009 at 11:35 PM, jim holtman wrote:
When you read your file into R, show the structure of the object:
str(tab)
also the size of the object:
object.size(tab)
This will tell you what your data looks like and the size taken in R.
Also in read.table, use colClasses to define what the format of the
data is; may make it faster. You mi
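Putting those pieces together, using the file name and column types
mentioned elsewhere in this thread:

tab <- read.table("~/20090708.tab",
                  colClasses = c("factor", "double", "double"))
str(tab)                                # column types and the first few values
print(object.size(tab), units = "Mb")   # memory the object occupies in R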
Hello all,
To start with, these measurements are on Linux with R 2.9.2 (64-bit
build) and Python 2.6 (also 64-bit).
I've been investigating R for some log file analysis that I've been
doing. I'm coming at this from the angle of a programmer who's
primarily worked in Python. As I've been playing a