Re: [Rd] [R] data.frame() size

Matthew Dowle Fri, 09 Dec 2005 09:28:48 -0800

Hi,

Please see below for my post on r-help regarding data.frame() and the
possibility of dropping rownames, for space and time reasons.
I've made some changes, attached, and they seem to be working well. I see the
expected space (90% saved) and time (10 times faster) savings. There are no
doubt some bugs, and it needs more work and testing, but I thought I would
post first at this stage.


Could some changes along these lines be made to R? I'm happy to help with
testing and further work if required. In the meantime I can work with
overloaded functions, which fix the problem in my case.

Functions affected:

   dim.data.frame
   format.data.frame
   print.data.frame
   data.frame
   [.data.frame
   as.matrix.data.frame

Modified source code attached.

Regards,
Matthew


-----Original Message-----
From: Matthew Dowle 
Sent: 09 December 2005 09:44
To: 'Peter Dalgaard'
Cc: 'r-help@stat.math.ethz.ch'
Subject: RE: [R] data.frame() size



That explains it. Thanks. I don't need rownames though, as I'll only ever
use integer subscripts. Is there any way to drop them, or even better not
create them in the first place? The memory saved (90%) by not having them
and the 10-fold speed-up would be very useful. I think I need a data.frame
rather than a matrix because I have columns of different types in real life.

> rownames(d) = NULL
Error in "dimnames<-.data.frame"(`*tmp*`, value = list(NULL, c("a", "b" : 
        invalid 'dimnames' given for data frame


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter
Dalgaard
Sent: 08 December 2005 18:57
To: Matthew Dowle
Cc: 'r-help@stat.math.ethz.ch'
Subject: Re: [R] data.frame() size


Matthew Dowle <[EMAIL PROTECTED]> writes:

> Hi,
> 
> In the example below why is d 10 times bigger than m, according to
> object.size ? It also takes around 10 times as long to create, which 
> fits with object.size() being truthful.  gcinfo(TRUE) also indicates a 
> great deal more garbage collector activity caused by data.frame() than 
> matrix().
> 
> $ R --vanilla
> ....
> > nr = 1000000
> > system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> [1] 0.22 0.01 0.23 0.00 0.00
> > system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> [1] 2.81 0.20 3.01 0.00 0.00                  # 10 times longer
> 
> > dim(m)
> [1] 1000000       2
> > dim(d)
> [1] 1000000       2                           # same dimensions
> 
> > storage.mode(m)
> [1] "integer"
> > sapply(d, storage.mode)
>         a         b 
> "integer" "integer"                           # same storage.mode
> 
> > object.size(m)/1024^2
> [1] 7.629616
> > object.size(d)/1024^2
> [1] 76.29482                                  # but 10 times bigger
> 
> > sum(sapply(d, object.size))/1024^2
> [1] 7.629501                                  # or is it? If it's not
> really 10 times bigger, why 10 times longer above?

Row names!!


> r <- as.character(1:1e6)
> object.size(r)
[1] 72000056
> object.size(r)/1024^2
[1] 68.6646

'nuff said?
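
The point above can be checked by measuring the pieces separately. The
following is a minimal sketch added for illustration, not part of the original
exchange. The size `n` and the variable names are assumptions, and exact byte
counts depend on the R version and platform, so no expected output is shown.

```r
# Sketch: compare the memory taken by character row names with the
# memory taken by the integer columns themselves.
n <- 100000L                         # illustrative size, smaller than the 1e6 used above

cols <- list(a = integer(n), b = integer(n))
rn   <- as.character(seq_len(n))     # what explicit character row names would hold

col_bytes <- sum(sapply(cols, object.size))
rn_bytes  <- as.numeric(object.size(rn))

# Ratio of row-name storage to column storage
ratio <- rn_bytes / col_bytes
print(ratio)
```

A ratio well above 1 means the character row names, rather than the two
integer columns, account for most of the object's size.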

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
