Re: [R] problem with split eating giga-bytes of memory

2009-12-09 Thread jim holtman
Here is an example:

> # create test data
> N <- 100
> x <- data.frame(a=sample(LETTERS, N, TRUE), b=sample(letters, N, TRUE),
+     c=as.numeric(1:N), d=runif(N))
> system.time({
+     x.df <- split(x, x$a) # split
+     print(sapply(x.df, function(a) sum(a$c)))
+ })
A B

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
Jim, could you provide a code snippet to illustrate what you mean? Hadley, good point, I did not know that. Mark

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread jim holtman
Also instead of 'splitting' the data frame, I split the indices and then use those to access the information in the original dataframe. On Tue, Dec 8, 2009 at 9:54 PM, Mark Kimpel wrote: > Hadley, Just as you were apparently writing I had the same thought and did > exactly what you suggested, co
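The index-splitting idea described above can be sketched roughly as follows. This is only an illustration under assumptions: the test data mirrors the small example posted elsewhere in this thread, and the names idx, sums_by_index and sums_by_split are made up for the sketch, not taken from Jim's code.

```r
# Hypothetical test data, modeled on the small example in this thread
set.seed(1)
N <- 100
x <- data.frame(a = sample(LETTERS, N, TRUE),
                b = sample(letters, N, TRUE),
                c = as.numeric(1:N),
                d = runif(N))

# Split the row indices by group instead of splitting the data frame itself
idx <- split(seq_len(nrow(x)), x$a)

# Use each index vector to reach back into the original data frame
sums_by_index <- sapply(idx, function(i) sum(x$c[i]))

# For comparison: the same sums obtained by splitting the whole data frame
sums_by_split <- sapply(split(x, x$a), function(g) sum(g$c))
```

Only a list of integer vectors is created here, so no per-group copies of the data frame (or of its factor attributes) are made.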

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
Hadley, just as you were apparently writing, I had the same thought and did exactly what you suggested, converting all columns except the one that I want split to character. It executed almost instantaneously without problem. Thanks! Mark
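A minimal sketch of the conversion described above, assuming a data frame whose columns are all factors with many levels. The data frame df, its column names and the sizes are hypothetical stand-ins, since the real data is not shown in the thread.

```r
# Hypothetical stand-in for the real data: factor columns with many levels
set.seed(1)
N <- 1000
df <- data.frame(g  = factor(sample(sprintf("grp%03d", 1:50), N, TRUE)),
                 s1 = factor(sample(sprintf("str%04d", 1:500), N, TRUE)),
                 s2 = factor(sample(sprintf("val%04d", 1:500), N, TRUE)))

# Convert every factor column except the split column to character
keep <- "g"
is_fac <- vapply(df, is.factor, logical(1))
to_convert <- setdiff(names(df)[is_fac], keep)
df[to_convert] <- lapply(df[to_convert], as.character)

# Split on the one column that is still a factor
pieces <- split(df, df$g)
```

After the conversion, the pieces hold plain character vectors, so none of them carries a levels attribute for the converted columns.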

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread hadley wickham
Hi Mark, Why are you using factors? I think for this case you might find characters are faster and more space efficient. Alternatively, you can have a look at the plyr package which uses some tricks to keep memory usage down. Hadley On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel wrote: > Charles

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Mark Kimpel
Charles, I suspect you are correct regarding copying of the attributes. First off, selectSubAct.df is my "real" data, which turns out to be of the same dim() as myDataFrame below, but each column is made up of strings, not simple letters, and there are many levels in each column, which I did not p
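One way to see why factor attributes could matter here is that each piece returned by split() keeps the full set of levels of every factor column, however few rows the piece has. The tiny data frame below is purely illustrative and is not the data discussed in the thread.

```r
# Illustrative data: a grouping column and a factor column with 3 levels
d <- data.frame(g = c(1, 1, 2),
                f = factor(c("a", "b", "c")))

p <- split(d, d$g)

# The piece for g == 2 has a single row ...
rows_in_piece <- nrow(p[["2"]])

# ... but its factor column still carries all three levels
levels_in_piece <- nlevels(p[["2"]]$f)
```

With many pieces and many levels per column, those repeated level vectors may add up, which would be consistent with the attribute-copying explanation, though the thread does not confirm the exact mechanism.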

Re: [R] problem with split eating giga-bytes of memory

2009-12-08 Thread Charles C. Berry
On Tue, 8 Dec 2009, Mark Kimpel wrote: I'm having trouble using split on a very large data-set with ~1400 levels of the factor to be split. Unfortunately, I can't reproduce it with the simple self-contained example below. As you can see, splitting the artificial dataframe of size ~13MB results i