Here is an example:
> # create test data
> N <- 100
> x <- data.frame(a=sample(LETTERS, N, TRUE), b=sample(letters, N, TRUE),
+ c=as.numeric(1:N), d=runif(N))
> system.time({
+ x.df <- split(x, x$a) # split
+ print(sapply(x.df, function(a) sum(a$c)))
+ })
A B
Jim, could you provide a code snippet to illustrate what you mean?
Hadley, good point, I did not know that.
Mark
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMai
Also instead of 'splitting' the data frame, I split the indices and then use
those to access the information in the original dataframe.
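A minimal sketch of the index-splitting idea described above, assuming a data frame shaped like the `x` in the example earlier in the thread. The names `idx` and `sums_by_index` are illustrative and do not come from the original post.

```r
# Illustrative data, same shape as the example earlier in the thread
set.seed(1)
N <- 100
x <- data.frame(a = sample(LETTERS, N, TRUE), b = sample(letters, N, TRUE),
                c = as.numeric(1:N), d = runif(N))

# Split the row indices by group instead of splitting the data frame itself
idx <- split(seq_len(nrow(x)), x$a)

# Use each index vector to reach back into the original data frame
sums_by_index <- sapply(idx, function(i) sum(x$c[i]))
```

Because only integer vectors are split, no per-group copies of the data frame are created; each group's rows are looked up in `x` when needed.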
On Tue, Dec 8, 2009 at 9:54 PM, Mark Kimpel wrote:
Hadley, Just as you were apparently writing I had the same thought and did
exactly what you suggested, converting all columns except the one that I
want split to character. Executed almost instantaneously without problem.
Thanks! Mark
Hi Mark,
Why are you using factors? I think for this case you might find
characters are faster and more space efficient.
Alternatively, you can have a look at the plyr package which uses some
tricks to keep memory usage down.
Hadley
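A rough sketch of the factor-to-character suggestion above, assuming a small made-up data frame in which `a` is the column to split on and stays a factor. The names `is_fac` and `x.split` are hypothetical, and whether this actually saves time or memory will depend on the data.

```r
# Illustrative data with factor columns (not the poster's real data)
set.seed(1)
N <- 100
x <- data.frame(a = factor(sample(LETTERS, N, TRUE)),
                b = factor(sample(letters, N, TRUE)),
                c = as.numeric(1:N), d = runif(N))

# Find the factor columns, leaving the split column `a` untouched
is_fac <- vapply(x, is.factor, logical(1))
is_fac["a"] <- FALSE

# Convert the remaining factor columns to character
x[is_fac] <- lapply(x[is_fac], as.character)

# Split as before; the pieces no longer carry factor levels for `b`
x.split <- split(x, x$a)
```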
On Tue, Dec 8, 2009 at 9:46 PM, Mark Kimpel wrote:
Charles, I suspect you are correct regarding copying of the attributes.
First off, selectSubAct.df is my "real" data, which turns out to be of the
same dim() as myDataFrame below, but each column is made up of strings, not
simple letters, and there are many levels in each column, which I did not
p
On Tue, 8 Dec 2009, Mark Kimpel wrote:
I'm having trouble using split on a very large data-set with ~1400 levels of
the factor to be split. Unfortunately, I can't reproduce it with the simple
self-contained example below. As you can see, splitting the artificial
dataframe of size ~13MB results i