Re: [R] avoiding too many loops - reshaping data

Bert Gunter Wed, 03 Nov 2010 21:52:55 -0700

Beware of facile comparisons of this sort -- they may be apples and nematodes.


I cannot speak to the others, but (1) tapply does not yield a data
frame and (2) tapply actually **is** an (efficient, disguised) loop (at
the interpreter level, essentially). I suspect what makes it so much
faster is that (1) it avoids the overhead of setting up the careful data
structures that the others provide and (2) the underlying summarizing
function is sum(), which does its work at the C level, not the
interpreted level. If it were a user function -- maybe
mysum <- function(x)sum(x) -- I suspect the discrepancy might not be
so large (try it!)
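
A minimal sketch of the "try it!" suggestion, using the mydf from the
original post. The names tab, wide, tab_wrapped, t_builtin and t_wrapped
are only illustrative, and the timings depend on the machine and R
version, so none are shown here. It also checks point (1): the tapply
result is a matrix, which needs one more step to become a data frame.

```r
# Data frame from the original post
mydf <- data.frame(
  city  = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# (1) tapply returns a matrix (a 2-d array), not a data frame
tab <- tapply(mydf$value, mydf[c("city", "brand")], sum)
is.matrix(tab)
is.data.frame(tab)

# One possible conversion to the cities-by-brands data frame requested
wide <- data.frame(city = rownames(tab), tab)
rownames(wide) <- NULL
wide

# (2) same aggregation, but through an interpreted wrapper around sum()
mysum <- function(x) sum(x)
tab_wrapped <- tapply(mydf$value, mydf[c("city", "brand")], mysum)

# Compare the two timings on your own machine
t_builtin <- system.time(
  for (i in 1:1000) tapply(mydf$value, mydf[c("city", "brand")], sum)
)
t_wrapped <- system.time(
  for (i in 1:1000) tapply(mydf$value, mydf[c("city", "brand")], mysum)
)
rbind(builtin = t_builtin, wrapped = t_wrapped)[, 1:3]
```

With only 16 rows, most of each call is fixed overhead, so a larger mydf
would probably make for a more informative comparison.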

Naturally, I am prepared to be instructed and corrected on this either
by you or someone wiser on these matters.

-- Bert

On Wed, Nov 3, 2010 at 3:16 PM, Dimitri Liakhovitski
<dimitri.liakhovit...@gmail.com> wrote:
> Here is the summary of methods. tapply is the fastest!
>
> library(reshape)
>
> system.time(for(i in 1:1000)cast(melt(mydf, measure.vars = "value"),
> city ~ brand,fun.aggregate = sum))
>    user  system elapsed
>   18.40    0.00   18.44
>
> library(reshape2)
> system.time(for(i in 1:1000)dcast(mydf,city ~ brand, sum))
>  user  system elapsed
>  12.36    0.02   12.37
>
>
> system.time(for(i in 1:1000)xtabs(value ~ city + brand, mydf))
>    user  system elapsed
>    2.45    0.00    2.47
>
>
> system.time(for(i in 1:1000)tapply(mydf$value,mydf[c('city','brand')],sum))
>    user  system elapsed
>    0.78    0.00    0.79
>
> Dimitri
>
>
> On Wed, Nov 3, 2010 at 4:32 PM, Henrique Dallazuanna <www...@gmail.com> wrote:
>> Try this:
>>
>>  xtabs(value ~ city + brand, mydf)
>>
>> On Wed, Nov 3, 2010 at 6:23 PM, Dimitri Liakhovitski
>> <dimitri.liakhovit...@gmail.com> wrote:
>>>
>>> Hello!
>>>
>>> I have a data frame like this one:
>>>
>>>
>>> mydf<-data.frame(city=c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
>>>  brand=c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
>>>  value=c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116))
>>> (mydf)
>>>
>>> What I need to get is a data frame like the one below - cities as
>>> rows, brands as columns, and the sums of the "value" within each
>>> city/brand combination in the body of the data frame:
>>>
>>> city   x    y    z
>>> a      3   23  450
>>> b     12   42  231
>>>
>>>
>>> I have written a code that involves multiple loops and subindexing -
>>> but it's taking too long.
>>> I am sure there must be a more efficient way of doing it.
>>>
>>> Thanks a lot for your hints!
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah Consulting
>>> www.ninah.com
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics
