Re: [Rd] Suggestion to extend aggregate() to return multiple and/or named values

Mike Lawrence Fri, 13 Jul 2007 10:46:50 -0700

Bugfix already :P The prior version fails when there is only one factor
in Ind. This version might also be faster, as I avoid using aggregate()
to create the dummy frame.
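
For anyone curious about the one-factor failure: as far as I can tell it comes from the step that removes the last column of the dummy frame. With a single factor only one column is left, and R's default drop=TRUE then collapses the result to a plain vector. A minimal sketch with made-up data (z, Ind, dropped and kept are illustrative names only, not part of agg()):

```r
# Made-up one-factor data, for illustration only
z <- c(1, 2, 3, 4)
Ind <- list(A = c(1, 1, 2, 2))

temp <- aggregate(z, Ind, length)                    # columns: A, x

# Removing the last column leaves one column, which collapses to a vector
dropped <- temp[, 1:(length(temp) - 1)]

# drop = FALSE keeps the data frame structure
kept <- temp[, 1:(length(temp) - 1), drop = FALSE]

is.data.frame(dropped)
is.data.frame(kept)
```

Building the frame with expand.grid() sidesteps the problem, since it returns a data frame however many factors there are.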


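agg=function(z,Ind,FUN,...){
        # apply FUN to every cell defined by the factors in Ind
        FUN.out=by(z,Ind,FUN,...)
        num.cells=length(FUN.out)
        num.values=length(FUN.out[[1]]) # assumes the first cell is not empty

        # one row per cell; sorted values keep the rows in the same order
        # as the cells of by(), where the first factor varies fastest
        for(i in seq_along(Ind)){
                Ind[[i]]=sort(unique(Ind[[i]]))
        }
        temp=expand.grid(Ind)

        for(i in seq_len(num.values)){
                temp$new=NA
                # keep the name returned by FUN, else fall back to x, x2, x3, ...
                n=names(FUN.out[[1]])[i]
                if(is.null(n)) n=if(i==1) 'x' else paste('x',i,sep='')
                names(temp)[length(temp)]=n
                for(j in seq_len(num.cells)){
                        temp[j,length(temp)]=FUN.out[[j]][i]
                }
        }
        return(temp)
}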
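
One thing the fill loop quietly relies on is that the rows of expand.grid() come out in the same order as the cells of by(). Here is a rough sketch of that assumption, using made-up data in which the first factor's values do not first appear in sorted order. The data and the names cell.means, grid.sorted and grid.seen are illustrative only. Sorting the unique values appears to be what keeps rows and cells in step, while taking the values in order of appearance would attach results to the wrong labels.

```r
# Made-up data: factor A first appears as 2, then as 1
z <- c(10, 20, 30, 40, 50, 60, 70, 80)
Ind <- list(A = rep(c(2, 1), times = 4), B = rep(1:2, each = 4))

cells <- by(z, Ind, mean)
# Flatten the by() result cell by cell (first factor varies fastest)
cell.means <- sapply(seq_along(cells), function(j) cells[[j]])

# Grid built from sorted unique values
grid.sorted <- expand.grid(lapply(Ind, function(f) sort(unique(f))))
grid.sorted$x <- cell.means

# Grid built from values in order of first appearance
grid.seen <- expand.grid(lapply(Ind, unique))
grid.seen$x <- cell.means

# Reference result from aggregate()
ref <- aggregate(z, Ind, mean)

# Values and labels of the sorted grid line up with aggregate()
all.equal(grid.sorted$x, ref$x)
identical(as.character(grid.sorted$A), as.character(ref$A))

# Labels of the order-of-appearance grid do not
identical(as.character(grid.seen$A), as.character(ref$A))
```

Factors with unused levels or empty cells are not covered by this sketch and would need separate handling.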


On 13-Jul-07, at 1:29 PM, Mike Lawrence wrote:

> Hi all,
>
> This is my first post to the developers list. As I understand it,
> aggregate() currently applies a function to each cell of a data
> frame, but it can only handle functions that return a single value.
> aggregate() also lacks the ability to retain the names given to the
> returned values. I've created an agg() function (pasted below) that
> appears to be backwards compatible (i.e. it returns results
> identical to those of aggregate() if the function returns a single
> unnamed value), but is able to handle named and/or multiple return
> values. The code may be a little inefficient (there must be an
> easier way to set up the 'temp' data frame than to call aggregate()
> and remove the final column), but I'm suggesting that something
> similar to this could profitably be used to replace aggregate() entirely.
>
> #modified aggregate command, allowing for multiple/named output values
> agg=function(z,Ind,FUN,...){
>       FUN.out=by(z,Ind,FUN,...)
>       num.cells=length(FUN.out)
>       num.dv=length(FUN.out[[1]])
>       
>       temp=aggregate(z,Ind,length) #dummy data frame
>       temp=temp[,c(1:(length(temp)-1))] #remove last column from dummy frame
>               
>       for(i in 1:num.dv){
>               temp=cbind(temp,NA)
>               n=names(FUN.out[[1]])[i]
>               names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
>               for(j in 1:num.cells){
>                       temp[j,length(temp)]=FUN.out[[j]][i]
>               }
>       }
>       return(temp)
> }
>
> #create some factored data
> z=rnorm(100) # the DV
> A=rep(1:2,each=25,2) #one factor
> B=rep(1:2,each=50) #another factor
> Ind=list(A=A,B=B) #the factor list
>
> aggregate(z,Ind,mean) #show the means of each cell
> agg(z,Ind,mean) #should be identical to aggregate
>
> aggregate(z,Ind,summary) #returns an error
> agg(z,Ind,summary) #returns named columns
>
> #Make a function that returns multiple unnamed values
> summary2=function(x){
>       s=summary(x)
>       names(s)=NULL
>       return(s)
> }
> agg(z,Ind,summary2) #returns multiple columns, default names
>
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>       - Piet Hein
>
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
