Hi Sean,

is this roughly what you are looking for? (Please note that the example data you provided contain only one level of ID; there is no "S-4", ...)

> DF
    ID  Cl Co  Brd    Ind A AB AB.1 frq
1  S-3 IND  A BR_F BR_F01 1  0    0 1.0
2  S-3 IND  A BR_F BR_F01 1  0    0 1.0
3  S-3 IND  A BR_F BR_F01 1  0    0 1.0
4  S-3 IND  A BR_F BR_F01 1  0    0 1.0
5  S-3 IND  A BR_F BR_F01 1  0    0 1.0
6  S-3 IND  A BR_F BR_F01 0  1    0 0.5
7  S-3 IND  A BR_F BR_F02 0  0    1 0.0
8  S-3 IND  A BR_F BR_F02 0  1    0 0.5
9  S-3 IND  A BR_F BR_F02 1  0    0 1.0
10 S-3 IND  A BR_F BR_F02 1  0    0 1.0
11 S-3 IND  A BR_F BR_F02 0  1    0 0.5
12 S-3 IND  A BR_F BR_F02 1  0    0 1.0
> DF2 <- aggregate(x=DF$frq, by=list(ID=DF$ID, Ind=DF$Ind), FUN=mean)
> DF2
   ID    Ind         x
1 S-3 BR_F01 0.9166667
2 S-3 BR_F02 0.6666667
> FinalDF <- tapply(X=DF$frq, INDEX=list(Ind=DF$Ind, ID=DF$ID), FUN=mean)
> FinalDF
        ID
Ind            S-3
  BR_F01 0.9166667
  BR_F02 0.6666667
>

Best,
Roland


Sean MacEachern wrote:
> Hi All,
>
> Just hoping someone can give me a hand with a problem...
>
> I have a dataframe (DF) with about 5 million entries that looks something
> like the following:
>
>> DF
>     ID  Cl Co  Brd    Ind A AB AB
> 1  S-3 IND  A BR_F BR_F01 1  0  0
> 2  S-3 IND  A BR_F BR_F01 1  0  0
> 3  S-3 IND  A BR_F BR_F01 1  0  0
> 4  S-3 IND  A BR_F BR_F01 1  0  0
> 5  S-3 IND  A BR_F BR_F01 1  0  0
> 6  S-3 IND  A BR_F BR_F01 0  1  0
> 7  S-3 IND  A BR_F BR_F02 0  0  1
> 8  S-3 IND  A BR_F BR_F02 0  1  0
> 9  S-3 IND  A BR_F BR_F02 1  0  0
> 10 S-3 IND  A BR_F BR_F02 1  0  0
> 11 S-3 IND  A BR_F BR_F02 1  0  0
> 12 S-3 IND  A BR_F BR_F02 1  0  0
>
> I am interested in retrieving the frequency of A for everything with the
> same Ind code.
>
> I have initially created a column called 'frq' that calculates the
> individual A frequency:
>
>> DF$frq=apply(DF,1,function(x) if(x[6]==1)1 else if (x[7]==1)0.5 else 0)
>
>> DF
>
>     ID  Cl Co  Brd    Ind A AB AB frq
> 1  S-3 IND  A BR_F BR_F01 1  0  0 1
> 2  S-3 IND  A BR_F BR_F01 1  0  0 1
> 3  S-3 IND  A BR_F BR_F01 1  0  0 1
> 4  S-3 IND  A BR_F BR_F01 1  0  0 1
> 5  S-3 IND  A BR_F BR_F01 1  0  0 1
> 6  S-3 IND  A BR_F BR_F01 0  1  0 0.5
> 7  S-3 IND  A BR_F BR_F02 0  0  1 0
> 8  S-3 IND  A BR_F BR_F02 0  1  0 0.5
> 9  S-3 IND  A BR_F BR_F02 1  0  0 1
> 10 S-3 IND  A BR_F BR_F02 1  0  0 1
> 11 S-3 IND  A BR_F BR_F02 0  1  0 0.5
> 12 S-3 IND  A BR_F BR_F02 1  0  0 1
>
> I've created a new DF that contains the info I'm interested in:
>
>> DF2 = cbind(DF[1],DF[5],DF[9])
>
>> DF2
>
>     ID    Ind frq
> 1  S-3 BR_F01 1
> 2  S-3 BR_F01 1
> ...
> ...
> ...
> 11 S-3 BR_F02 0.5
> 12 S-3 BR_F02 1
>
>
> I am wondering whether there is a method I can call to calculate the
> frequency of A (or frq) for all individuals with the same Ind code, so that
> the DF (matrix) looks something like the following? (I saw something in a
> tutorial based on t-tests that I thought would work, but no joy...)
>
>
>> NewDF
>
>    ID    Ind    frq
> 1 S-3 BR_F01 0.9167
> 2 S-3 BR_F02 0.6667
>
>
> Further, is there a way to then transform the matrix to look something
> like the following?
>
>
>> FinalDF
>
> Ind    S-3    S-4 S-5.... S-1000000
> BR_F01 0.9167 0.5 1       0.6667
> BR_F02 0.6667 0.2 1       0.5
> ...
> ...
> ...
> BR_Z98 0.5    1   0.3     1
> BR_Z99 1      0.6 1       0.5
>
>
>
> Thanks in advance for any help you can offer, and please let me know if
> there is any further information I can provide.
>
> Sean
>
>
>> sessionInfo()
> R version 2.6.0 (2007-10-03)
> i386-apple-darwin8.10.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
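
P.S. With about 5 million rows, the row-wise apply() call in the quoted message may be slow, because apply() first converts the whole data frame to a character matrix. A vectorized ifelse() should give the same frq values. The following is only a minimal sketch on a toy data frame: it assumes the genotype columns can be addressed by name as A and AB (the names are taken from the printed example, and the data frame below is rebuilt by hand from it, keeping only the columns that are needed), and it then repeats the tapply() step from above.

```r
# Toy data rebuilt by hand from the example rows in this thread
# (only the columns needed here; an assumption, not the real 5-million-row DF).
DF <- data.frame(
  ID  = rep("S-3", 12),
  Ind = rep(c("BR_F01", "BR_F02"), each = 6),
  A   = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1),
  AB  = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0)
)

# Vectorized version of the row-wise rule: 1 if A is set, 0.5 if AB is set, else 0.
DF$frq <- ifelse(DF$A == 1, 1, ifelse(DF$AB == 1, 0.5, 0))

# Mean frq per Ind (rows) and ID (columns), as in the tapply() call above.
FinalDF <- tapply(X = DF$frq, INDEX = list(Ind = DF$Ind, ID = DF$ID), FUN = mean)
print(round(FinalDF, 4))
```

Once the real data contain more than one ID, the same tapply() call should produce one column per ID, with NA wherever an Ind/ID combination does not occur.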