Hi Sean,

Is this roughly what you are looking for? (Please note that the example 
data you provided contain only one level of ID, so there is no "S-4", 
...)

 > DF
     ID  Cl Co  Brd    Ind A AB AB.1 frq
1  S-3 IND  A BR_F BR_F01 1  0    0 1.0
2  S-3 IND  A BR_F BR_F01 1  0    0 1.0
3  S-3 IND  A BR_F BR_F01 1  0    0 1.0
4  S-3 IND  A BR_F BR_F01 1  0    0 1.0
5  S-3 IND  A BR_F BR_F01 1  0    0 1.0
6  S-3 IND  A BR_F BR_F01 0  1    0 0.5
7  S-3 IND  A BR_F BR_F02 0  0    1 0.0
8  S-3 IND  A BR_F BR_F02 0  1    0 0.5
9  S-3 IND  A BR_F BR_F02 1  0    0 1.0
10 S-3 IND  A BR_F BR_F02 1  0    0 1.0
11 S-3 IND  A BR_F BR_F02 0  1    0 0.5
12 S-3 IND  A BR_F BR_F02 1  0    0 1.0
 > DF2 <- aggregate(x=DF$frq, by=list(ID=DF$ID, Ind=DF$Ind), FUN=mean)
 > DF2
    ID    Ind         x
1 S-3 BR_F01 0.9166667
2 S-3 BR_F02 0.6666667
 > FinalDF <- tapply(X=DF$frq, INDEX=list(Ind=DF$Ind, ID=DF$ID), FUN=mean)
 > FinalDF
         ID
Ind            S-3
   BR_F01 0.9166667
   BR_F02 0.6666667
 >
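
For a self-contained version, here is a minimal sketch that rebuilds the 12
example rows and repeats the two calls above. It leaves out Cl, Co and Brd,
which the calculation does not use (dropping them is an assumption made for
brevity). The vectorized ifelse() step, used in place of the row-wise
apply(), and the renaming of aggregate()'s default column "x" to "frq" are
additions for illustration and were not part of the original session.

```r
# Rebuild the 12-row example (columns A, AB, AB.1 as printed above).
DF <- data.frame(
  ID   = rep("S-3", 12),
  Ind  = rep(c("BR_F01", "BR_F02"), each = 6),
  A    = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1),
  AB   = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  AB.1 = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Vectorized equivalent of the row-wise rule: 1 if A, 0.5 if AB, else 0.
DF$frq <- ifelse(DF$A == 1, 1, ifelse(DF$AB == 1, 0.5, 0))

# Long format: one row per ID/Ind combination, holding the mean frq.
DF2 <- aggregate(x = DF$frq, by = list(ID = DF$ID, Ind = DF$Ind), FUN = mean)
names(DF2)[names(DF2) == "x"] <- "frq"

# Wide format: Ind in rows, ID in columns.
FinalDF <- tapply(X = DF$frq, INDEX = list(Ind = DF$Ind, ID = DF$ID), FUN = mean)
```

With more than one ID in the data, tapply() returns NA for any Ind/ID
combination that has no rows.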

Best,
Roland


Sean MacEachern wrote:
> Hi All,
> 
> Just hoping someone can give me a hand with a problem...
> 
> I have a dataframe (DF) with about 5 million entries that looks something
> like the following:
> 
>> DF
>     ID  Cl Co  Brd    Ind A AB  AB
> 1  S-3 IND  A BR_F BR_F01 1  0   0
> 2  S-3 IND  A BR_F BR_F01 1  0   0
> 3  S-3 IND  A BR_F BR_F01 1  0   0
> 4  S-3 IND  A BR_F BR_F01 1  0   0
> 5  S-3 IND  A BR_F BR_F01 1  0   0
> 6  S-3 IND  A BR_F BR_F01 0  1   0
> 7  S-3 IND  A BR_F BR_F02 0  0   1
> 8  S-3 IND  A BR_F BR_F02 0  1   0
> 9  S-3 IND  A BR_F BR_F02 1  0   0
> 10 S-3 IND  A BR_F BR_F02 1  0   0
> 11 S-3 IND  A BR_F BR_F02 0  1   0
> 12 S-3 IND  A BR_F BR_F02 1  0   0
> 
> I am interested in retrieving the frequency of A for everything with the
> same Ind code.
> 
> I have initially created a column called 'frq' that holds the
> individual A frequency for each row:
> 
> 
>> DF$frq=apply(DF,1,function(x) if(x[6]==1)1 else if (x[7]==1)0.5 else 0)
> 
>> DF
> 
>     ID  Cl Co  Brd    Ind A AB  AB  frq
> 1  S-3 IND  A BR_F BR_F01 1  0   0   1
> 2  S-3 IND  A BR_F BR_F01 1  0   0   1
> 3  S-3 IND  A BR_F BR_F01 1  0   0   1
> 4  S-3 IND  A BR_F BR_F01 1  0   0   1
> 5  S-3 IND  A BR_F BR_F01 1  0   0   1
> 6  S-3 IND  A BR_F BR_F01 0  1   0  0.5
> 7  S-3 IND  A BR_F BR_F02 0  0   1   0
> 8  S-3 IND  A BR_F BR_F02 0  1   0  0.5
> 9  S-3 IND  A BR_F BR_F02 1  0   0   1
> 10 S-3 IND  A BR_F BR_F02 1  0   0   1
> 11 S-3 IND  A BR_F BR_F02 0  1   0  0.5
> 12 S-3 IND  A BR_F BR_F02 1  0   0   1
> 
> I've created a new DF that contains the info I'm interested in:
> 
>> DF2 = cbind(DF[1],DF[5],DF[9])
> 
>> DF2
> 
>     ID    Ind frq
> 1  S-3 BR_F01 1 
> 2  S-3 BR_F01 1 
> ...
> ...
> ...
> 11 S-3 BR_F02 0.5
> 12 S-3 BR_F02 1 
> 
> 
> I am wondering whether there is a method I can call to calculate the mean
> of frq (the frequency of A) for all individuals with the same Ind code, so
> that the DF (matrix) looks something like the following. (I saw something in
> a tutorial based on t-tests that I thought would work, but no joy...)
> 
> 
>> NewDF
> 
>     ID    Ind frq
> 1  S-3 BR_F01 0.9167
> 2  S-3 BR_F02 0.6667
>  
> 
> Further, is there a way to then transform the matrix to look something like
> the following?
> 
> 
>> FinalDF
> 
> Ind       S-3  S-4  S-5.... S-1000000
> BR_F01 0.9167  0.5   1         0.6667
> BR_F02 0.6667  0.2   1         0.5
> ...
> ...
> ...
> BR_Z98   0.5    1   0.3         1
> BR_Z99    1    0.6   1         0.5
> 
> 
> 
> Thanks in advance for any help you can offer, and please let me know if
> there is any further information I can provide.
> 
> Sean
> 
> 
>> sessionInfo()
> R version 2.6.0 (2007-10-03)
> i386-apple-darwin8.10.1
> 
> locale:
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
