Re: [R] Counting occurances of a letter by a factor

Brian Diggs Fri, 10 Sep 2010 13:20:43 -0700

On 9/10/2010 12:40 PM, Davis, Brian wrote:

I'm trying to find a more elegant way of doing this.  What I'm trying
to accomplish is to count the frequency of letters (major / minor
alleles) in a string grouped by the factor levels in another column
of my data frame.

Ex.

DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
colnames(DF)<-c("X", "Y")
DF

     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I have an ugly solution, which works if you know the factor levels of Y in 
advance.

ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
           table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))

rownames(ans)<-c("L", "U")
ans

  C G
L 2 2
U 3 1


I've played with table, xtabs, tabulate, aggregate, tapply, etc., but
haven't found a combination that gives a more general solution to
this problem.

Any ideas?

Brian

You are almost there. The "plyr" package gets you the rest of the way.
You already have something that will, for a group of cases with the
same "Y" value, tabulate the "X" values the way you want. ddply will
split the data frame up by "Y" values and run that on each part.


library("plyr")

tab <- ddply(DF, .(Y),
             function(x) {table(unlist(strsplit(as.character(x$X),"")))})
tab

#     Y C G
#1    L 2 2
#2    U 3 1
#3 <NA> 1 1
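
For anyone without plyr installed, the same split-then-tabulate idea can be roughly sketched in base R. This is not part of the original reply; the helper name count_letters and the alleles vector are made up for illustration, and fixing the factor levels up front is an assumption added so that every group reports the same columns even if a letter is missing from one of them. Note that split() silently drops the rows where "Y" is NA, so there is no NA row in this version.

```r
# Same example data as in the question
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# All letters that occur anywhere in X (sort() drops the NA)
alleles <- sort(unique(unlist(strsplit(as.character(DF$X), ""))))

# Hypothetical helper: count letters in a vector of strings,
# always reporting every letter in 'alleles'
count_letters <- function(x) {
  table(factor(unlist(strsplit(as.character(x), "")), levels = alleles))
}

# Split X by the levels of Y (rows with NA in Y are dropped by split),
# tabulate each piece, and stack the results as rows of a matrix
parts <- split(DF$X, DF$Y)
ans <- do.call(rbind, lapply(parts, count_letters))
ans
```

The row names come from the names of the list returned by split(), so the levels of "Y" do not need to be known in advance.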

The ddply result is almost what you asked for. If you really want it as a matrix with named rows:


tab2 <- as.matrix(tab[,-1])
rownames(tab2) <- tab[,1]

It still has an entry for the NA value of "Y", but that can be filtered out at whatever step you like.
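
One possible way to do that filtering at the matrix-conversion step is sketched below. To keep the snippet runnable without plyr, "tab" here is typed in by hand as a stand-in for the ddply result shown above; the variable name keep is just for illustration.

```r
# Hand-typed stand-in for the ddply() result shown above
tab <- data.frame(Y = c("L", "U", NA),
                  C = c(2L, 3L, 1L),
                  G = c(2L, 1L, 1L))

# Keep only the rows whose Y value is not NA
keep <- !is.na(tab$Y)

# Convert the count columns to a matrix and label rows by Y
tab2 <- as.matrix(tab[keep, -1])
rownames(tab2) <- as.character(tab$Y[keep])
tab2
```

Dropping the NA rows from DF before calling ddply should work just as well; which step is more convenient probably depends on whether the NA group is of any interest.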


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

