In my quest for brevity I think I sacrificed too much clarity. I'll try to be a little less brief in the hope of being clearer.

Say I have a data frame like this, as before:

> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I need to count the frequency of the unique individual characters in DF$X at each factor level of DF$Y.

So for DF$Y == "L" there are 2 "C"s and 2 "G"s, and for DF$Y == "U" there are 3 "C"s and 1 "G". The NAs (in either column) should not contribute to the counts.

If I had an individual character in DF$X instead of a string, like:

> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>

then table gives me exactly what I need:

> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0

Hopefully it is now a little clearer what I'm trying to accomplish.

Brian

-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurances of a letter by a factor

Brian -
   Here's the only thing I can come up with to give the same
result as your "ans", but it doesn't seem to correspond with
your description of the problem.

> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)

    C G
  L 2 2
  U 3 1

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu

On Fri, 10 Sep 2010, Davis, Brian wrote:

> I'm trying to find a more elegant way of doing this. What I'm trying to
> accomplish is to count the frequency of letters (major / minor alleles) in
> a string grouped by the factor levels in another column of my data frame.
>
> Ex.
>> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
>> colnames(DF)<-c("X", "Y")
>> DF
>      X    Y
> 1   CC    L
> 2   CC    U
> 3 <NA>    L
> 4   CG    U
> 5   GG    L
> 6   GC <NA>
>
> I have an ugly solution, which works if you know the factor levels of Y in
> advance.
>
>> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
> + table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))
>> rownames(ans)<-c("L", "U")
>> ans
>   C G
> L 2 2
> U 3 1
>
> I've played with table, xtab, tabulate, aggregate, tapply, etc but haven't
> found a combination that gives a more general solution to this problem.
>
> Any ideas?
>
> Brian
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
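
For reference, the following is a minimal sketch of one possible way to get these counts without knowing the levels of Y in advance. It is not taken from either message in this thread. It assumes every non-missing entry of X is a string of single-letter alleles, and the names chars, y_rep and counts are made up for illustration. It also uses lengths(), which needs R 3.2.0 or later; sapply(chars, length) should do the same job on older versions.

```r
# Example data as described in the thread
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split each string into single characters; an NA entry stays a single NA
chars <- strsplit(as.character(DF$X), "")

# Repeat each Y value once per character so both vectors have equal length
y_rep <- rep(DF$Y, lengths(chars))

# table() ignores NA by default, so rows with a missing X or Y are not counted
counts <- table(Y = y_rep, X = unlist(chars))
counts
```

On the example data this should give the same counts as "ans" (2 C and 2 G for L, 3 C and 1 G for U), with one row for each level of Y that is actually present.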