Hi,

On Wed, Jun 15, 2011 at 4:37 PM, karena <dr.jz...@gmail.com> wrote:
> Hi,
>
> I have a string "GGGGGGCCCAATCGCAATTCCAATT"
>
> What I want to do is to count the percentage of each letter in the string,
> what string functions can I use to count the number of each letter appearing
> in the string?
>
> For example, the letter "A" appeared 6 times, letter "T" appeared 5 times,
> how can I use a string function to get these numbers?
The replies you've received so far are helpful. In addition to them, though,
I'd suggest you check out the "Biostrings" package from Bioconductor, since it
looks like you are working with DNA:

http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html

Many (many^2) of the things you will likely want to do with genomic sequences
are already implemented in that package, and in a memory- and
performance-efficient manner.

For this particular example:

R> library(Biostrings)
R> x <- DNAString("GGGGGGCCCAATCGCAATTCCAATT")
R> oligonucleotideFrequency(x, 1)
A C G T
6 7 7 5

## And just for fun:
R> oligonucleotideFrequency(x, 2)
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
 3  0  0  3  3  3  1  0  0  2  5  0  0  2  0  2

Depending on how much genomic/sequence work you are planning to do, it could
be worth your while to invest some time looking into the functionality that
the Biostrings (and IRanges) packages provide.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
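P.S. Since the original question asked for percentages and not just counts, here is a minimal base R sketch that needs no extra packages. The variable names (chars, counts, percentages) are only illustrative. Biostrings can return proportions directly as well, I believe through an as.prob argument, but check ?oligonucleotideFrequency before relying on that.

```r
# Base R only; no packages needed.
x <- "GGGGGGCCCAATCGCAATTCCAATT"

# Split the string into single characters, then tabulate them.
chars <- strsplit(x, "")[[1]]
counts <- table(chars)

# Convert the counts to percentages of the total string length.
percentages <- 100 * prop.table(counts)

print(counts)       # A = 6, C = 7, G = 7, T = 5
print(percentages)  # A = 24, C = 28, G = 28, T = 20
```

For a single short string this is probably all you need; the Biostrings route pays off once you are handling many or very long sequences.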