Since you only have 4 distinct characters, you can create a table of all 4^4 = 256 combinations of 4 of them. Each group of 4 characters then reduces to one byte instead of 4. This is fine if you just want to store them.
> x <- expand.grid(c("A", "C", "G", "T"),
+                  c("A", "C", "G", "T"),
+                  c("A", "C", "G", "T"),
+                  c("A", "C", "G", "T"))
> gene.table <- apply(x, 1, paste, collapse='')
> # convert the string (right now this assumes the length is a multiple of 4;
> # more logic is needed if it is not)
> gene <- "ACGATACGGCGACCACCGAGATCTACACTCTTCCCC"
> # break into 4 character strings
> start <- seq(1, by=4, to=nchar(gene))
> strings <- mapply(substr, gene, start, start+3)
> # create new compressed string
> comp <- as.raw(match(strings, gene.table) - 1)
> # convert back
> paste(gene.table[as.integer(comp) + 1], collapse='')
[1] "ACGATACGGCGACCACCGAGATCTACACTCTTCCCC"
>

On Wed, Dec 24, 2008 at 10:26 AM, Gundala Viswanath <gunda...@gmail.com> wrote:
> Dear all,
>
> What's the R way to compress the string into smaller 2~3 char/digit length.
> In particular I want to compress string of length >=30 characters,
> e.g. ACGATACGGCGACCACCGAGATCTACACTCTTCC
>
> The reason I want to do that is because, there are billions
> of such string I want to print out. And I need to save disk space.
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
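
The transcript above only handles strings whose length is a multiple of 4, and the 34-character example in the question is not one. A minimal sketch of the extra logic follows, assuming it is acceptable to pad the last group with "A" and to store the original length next to the bytes. The names pack_gene and unpack_gene are hypothetical helpers, not part of the code above or of any package.

```r
# Hypothetical helpers (an assumption, not from the original post):
# pack a DNA string of any length at 4 bases per byte, and unpack it again.
bases <- c("A", "C", "G", "T")

# All 4^4 = 256 four-base combinations, built the same way as in the transcript.
gene.table <- apply(expand.grid(bases, bases, bases, bases), 1,
                    paste, collapse = "")

pack_gene <- function(gene) {
  n <- nchar(gene)
  if (n == 0) return(list(n = 0L, bytes = raw(0)))
  # Pad with "A" up to the next multiple of 4 (assumption: any base would do,
  # because the stored length n tells us how much to cut off later).
  pad <- (4 - n %% 4) %% 4
  padded <- paste0(gene, strrep("A", pad))
  start <- seq(1, nchar(padded), by = 4)
  chunks <- substring(padded, start, start + 3)
  idx <- match(chunks, gene.table)
  stopifnot(!anyNA(idx))  # fails on characters other than A, C, G, T
  list(n = n, bytes = as.raw(idx - 1))
}

unpack_gene <- function(packed) {
  full <- paste(gene.table[as.integer(packed$bytes) + 1], collapse = "")
  substr(full, 1, packed$n)  # drop the padding
}

# Usage with the string from the question (34 characters):
p <- pack_gene("ACGATACGGCGACCACCGAGATCTACACTCTTCC")
length(p$bytes)   # 9 bytes instead of 34 characters
unpack_gene(p)    # recovers the original string
```

Storing the length is what makes the padding reversible. Without it, a trailing "A" from the padding cannot be told apart from a real "A" in the sequence.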