On Jul 19, 2010, at 5:31 PM, John1983 wrote:
Hi,
I am a newbie in R and am working on some DNA data represented as
strings of A, C, T and G (plus wildcard characters such as M and X).
I use the Bioconductor package in R.
Well, I guess it's sort of a "meta" package, but it is really more of
a subculture. It also has its own mailing list.
Currently I need to convert a string of the form "ACCTGMX" to
"1223400", i.e. A is replaced by 1, C by 2, T by 3, G by 4 and any
other character by 0. I checked 'replace' and also a function called
'copySubstitute' found in the Biobase package, but that one works
only on files.
The data here is a string ("ACCTGMX") and we need to convert it to
yet another string ("1223400"). Now I use the strsplit function to
split "ACCTGMX" into "A" "C" "C" "T" "G" "M" "X" and then use 'which'
to assign the corresponding numbers.
Is there a faster way to do this or some function I can make use of?
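For comparison, the strsplit approach described above can be written
with a named lookup vector instead of 'which'. This is only a sketch:
`codes` and `encode` are names made up here for illustration, not
functions from Bioconductor.

```r
# Lookup table: names are the input letters, values the output digits
codes <- c(A = "1", C = "2", T = "3", G = "4")

encode <- function(x) {
  # Split each string into single characters, one list element per string
  pieces <- strsplit(x, "", fixed = TRUE)
  out <- vapply(pieces, function(chars) {
    digits <- codes[chars]          # letters missing from `codes` give NA
    digits[is.na(digits)] <- "0"    # any other character becomes "0"
    paste(digits, collapse = "")
  }, character(1))
  unname(out)
}

encode(rep("ACCTGMX", 5))
```

This handles a whole character vector at once, but it still loops over
every string, so it is unlikely to beat a vectorised substitution.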
> tst <- rep( "ACCTGMX", 5)
> newtst <- gsub("A", "1", tst)
> newtst <- gsub("C", "2", newtst)
> newtst <- gsub("T", "3", newtst)
> newtst <- gsub("G", "4", newtst)
> newtst <- gsub("[[:alpha:]]", "0", newtst)
> newtst
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
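The four letter-to-digit substitutions can probably be collapsed into
one call to base R's chartr(), which translates characters one for one;
a single gsub() then zeroes whatever is left. A minimal sketch, assuming
the input holds only letters (a digit 1-4 already present in the input
would pass through unchanged); `tst2` and `newtst2` are just
illustrative names:

```r
tst2 <- rep("ACCTGMX", 5)
# chartr() maps A->1, C->2, T->3, G->4 in a single pass;
# gsub() then replaces every character that is not 1-4 with "0"
newtst2 <- gsub("[^1-4]", "0", chartr("ACTG", "1234", tst2))
newtst2
```

Both calls are vectorised over the input, so no explicit loop or
strsplit is needed.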
There is also a rollapply function in the zoo package and an strapply
function in the gsubfn package that might be even more powerful, but I
am insufficiently talented to give you a one-liner using them.
Please advise.
Thank you.
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help