On 07/21/2010 10:02 AM, Davis, Brian wrote:
> I have a two-part question
>
> Part 1) I am trying to remove characters in a string based on the
> position of a key character in another string. I have a solution that
> works but it requires a for-loop. A vectorized way of doing this has
> eluded me.

Hi Brian -- This sounds like processing short reads from DNA sequencing
experiments. The Bioconductor project has well-developed tools for these
types of operations. See the Bioconductor mailing list, the Biostrings,
ShortRead, IRanges, ... packages including their vignettes, and perhaps
some of the recent course / training material accessible from the web site.

http://bioconductor.org/

Also Thomas Girke's group has a straightforward resource describing use of
these tools at

http://manuals.bioinformatics.ucr.edu/home/ht-seq

If you explore this avenue, then please post messages to the Bioconductor
mailing list, where a suitable audience of experienced users will give you
prompt advice.

Martin

>
> CleanRead<-function(x,y) {
>
>   if (!is.character(x))
>     x <- as.character(x)
>   if (!is.character(y))
>     y <- as.character(y)
>
>   idx<-grep("\\*", x, value=FALSE)
>   starpos<-gregexpr("\\*", x[idx])
>
>   ysplit<-strsplit(y[idx], '')
>   n<-length(idx)
>   for(i in 1:n) {
>     ysplit[[i]][starpos[[i]]] = ""
>   }
>
>   y[idx]<-unlist(lapply(ysplit, paste, sep='', collapse=''))
>   return(y)
> }
>
> x<-c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
> y<-c("abcdefghi", "abcdefghij", "abcde", "abcd")
>
> CleanRead(x,y)
> [1] "abdfghi" "cdeghij" "acde" "cd"
>
>
> Is there a better way to do this?
>
> Part 2)
> My next step in the string processing is to take the characters in the
> output of CleanRead and subtract 33 from the ASCII value of each character
> to obtain an integer. Again I have a solution that works, involving
> splitting the string into characters, then converting them to factors
> (starting at ASCII 34) and using unclass to get the integer value (kind of
> an atoi(x)-33 all in one step).
>
> I looked for the C equivalent of atoi, but the only help I could find
> (R-help 2003) suggested using as.numeric. However, the help file (and
> testing) shows you get 'NA'.
>
> Am I missing an easier way to do this?
>
> Thanks in advance,
>
> Brian
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
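
For the two questions quoted in this message, a minimal base-R sketch may help as a point of comparison with the Bioconductor route. It is not from the thread: the helper names clean_read and ascii_minus_offset are hypothetical, and it assumes each element of x lines up position by position with the matching element of y. Map() still iterates at the R level, so this removes the explicit for-loop without being vectorized in the strict sense. For Part 2 it uses utf8ToInt(), which returns the integer code points of a single string.

```r
# Hypothetical helper (name not from the thread): drop from each y[i] the
# characters at the positions where x[i] contains "*".
clean_read <- function(x, y) {
  xs <- strsplit(as.character(x), "")
  ys <- strsplit(as.character(y), "")
  out <- Map(function(xc, yc) {
    # keep every position of y[i] that is not a "*" position in x[i]
    keep <- !(seq_along(yc) %in% which(xc == "*"))
    paste(yc[keep], collapse = "")
  }, xs, ys)
  unlist(out, use.names = FALSE)
}

# Hypothetical helper: ASCII code of each character minus an offset
# (33 in the question). Returns one integer vector per input string.
ascii_minus_offset <- function(s, offset = 33L) {
  lapply(as.character(s), function(z) utf8ToInt(z) - offset)
}

# Example data taken from the quoted message
x <- c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
y <- c("abcdefghi", "abcdefghij", "abcde", "abcd")

cleaned <- clean_read(x, y)
print(cleaned)      # "abdfghi" "cdeghij" "acde" "cd"

scores <- ascii_minus_offset(cleaned)
print(scores[[4]])  # 66 67  ("c" is 99, "d" is 100)
```

The result is a list because the cleaned strings differ in length; unlist(scores) would flatten it if the per-string grouping is not needed.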