On 07/21/2010 10:02 AM, Davis, Brian wrote:
> I have a two part question
> 
> Part 1) I am trying to remove characters in a string based on the
> position of a key character in another string. I have a solution that
> works but it requires a for-loop. A vectorized way of doing this has
> eluded me.

Hi Brian --

This sounds like processing short reads from DNA sequencing experiments.
The Bioconductor project has well-developed tools for doing these types
of operations. See the Bioconductor mailing list, the Biostrings,
ShortRead, IRanges, ... packages including their vignettes, and perhaps
some of the recent course / training material accessible from the web site.

  http://bioconductor.org/

Also Thomas Girke's group has a straightforward resource describing use
of these tools at

  http://manuals.bioinformatics.ucr.edu/home/ht-seq

If you explore this avenue, then please post messages to the
Bioconductor mailing list, where a suitable audience of experienced
users will give you prompt advice.

Martin

> 
> CleanRead<-function(x,y) {
> 
>   if (!is.character(x)) 
>     x <- as.character(x)
>   if (!is.character(y)) 
>     y <- as.character(y)
> 
>   idx<-grep("\\*", x, value=FALSE)
>   starpos<-gregexpr("\\*", x[idx])
>   
>   ysplit<-strsplit(y[idx], '')
>   n<-length(idx)
>   for(i in seq_len(n)) {
>     ysplit[[i]][starpos[[i]]] = ""
>   }
> 
>   y[idx]<-unlist(lapply(ysplit, paste, sep='', collapse=''))
>   return(y)
> }
> 
> x<-c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA") 
> y<-c("abcdefghi", "abcdefghij", "abcde", "abcd")
> 
> CleanRead(x,y)
> [1] "abdfghi" "cdeghij" "acde"    "cd"
> 
> 
> Is there a better way to do this?
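
For what it's worth, here is a rough base-R sketch of one way to drop the explicit for-loop. It is only an illustration, not a tested replacement: the name CleanReadVec is made up here, and it assumes x and y have the same length and that each pair of strings has the same number of characters. It splits everything into single characters once, filters with one logical vector, and then regroups (vapply still iterates internally when pasting the pieces back together).

```r
# Hypothetical helper (name chosen for this sketch only).
# Assumes length(x) == length(y) and nchar(x) == nchar(y) element-wise.
CleanReadVec <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  stopifnot(length(x) == length(y), all(nchar(x) == nchar(y)))

  # One long vector of single characters for each input
  xc <- as.character(unlist(strsplit(x, "", fixed = TRUE)))
  yc <- as.character(unlist(strsplit(y, "", fixed = TRUE)))

  # Which original string each character came from
  grp <- rep(seq_along(y), nchar(y))

  # Keep the characters of y whose counterpart in x is not "*"
  keep <- xc != "*"

  # Regroup by original string; empty groups become ""
  pieces <- split(yc[keep], factor(grp[keep], levels = seq_along(y)))
  vapply(pieces, paste, character(1), collapse = "", USE.NAMES = FALSE)
}

x <- c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
y <- c("abcdefghi", "abcdefghij", "abcde", "abcd")

CleanReadVec(x, y)
# [1] "abdfghi" "cdeghij" "acde"    "cd"
```

Whether this is actually faster than the loop would need to be checked on realistic data; for large sets of reads the Bioconductor classes are likely the better fit.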
> 
> Part 2) 
> My next step in the string processing is to take the characters in the output 
> of CleanRead and subtract 33 from the ascii value of the character to obtain 
> an integer. Again I have a solution that works, involving splitting the 
> string into characters then converting them to factors (starting at ascii 34) 
> and using unclass to get the integer value. (kind of an atoi(x)-33 all in one 
> step)
> 
> I looked for the C equivalent of atoi, but the only help I could find (R-help 
> 2003) suggested using as.numeric.  However, the help file (and testing) shows 
> you get 'NA'.   
> 
> Am I missing an easier way to do this?
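
One possibility in base R is utf8ToInt(), which returns the integer code points of a single string, so the offset can be subtracted directly. The sketch below is only illustrative: the helper name to_offset is made up here, and the input is simply the CleanRead output shown earlier in the post.

```r
# Cleaned reads, copied from the CleanRead(x, y) output in the post
reads <- c("abdfghi", "cdeghij", "acde", "cd")

# Hypothetical helper: utf8ToInt() accepts one string at a time,
# so it is applied per read; the result is a list of integer vectors
to_offset <- function(s, offset = 33L) utf8ToInt(s) - offset
scores <- lapply(reads, to_offset)

scores[[4]]
# [1] 66 67
```

For plain ASCII input, as.integer(charToRaw(s)) - 33L should give the same values.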
> 
> 
> 
> Thanks in advance,
> 
> Brian
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
