On Wed, Jul 21, 2010 at 1:02 PM, Davis, Brian <brian.da...@uth.tmc.edu> wrote:
> I have a two part question
>
> Part 1)
> I am trying to remove characters in a string based on the position of a key 
> character in another string.  I have a solution that works but it requires a 
> for-loop.  A vectorized way of doing this has eluded me.
>
> CleanRead<-function(x,y) {
>
>   if (!is.character(x))
>     x <- as.character(x)
>   if (!is.character(y))
>     y <- as.character(y)
>
>   idx<-grep("\\*", x, value=FALSE)
>   starpos<-gregexpr("\\*", x[idx])
>
>   ysplit<-strsplit(y[idx], '')
>   n<-length(idx)
>   for(i in seq_len(n)) {  # seq_len avoids looping over 1:0 when no "*" is found
>     ysplit[[i]][starpos[[i]]] = ""
>   }
>
>   y[idx]<-unlist(lapply(ysplit, paste, sep='', collapse=''))
>   return(y)
> }
>
> x<-c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
> y<-c("abcdefghi", "abcdefghij", "abcde", "abcd")
>
> CleanRead(x,y)
> [1] "abdfghi" "cdeghij" "acde"    "cd"
>
>
> Is there a better way to do this?
>
> Part 2)
> My next step in the string processing is to take the characters in the output 
> of CleanRead and subtract 33 from the ascii value of the character to obtain 
> an integer. Again I have a solution that works, involving splitting the 
> string into characters then converting them to factors (starting at ascii 34) 
> and using unclass to get the integer value. (kind of an atoi(x)-33 all in one 
> step)
>
> I looked for the C equivalent of atoi, but the only help I could find (R-help 
> 2003) suggested using as.numeric.  However, the help file (and testing) shows 
> you get 'NA'.
>

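This splits x and y into vectors of single characters, keeps the
characters of y at the positions where x is not "*", and then matches
them against letters to return each one's position in the alphabet
(a = 1, b = 2, ...). The result is a list with one integer vector per
input string.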

f <- function(x, y) match(y[x != "*"], letters)
mapply(f, strsplit(x, ""), strsplit(y, ""))
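
If the actual ASCII code minus 33 is needed rather than the position in
letters, the same splitting idea can be combined with utf8ToInt from
base R. The following is only a sketch under that assumption: the
helper names keep, clean and scores are made up for illustration, and
it assumes each pair of strings in x and y has the same length, as in
the example data.

```r
x <- c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA")
y <- c("abcdefghi", "abcdefghij", "abcde", "abcd")

# Hypothetical helper: keep the characters of ys where xs is not "*"
# and paste them back into a single string.
keep <- function(xs, ys) paste(ys[xs != "*"], collapse = "")

# Part 1: cleaned strings, no explicit for-loop.
clean <- mapply(keep, strsplit(x, ""), strsplit(y, ""))

# Part 2: integer code of each remaining character, minus 33.
# lapply always returns a list, even when all results have equal length.
scores <- lapply(clean, function(s) utf8ToInt(s) - 33L)

clean
scores
```

Here clean should reproduce the output of CleanRead(x, y) shown above,
and scores holds one integer vector per string.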

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
