Hi Paul,

I think you're making it far too complicated. With some minor tweaking
to your function, I can easily process the entire data frame you
originally presented.


nearTerms <- function(rawtext, target, before, after){
  Text <- unlist(strsplit(rawtext, " "))
  Target <- grep(target, Text)

  if (length(Target) == 0) {mydf <- ""} else{

  Length <- length(Text)
  Keep <- rep(NA, Length)
  Lower <- ifelse(Target - before > 0, Target - before, 1)
  Upper <- ifelse(Target + after < Length, Target + after, Length)

  for(i in 1:length(Keep)){
  for(j in 1:length(Lower)){
     Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
  }}

  mydf <- paste(Text[!is.na(Keep)], collapse=" ")

  }

  names(mydf) <- NULL

  mydf
}



testData <-
structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ",
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(c(9L, 10L, 11L,
12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c("
2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
" 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
" 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
), class = "factor"), raw = c(" ordered kras testing on 10102010
results not yet available if patient has a mutation will start erbitux
",
" received kras results on 10202010 test results indicate tumor is
wild type ua protein positve erpr positive her2neu positve ",
" will conduct kras mutation testing prior to initiation of therapy
with erbitux ",
" still need to order kras mutation testing ", " ordered kras testing
waiting for results ",
" kras test results pending note that patient was negative for lynch mutation ",
" kras results still pending note that patient was negative for lynch
mutation ",
" kras mutated will not prescribe erbitux due to mutation ",
" kras mutated therefore did not prescribe erbitux ", " kras wild ",
" tumor is negative for mutation ", " tumor is wild type patient is
eligible to receive eribtux ",
" if patient kras result is wild type they will start erbitux several
lines of material ordered kras mutation test 11112011 results are
still not available ",
" kras results are in patient has the mutation ", " ordered kras
mutation testing on 02152011 results came back negative several lines
of material patient kras mutation test is negative will start erbitux
",
" patient is kras negative started erbitux on 03012011 ")), .Names =
c("profile_key",
"encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame")



reduced <- sapply(testData$raw, nearTerms, target = "kras", before =
6, after = 6)
testData <- cbind(testData, reduced=reduced)

Sarah

On Fri, Jun 1, 2012 at 2:51 PM, Paul Miller <pjmiller...@yahoo.com> wrote:
> Hello Bert and Sarah,
>
> Thank you for your replies. Helped me understand how people might perceive my 
> question and why they might not respond.
>
> Spent some time learning about R's debugging tools this morning. Began to 
> realize why my function didn't work. My second argument was the name of a 
> variable. What I didn't realize is that R would immediately expect this to be 
> a previously defined object. I had thought that passing the name of the 
> variable to the body of the function would generate a correct line of code, 
> and that this was all that was required to get the function to work.
>
> Below is a function that does work, at least when applied to a single row of 
> data. I had previously been reading about the Split-Apply-Combine  strategy 
> in a paper about the plyr package. The paper advocates coming up with a 
> function that works for a subset of one's data and then using plyr to split 
> up the data and apply the function to each of the subsets. Was under the 
> impression that this last part would be easy. Seems not to be the case though.
>
> So on to the next part.
>
> Thanks again for your feedback.
>
> Paul
>
>
> #### Test row ####
>
> testRow <-
> structure(list(profile_key = structure(6L, .Label = c("001-001 ",
> "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
> ), class = "factor"), encounter_date = structure(4L, .Label = c(" 2009-03-01 
> ",
> " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-04-01 ",
> " 2010-10-15 ", " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ",
> " 2011-04-01 ", " 2011-10-10 ", " 2011-10-24 ", " 2012-09-15 ",
> " 2012-10-05 ", " 2012-10-17 "), class = "factor"), raw = " if patient kras 
> result is wild type they will start erbitux several lines of material ordered 
> kras mutation test 11112011 results are still not available "), .Names = 
> c("profile_key",
> "encounter_date", "raw"), row.names = 13L, class = "data.frame")
>
> testRow
>
> #### Function for selecting words within specified range of a target term ####
>
> nearTerms <- function(df, rawtext, target, before, after, reduced){
>   Text <- unlist(strsplit(df[,rawtext], " "))
>   Target <- grep(target, Text)
>
>   if (length(Target) == 0) {df <- transform(df, outtext = "")} else{
>
>   Length <- length(Text)
>   Keep <- rep(NA, Length)
>   Lower <- ifelse(Target - before > 0, Target - before, 1)
>   Upper <- ifelse(Target + after < Length, Target + after, Length)
>
>   for(i in 1:length(Keep)){
>   for(j in 1:length(Lower)){
>      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
>   }}
>
>   df <- transform(df, outtext = paste(Text[!is.na(Keep)], collapse=" "))
>
>   }
>
>   names(df)[names(df) == "outtext"] <- reduced
>   df <- df
> }
>
> testRow <- nearTerms(df = testRow, rawtext = "raw", target = "kras", before = 
> 6, after = 6, reduced = "reduced")
> testRow



-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to