Hi Paul, I think you're making it far too complicated. With some minor tweaking to your function, I can easily process the entire data frame you originally presented.
nearTerms <- function(rawtext, target, before, after){ Text <- unlist(strsplit(rawtext, " ")) Target <- grep(target, Text) if (length(Target) == 0) {mydf <- ""} else{ Length <- length(Text) Keep <- rep(NA, Length) Lower <- ifelse(Target - before > 0, Target - before, 1) Upper <- ifelse(Target + after < Length, Target + after, Length) for(i in 1:length(Keep)){ for(j in 1:length(Lower)){ Keep[i][i %in% seq(Lower[j], Upper[j])] <- i }} mydf <- paste(Text[!is.na(Keep)], collapse=" ") } names(mydf) <- NULL mydf } testData <- structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ", "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 " ), class = "factor"), encounter_date = structure(c(9L, 10L, 11L, 12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ", " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ", " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ", " 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 " ), class = "factor"), raw = c(" ordered kras testing on 10102010 results not yet available if patient has a mutation will start erbitux ", " received kras results on 10202010 test results indicate tumor is wild type ua protein positve erpr positive her2neu positve ", " will conduct kras mutation testing prior to initiation of therapy with erbitux ", " still need to order kras mutation testing ", " ordered kras testing waiting for results ", " kras test results pending note that patient was negative for lynch mutation ", " kras results still pending note that patient was negative for lynch mutation ", " kras mutated will not prescribe erbitux due to mutation ", " kras mutated therefore did not prescribe erbitux ", " kras wild ", " tumor is negative for mutation ", " tumor is wild type patient is eligible to receive eribtux ", " if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available ", " kras results are in patient has the mutation ", " ordered kras mutation testing on 02152011 results came back negative several lines of material patient kras mutation test is negative will start erbitux ", " patient is kras negative started erbitux on 03012011 ")), .Names = c("profile_key", "encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame") reduced <- sapply(testData$raw, nearTerms, target = "kras", before = 6, after = 6) testData <- cbind(testData, reduced=reduced) Sarah On Fri, Jun 1, 2012 at 2:51 PM, Paul Miller <pjmiller...@yahoo.com> wrote: > Hello Bert and Sarah, > > Thank you for your replies. Helped me understand how people might perceive my > question and why they might not respond. > > Spent some time learning about R's debugging tools this morning. Began to > realize why my function didn't work. My second argument was the name of a > variable. What I didn't realize is that R would immediately expect this to be > a previously defined object. I had thought that passing the name of the > variable to the body of the function would generate a correct line of code, > and that this was all that was required to get the function to work. > > Below is a function that does work, at least when applied to a single row of > data. I had previously been reading about the Split-Apply-Combine strategy > in a paper about the plyr package. The paper advocates coming up with a > function that works for a subset of one's data and then using plyr to split > up the data and apply the function to each of the subsets. Was under the > impression that this last part would be easy. Seems not to be the case though. > > So on to the next part. > > Thanks again for your feedback. > > Paul > > > #### Test row #### > > testRow <- > structure(list(profile_key = structure(6L, .Label = c("001-001 ", > "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 " > ), class = "factor"), encounter_date = structure(4L, .Label = c(" 2009-03-01 > ", > " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-04-01 ", > " 2010-10-15 ", " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", > " 2011-04-01 ", " 2011-10-10 ", " 2011-10-24 ", " 2012-09-15 ", > " 2012-10-05 ", " 2012-10-17 "), class = "factor"), raw = " if patient kras > result is wild type they will start erbitux several lines of material ordered > kras mutation test 11112011 results are still not available "), .Names = > c("profile_key", > "encounter_date", "raw"), row.names = 13L, class = "data.frame") > > testRow > > #### Function for selecting words within specified range of a target term #### > > nearTerms <- function(df, rawtext, target, before, after, reduced){ > Text <- unlist(strsplit(df[,rawtext], " ")) > Target <- grep(target, Text) > > if (length(Target) == 0) {df <- transform(df, outtext = "")} else{ > > Length <- length(Text) > Keep <- rep(NA, Length) > Lower <- ifelse(Target - before > 0, Target - before, 1) > Upper <- ifelse(Target + after < Length, Target + after, Length) > > for(i in 1:length(Keep)){ > for(j in 1:length(Lower)){ > Keep[i][i %in% seq(Lower[j], Upper[j])] <- i > }} > > df <- transform(df, outtext = paste(Text[!is.na(Keep)], collapse=" ")) > > } > > names(df)[names(df) == "outtext"] <- reduced > df <- df > } > > testRow <- nearTerms(df = testRow, rawtext = "raw", target = "kras", before = > 6, after = 6, reduced = "reduced") > testRow -- Sarah Goslee http://www.functionaldiversity.org ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.