Perhaps you could clarify what the general rule is, but assuming that what you want is any word after a colon, it can be done with strapply in the gsubfn package like this:

Lines <- c("Year Built: 1873 Gross Building Area: 578 sq ft",
           "Total Rooms: 6 Living Area: 578 sq ft")

library(gsubfn)
strapply(Lines, ": *(\\w+)", backref = -1)

# or, if each line has the same number of returned words
strapply(Lines, ": *(\\w+)", backref = -1, simplify = rbind)

This matches a colon (:) followed by zero or more spaces ( *) followed
by a word ((\\w+)). backref = -1 causes it to return only the first
backreference (i.e. the portion within parentheses) but not the match
itself.

On 9/25/07, lucy b <[EMAIL PROTECTED]> wrote:
> Dear List,
>
> I have an ascii text file with data I'd like to extract. Example:
>
> Year Built: 1873 Gross Building Area: 578 sq ft
> Total Rooms: 6 Living Area: 578 sq ft
>
> There is a lot of data I'd like to ignore in each record, so I'm
> hoping there is a way to use strings as delimiters to get the data I
> want (e.g. tell R to take data between "Built:" and "Gross" -
> incidentally, not always numeric). I think an ugly way would be to
> start at the end of each record and use a substitution expression to
> chip away at it, but I'm afraid it will take forever to run. Is there
> a way to use strings as delimiters in an expression?
>
> Thanks in advance for ideas.
>
> LB
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
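
On the literal question of using strings as delimiters (e.g. the text between "Built:" and "Gross"), here is a minimal base-R sketch that does not need gsubfn. The helper name between() is hypothetical, not a function from base R or any package. It assumes each delimiter pair occurs at most once per line, that the delimiters themselves do not contain the sequence \E, and that your R is recent enough for regexec() to accept perl = TRUE.

```r
# Hypothetical helper (not from base R or gsubfn): return the text lying
# between two literal delimiter strings, or NA where they are not found.
between <- function(x, start, end) {
  # \Q...\E makes PCRE treat the delimiters literally; (.*?) is a lazy
  # capture, so it stops at the first occurrence of the end delimiter.
  pattern <- paste0("\\Q", start, "\\E\\s*(.*?)\\s*\\Q", end, "\\E")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full match, captured group), or character(0) if no match.
  vapply(hits,
         function(h) if (length(h) >= 2) h[2] else NA_character_,
         character(1))
}

Lines <- c("Year Built: 1873 Gross Building Area: 578 sq ft",
           "Total Rooms: 6 Living Area: 578 sq ft")

between(Lines, "Built:", "Gross")   # first line only has these delimiters
between(Lines, "Rooms:", "Living")  # second line only has these delimiters
```

Because the captured group is returned as character, this also covers the case where the value between the delimiters is not numeric; convert with as.numeric() afterwards only where that makes sense.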