Ah, that's interesting - thanks Bill. That's certainly on the right track for me (Titus, you too?), especially if the subpattern argument accepted a vector of multiple group indices.

As you say, this is straightforward in C. I'd be happy to (try to) make a patch for the R sources if there was some consensus on the best way to implement it, i.e. as a new R function or by extending existing function(s).

Michael

On 29 September 2010 01:46, William Dunlap wrote:
>
> S+ has a subpattern=number argument to regexpr and
> related functions. It means that the text matched
> by the subpattern'th parenthesized expression in the
> pattern will be considered the matched text. E.g.,
> to find runs of b's that come immediately after a's:
>
> > gregexpr("a+(b+)", "abcdaabbc", subpattern=1)
> [[1]]:
> [1] 2 7
> attr(, "match.length"):
> [1] 1 2
>
> or to find bc's that come after 2 or more ab's
> > gregexpr("(ab){2,}bc", "abbcabababbcabcababbc", subpattern=1)
>
> regexpr() and strsplit() have this argument in S+ 8.1 but
> gregexpr() is not yet in a released version of S+.
>
> subpattern=0, the default, means to use the entire
> pattern. regexpr allows subpattern=-1, which means
> to return a list with one element for each subpattern.
> I don't know if the extra complexity is worth it.
> (gregexpr does not allow subpattern=-1.)
>
> The usual C regexec() returns this information.
> Perhaps it would be handy to have it in R.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
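
To make the point about C regexec() concrete, here is a minimal sketch of how the offsets of a single parenthesized group can be read from the POSIX interface. It is only an illustration under stated assumptions: the function name subpattern_match, the 1-based positions (chosen to mirror the S+ output above), and the fixed limit of 10 groups are all hypothetical and not taken from the R or S+ sources.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of R or S+): find the first match of
 * `pattern` in `text` and report where group number `group` matched.
 * group 0 means the whole match. The start position is 1-based.
 * Returns 0 on success, and -1 if the pattern does not compile, there
 * is no match, or the requested group did not take part in the match.
 */
int subpattern_match(const char *pattern, const char *text, size_t group,
                     int *start, int *length)
{
    enum { MAX_GROUPS = 10 };   /* arbitrary limit for this sketch */
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int status = -1;

    if (group >= MAX_GROUPS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* regexec() fills pm[0] with the whole match and pm[i] with the
       i-th parenthesized group; unused entries get rm_so == -1. */
    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        *start = (int) pm[group].rm_so + 1;
        *length = (int) (pm[group].rm_eo - pm[group].rm_so);
        status = 0;
    }

    regfree(&re);
    return status;
}
```

Called with the pattern "a+(b+)", the text "abcdaabbc" and group 1, this reports start 2 and length 1, which agrees with the first element of the gregexpr() output quoted above. A gregexpr-style version would repeat the regexec() call from the end of each match, and a vector of group indices would simply read several pm[] entries from the same call.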