I'd definitely be a customer for it, Titus. And it does seem like an obvious hole in regex processing in R that cries out to be filled.

Um, ggregexpr isn't the sexiest of function names :) Perhaps we can think of something a little easier?

How is your C coding? Bill? Anyone else? I could have a go at writing some prototype code to test in the next few days, though if someone else with decent C skills is itching to do it, please speak up.

Michael

On 29 September 2010 20:08, Titus von der Malsburg <malsb...@gmail.com> wrote:
> Bill, Michael,
>
> good to see I'm not the only one who sees potential for improvements
> in the regexpr domain. Adding a subpattern argument is certainly a
> step in the right direction and would make my life much easier.
> However, in my application I need to know not only the position of one
> group but also the position of the overall match in the original
> string. The ideal solution would provide positions and match lengths
> for the whole pattern and for all groups if desired. Only this would
> solve all related issues. One possibility is to have a subpattern
> argument that accepts a vector of numbers (0 refers to the whole
> pattern):
>
> > gregexpr("a+(b+)", "abcdaabbc", subpattern=c(0,1))
> [[1]]:
> [[1]][[1]]:
> [1] 1 5
> attr(, "match.length"):
> [1] 2 4
> [[1]][[2]]:
> [1] 2 7
> attr(, "match.length"):
> [1] 1 2
>
> A weakness of this solution is that the structure of the return values
> changes if length(subpattern)>1. An alternative is to have a separate
> function, say ggregexpr for group gregexpr, that returns a list of
> lists as in the above example. This function would always return
> positions and match lengths of the whole pattern (group 0) and all
> groups. The original gregexpr could still have the subpattern
> argument but it would only accept single numbers. This way the return
> format of gregexpr remains the same.
>
> Best,
>
> Titus
>
>
> On Wed, Sep 29, 2010 at 2:42 AM, Michael Bedward
> <michael.bedw...@gmail.com> wrote:
>> Ah, that's interesting - thanks Bill. That's certainly on the right
>> track for me (Titus, you too ?) especially if the subpattern argument
>> accepted a vector of multiple group indices.
>>
>> As you say, this is straightforward in C. I'd be happy to (try to)
>> make a patch for the R sources if there was some consensus on the best
>> way to implement it, i.e. as a new R function or by extending existing
>> function(s).
> ______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.