Greg Snow wrote:
> Where do you get "should" and "expect" from? All the regular expression
> tools that I am familiar with only match non-overlapping patterns unless you
> do extra to specify otherwise. One of the standard references for regular
> expressions if you really want to understand what is going on is "Mastering
> Regular Expressions" by Jeffrey Friedl. You should really read through that
> book before passing judgment on the correctness of an implementation.
>
> If you want the overlaps, you need to come up with a regular expression that
> will match without consuming all of the string. Here is one way to do it
> with your example:
>
> > gregexpr("1122(?=1122)", paste(rep("1122", 10), collapse=""), perl=TRUE)
> [[1]]
> [1] 1 5 9 13 17 21 25 29 33
> attr(,"match.length")
> [1] 4 4 4 4 4 4 4 4 4
>

another option would be to move the anchor backwards after each match, but i'm not sure whether the problem really needs it or whether it can be done from within r.

greg (and another person who answered this post earlier): while your frustration is understandable, i think reid (and possibly other users as well) would benefit from a brief explanation rather than an emotional reaction. you ought to be more patient and less arrogant with newbies, who will often think there is a bug in r when there isn't.

reid: when matching is performed, a pointer is moved through the string. in global matching, after a match is found the pointer sits just behind the matched substring, and further matching proceeds from there. for example, suppose you match "aaa" (the string) with "aa" (the pattern) globally. after the first successful match, the position pointer is *behind the second a* in the string, and no further match can be found from there. in this context, 'global' does not mean that all possible matches are found, but rather that matching is performed iteratively.

greg's lookahead pattern above is probably a solution to your problem, though the matches have length 4, not 8. in perl, you could manually move back the anchor after each match, e.g.:

    $string = "1122" x 10;
    $n = 4;    # half the length of the pattern
    @matches = ();
    while ($string =~ /11221122/g) {
        # record the start offset and the matched text
        push @matches, [$-[0], $&];
        # move the anchor back to the middle of the match
        pos($string) -= $n;
    }

now @matches has 9 elements, each a ref to an array with the starting position and the content (of length 8) of the respective match:

    @matches = ([0, "11221122"], [4, "11221122"], ...)

not sure if you can do this within r. not sure if you'll ever need it. for more complex cases, when you need overlapping matches and you need their content, greg's solution might not do, but in general that's the solution.

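
for what it's worth, here is a rough sketch of how both the pointer behaviour and the overlapping matches with their content might be reproduced within r. it is only one possible approach, not a claim about what reid needs: it wraps the whole pattern in a zero-width lookahead so that nothing is consumed, then cuts the content out with substring(). the names s, pattern, starts and matches are made up for the illustration, and the sketch assumes there is at least one match (gregexpr returns -1 otherwise).

```r
# global matching consumes each match, so "aa" is found only once in "aaa"
aa_starts <- as.integer(gregexpr("aa", "aaa")[[1]])

# the string and pattern from the thread (variable names are illustrative)
s <- paste(rep("1122", 10), collapse = "")
pattern <- "11221122"

# a zero-width lookahead consumes nothing, so every position where the
# pattern begins is reported, including overlapping ones
starts <- as.integer(gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]])

# recover the content of each overlapping match from its start position
matches <- substring(s, starts, starts + nchar(pattern) - 1)

data.frame(start = starts, match = matches)
```

note that r counts positions from 1, so the starts come out as 1, 5, 9, ... rather than perl's 0, 4, 8, ...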
vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.