Where do you get "should" and "expect" from?  All the regular expression tools 
that I am familiar with match only non-overlapping patterns unless you do extra 
work to specify otherwise.  If you really want to understand what is going on, 
one of the standard references for regular expressions is "Mastering Regular 
Expressions" by Jeffrey Friedl.  You should really read through that book 
before passing judgment on the correctness of an implementation.
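
As a small illustration of that default (a sketch on a made-up string, not 
taken from the report): once a match is found, the search resumes at the first 
character after it, so the candidate starting at position 2 is never tried.

```r
# Default scanning: matches are consumed left to right and never overlap.
x <- "aaaa"                      # illustrative input, not from the report
m <- gregexpr("aa", x)

starts <- as.integer(m[[1]])     # start positions of the matches found
pieces <- regmatches(x, m)[[1]]  # the matched substrings themselves

print(starts)   # positions 1 and 3; the overlapping match at 2 is skipped
print(pieces)
```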

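If you want the overlaps, you need a regular expression that matches without 
consuming all of the text it examines.  Here is one way to do it with your 
example, using a Perl-style lookahead, (?=...), which checks that a second 
"1122" follows but leaves it available for the next match (so each reported 
match has length 4 rather than 8):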

 > gregexpr("1122(?=1122)", paste(rep("1122", 10), collapse=""), perl=TRUE)
[[1]]
[1]  1  5  9 13 17 21 25 29 33
attr(,"match.length")
[1] 4 4 4 4 4 4 4 4 4
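
If you also want the full 8-character overlapping pieces rather than just 
their starting positions, one possible variation is to put the whole pattern 
inside the lookahead and then cut the pieces out by position.  This is only a 
sketch: the names pat, starts and hits are mine, and the substring() step 
assumes the pattern always matches text of the same fixed width.

```r
x   <- paste(rep("1122", 10), collapse = "")
pat <- "11221122"

# Wrap the entire pattern in a lookahead.  Each match is then zero-width,
# so nothing is consumed and every starting position gets examined.
m <- gregexpr(paste0("(?=", pat, ")"), x, perl = TRUE)
starts <- as.integer(m[[1]])

# Zero-width matches carry no text, so extract the pieces by position.
# Using nchar(pat) as the width is valid only for a fixed-width pattern.
hits <- substring(x, starts, starts + nchar(pat) - 1L)

print(starts)
print(hits)
```

For a pattern whose matches can vary in length, the width would have to come 
from somewhere else (for example a capture group inside the lookahead), which 
this sketch does not attempt.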



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-
> project.org] On Behalf Of rthom...@aecom.yu.edu
> Sent: Friday, December 12, 2008 10:05 AM
> To: r-de...@stat.math.ethz.ch
> Cc: r-b...@r-project.org
> Subject: [Rd] gregexpr - match overlap mishandled (PR#13391)
>
> Full_Name: Reid Thompson
> Version: 2.8.0 RC (2008-10-12 r46696)
> OS: darwin9.5.0
> Submission from: (NULL) (129.98.107.177)
>
>
> the gregexpr() function does NOT return a complete list of global
> matches as it should.  this occurs when a pattern matches two
> overlapping portions of a string, only the first match is returned.
>
> the following function call demonstrates this error (although this is
> not how I initially discovered the problem):
> gregexpr("11221122", paste(rep("1122", 10), collapse=""))
>
> instead of returning 9 matches as one would expect, only 5 matches are
> returned . . .
>
> [[1]]
> [1]  1  9 17 25 33
> attr(,"match.length")
> [1] 8 8 8 8 8
>
> you will note, essentially, that the entire first match is then
> excluded from subsequent matching
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

