Re: [R] [Rd] gregexpr - match overlap mishandled (PR#13391)

Wacek Kusnierczyk Sun, 14 Dec 2008 04:41:17 -0800

Greg Snow wrote:
> Controlling the pointer is going to be very different from perl since the R 
> functions are vectorized rather than focusing on a single string.
>
> Here is one approach that will give all the matches and lengths (for the 
> original problem at least):
>
>   
>> mystr <- paste(rep("1122", 10), collapse="")
>> n <- nchar(mystr)
>>
>> mystr2 <- substr(rep(mystr,n), 1:n, n)
>>
>> tmp <- regexpr("^11221122", mystr2)
>> (tmp + 1:n - 1)[tmp>0]
>>     
> [1]  1  5  9 13 17 21 25 29 33
>   
>> attr(tmp,"match.length")[tmp>0]
>>     
> [1] 8 8 8 8 8 8 8 8 8
>
>


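while not exactly what i meant, this is an implementation of one of the
approaches mentioned below, with care taken not to report duplicate matches
(anchoring the pattern with ^ means each substring can only report a match
that starts at its first character):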

>> sequentially perform single matches on successive substrings of the
>> input string (which can give you the same match more than once,
>> though).  

one issue with your solution is that it allocates all n substrings at the
same time. their lengths are n, n-1, ..., 1, i.e., n(n+1)/2 characters in
total, so it needs O(n^2) space (with n the length of the original
string). on the other hand, it may well be faster than an R-level for loop
that matches one substring at a time.
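for comparison, here is a minimal sketch of the loop-based alternative, assuming a single input string. the helper name overlapping_matches is hypothetical, not part of base R. it calls regexpr() on the remaining suffix and restarts the search one character after the start of each match, so no match is reported twice and only one suffix is held at a time (plus the results).

```r
# hypothetical helper (not in base R): all matches of `pattern` in the
# single string `x`, including overlapping ones, found by repeated regexpr()
# calls on successive suffixes. extra arguments (e.g. perl = TRUE) are
# passed on to regexpr().
overlapping_matches <- function(pattern, x, ...) {
  starts <- integer(0)
  lengths <- integer(0)
  offset <- 0L                      # number of leading characters skipped so far
  n <- nchar(x)
  while (offset < n) {
    rest <- substr(x, offset + 1L, n)
    m <- regexpr(pattern, rest, ...)
    if (m < 0) break                # no further match in the remaining suffix
    pos <- as.integer(m)            # start within `rest`, attributes dropped
    starts <- c(starts, offset + pos)
    lengths <- c(lengths, attr(m, "match.length"))
    offset <- offset + pos          # next search begins one char after this start
  }
  list(start = starts, length = lengths)
}

mystr <- paste(rep("1122", 10), collapse = "")
res <- overlapping_matches("11221122", mystr)
res$start   # 1 5 9 13 17 21 25 29 33
res$length  # 8 8 8 8 8 8 8 8 8
```

since the offset always advances by at least one character, the loop also terminates for patterns that can match the empty string. the trade-off is the one described above: memory stays O(n) at any moment, but the work is done in an interpreted R loop rather than in one vectorized call.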

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
