Gabor Grothendieck wrote:
> On Tue, Jun 9, 2009 at 3:04 AM, Wacek
> Kusnierczyk<waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>
>> Gabor Grothendieck wrote:
>>
>>> On Mon, Jun 8, 2009 at 7:18 PM, Wacek
>>> Kusnierczyk<waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>>>
>>>> Gabor Grothendieck wrote:
>>>>
>>>>> Try this. See ?regex for more.
>>>>>
>>>>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>>>>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>>>>>>
>>>>> [1] 24
>>>>> attr(,"match.length")
>>>>> [1] 1
>>>>>
>>>> yes, but
>>>>
>>>> gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>>>> # 2 5 9
>>>>
>>> Yes, it should be:
>>>
>>>> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>>> [[1]]
>>> [1] 5 9
>>> attr(,"match.length")
>>> [1] 1 1
>>>
>>> which displays the position of every dot that is preceded
>>> immediately by a digit. Or just replace gregexpr with regexpr
>>> if it's intended that it match only one.
>>>
>> i guess what was needed was something like
>>
>> gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> # 5
>>
>> which won't work, however, because pcre does not support variable-width
>> lookbehinds.
>>
>
> No, what I wrote was what I intended. I don't think we are
> discussing the answer at this point but just the interpretation
> of what was intended.
which amounts to discussing whether the answer is appropriate ;)

> You are including
> the word boundary in the question and I am not.

indeed, and i think this was essential. but irrespective of whether it really was or not, this sort of problem shows where a fixed-width lookbehind falls short, and illustrates the use of the \K operator, which drops everything matched so far from the reported match (so '\\b[0-9]+\\K[.]' requires the whole number but reports only the dot). hopefully this makes it easier for the OP and others to design the right pattern in similar cases.

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
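For readers finding this thread later, here is a minimal R sketch contrasting the three patterns discussed above on the same test string. It is not code from the thread. It assumes perl = TRUE, so that PCRE handles the pattern, and reads "a dot behind a number" as digits preceded by a word boundary, as in the lookbehind attempt quoted above. The names `x`, `fixed`, `bad` and `m` are illustrative only.

```r
x <- 'a. 1. a1.'

# Fixed-width lookbehind (Gabor's version): any dot immediately after a digit,
# so the dot in "a1." matches as well as the one in "1.".
fixed <- gregexpr('(?<=[0-9])[.]', x, perl = TRUE)
print(fixed[[1]])

# Variable-width lookbehind: PCRE rejects '(?<=\\b[0-9]+)' because the
# assertion has no bounded length, so gregexpr() signals an error.
bad <- try(gregexpr('(?<=\\b[0-9]+)[.]', x, perl = TRUE), silent = TRUE)
print(inherits(bad, "try-error"))

# \K workaround: '\\b[0-9]+' must still match, but \K resets the start of the
# reported match, so only the dot after the standalone "1" is returned.
m <- gregexpr('\\b[0-9]+\\K[.]', x, perl = TRUE)
print(m[[1]])
print(regmatches(x, m))
```

Unlike a lookbehind, \K does not need a fixed-width prefix, which is why it handles the "whole number, then dot" case here.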