Wacek already mentioned that; however, it's still arguably more complex to specify delimiters than to specify content. Aside from having to specify perl = TRUE and ungreedy matching, the content-based regexp is entirely straightforward, whereas with lookbehind (including \K) one has the added complexity of distinguishing between what is matched and what is returned.
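To illustrate the distinction, here is a small sketch using Greg's test string. It assumes regmatches(), which needs R >= 2.14.0 and so postdates this thread; the names x, m_content and m_k are just for illustration.

```r
x <- 'a. 1. a1. 11.'

# Content-based: the pattern describes exactly what is returned,
# i.e. the digits and the dot together.
m_content <- gregexpr('\\b[0-9]+[.]', x, perl = TRUE)
content_hits <- regmatches(x, m_content)[[1]]

# Delimiter-based with \K: the digits still have to match,
# but only the part after \K (the dot) is reported.
m_k <- gregexpr('\\b[0-9]+\\K[.]', x, perl = TRUE)
k_pos  <- as.integer(m_k[[1]])       # positions of the reported dots
k_hits <- regmatches(x, m_k)[[1]]    # the reported text itself

print(content_hits)
print(k_pos)
print(k_hits)
```

With the content-based pattern, what you read in the regexp is what you get back. With \K you have to keep in mind that the digits take part in the match but are not part of the result.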
On Tue, Jun 9, 2009 at 12:36 PM, Greg Snow<greg.s...@imail.org> wrote:
> You can sometimes fake variable width look behinds with Perl regexs using
> '\K':
>
>> gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1. 11.', perl=TRUE)
> [[1]]
> [1] 5 13
> attr(,"match.length")
> [1] 1 1
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Wacek Kusnierczyk
>> Sent: Tuesday, June 09, 2009 1:05 AM
>> To: Gabor Grothendieck
>> Cc: r-help@r-project.org; Mark Heckmann
>> Subject: Re: [R] using regular expressions to retrieve a digit-digit-dot
>> structure from a string
>>
>> Gabor Grothendieck wrote:
>> > On Mon, Jun 8, 2009 at 7:18 PM, Wacek
>> > Kusnierczyk<waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>> >
>> >> Gabor Grothendieck wrote:
>> >>
>> >>> Try this. See ?regex for more.
>> >>>
>> >>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>> >>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>> >>>>
>> >>> [1] 24
>> >>> attr(,"match.length")
>> >>> [1] 1
>> >>>
>> >> yes, but
>> >>
>> >> gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> >> # 2 5 9
>> >>
>> >
>> > Yes, it should be:
>> >
>> >> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>> >>
>> > [[1]]
>> > [1] 5 9
>> > attr(,"match.length")
>> > [1] 1 1
>> >
>> > which displays the position of every dot that is preceded
>> > immediately by a digit. Or just replace gregexpr with regexpr
>> > if its intended that it match only one.
>> >
>>
>> i guess what was needed was something like
>>
>> gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> # 5
>>
>> which won't work, however, because pcre does not support variable-width
>> lookbehinds.
>>
>> >
>> >> which, i guess, is not what you want. if what you want is to match all
>> >> and only dots that follow at least one digit preceded by a word
>> >> boundary, then the following should do, as far as i can see:
>> >>
>> >> gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1.', perl=TRUE)
>> >> # 5
>> >>
>> >> vQ
>> >>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.