Hi,

Thanks a lot for your insightful answer. I will need to study it in detail; gregexpr and regexpr seem to be quite handy for what I need.

Cheers
Petr

> -----Original Message-----
> From: Ivan Krylov <krylov.r...@gmail.com>
> Sent: Tuesday, October 16, 2018 11:08 AM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Cc: r-help@r-project.org
> Subject: Re: [R] regexp mystery
>
> On Tue, 16 Oct 2018 08:36:27 +0000
> PIKAL Petr <petr.pi...@precheza.cz> wrote:
>
> > > dput(x[11])
> > "et odYezko: 3 \fas odYezku: 15 s"
>
> > > gsub("^.*: (\\d+).*$", "\\1", x[11])
> > works for 3
>
> This regular expression only matches one space between the colon and the
> number, but you have more than one of them before "15".
>
> > > gsub("^.*[^:] (\\d+).*$", "\\1", x[11]) works for 15
>
> Match succeeds because a space is not a colon:
>
> ^.* matches "et odYezko: 3 \fas odYezku: "
> [^:] matches space " "
> space " " matches another space " "
> finally, (\\d+) matches "15"
> and .*$ matches " s"
>
> If you need just the numbers, you might have more success by extracting
> matches directly with gregexpr and regmatches:
>
> (
>   function(s) regmatches(
>     s,
>     gregexpr("\\d+(\\.\\d+)?", s)
>   )
> )("et odYezko: 3 \fas odYezku: 15 s")
>
> [[1]]
> [1] "3" "15"
>
> (I'm creating an anonymous function and evaluating it immediately because I
> need to pass the same string to both gregexpr and regmatches.)
>
> If you need to capture numbers appearing in a specific context, a better
> regular expression suiting your needs might be
>
> ":\\s*(\\d+(?:\\.\\d+)?)"
>
> (A colon, followed by optional whitespace, followed by a number to capture,
> consisting of decimals followed by optional, non-captured dot followed by
> decimals)
>
> but I couldn't find a way to extract captures from repeated match by using
> vanilla R pattern matching (it's either regexec which returns captures for
> the first match or gregexpr which returns all matches but without the
> captures).
> If you can load the stringr package, it's very easy, though:
>
> str_match_all(
>   c(
>     "PYedehYev: 300 s Záva~í: 2.160 kg",
>     "et odYezko: 3 \fas odYezku: 15 s"
>   ),
>   ":\\s*(\\d+(?:\\.\\d+)?)"
> )
> [[1]]
>      [,1]      [,2]
> [1,] ": 300"   "300"
> [2,] ": 2.160" "2.160"
>
> [[2]]
>      [,1]   [,2]
> [1,] ": 3"  "3"
> [2,] ": 15" "15"
>
> Column 2 of each list item contains the requested captures.
>
> --
> Best regards,
> Ivan
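
On the point in the quoted message that base R has no obvious way to get the captures from every match: one possible workaround, offered only as a sketch, is to do it in two passes. First collect all full matches with gregexpr and regmatches, then apply the same pattern to each match with sub to keep just the captured group. The variable names (x, pattern, full, captures) are made up for illustration. The input strings are ASCII stand-ins for the ones in the thread, and the runs of several spaces after the colons are an assumption about the original data. perl = TRUE is used so that the (?:...) group is certain to be understood.

```r
# Hypothetical sample input, modelled on the strings quoted in the thread.
# The repeated spaces after the colons are an assumption about the real data.
x <- c(
  "PYedehYev: 300 s Zava~i: 2.160 kg",
  "et odYezko:   3 \fas odYezku:  15 s"
)

# Pattern from the quoted message: colon, optional whitespace, captured number.
pattern <- ":\\s*(\\d+(?:\\.\\d+)?)"

# Pass 1: every full match per input string (e.g. ": 300", ": 2.160").
full <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Pass 2: re-apply the pattern to each full match and keep only group 1.
captures <- lapply(full, function(m) sub(pattern, "\\1", m, perl = TRUE))

print(captures)
```

This only works cleanly when the pattern matches each extracted piece in full, which is the case here. As far as I know, R 4.1.0 and later also ship gregexec(), which returns the captures for all matches directly, so it may be worth a look if the R version allows it.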