Hi

Thanks a lot for your insightful answer. I will need to study it in detail;
gregexpr and regexpr seem to be quite handy for what I need.

Cheers
Petr

> -----Original Message-----
> From: Ivan Krylov <krylov.r...@gmail.com>
> Sent: Tuesday, October 16, 2018 11:08 AM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Cc: r-help@r-project.org
> Subject: Re: [R] regexp mystery
>
> On Tue, 16 Oct 2018 08:36:27 +0000
> PIKAL Petr <petr.pi...@precheza.cz> wrote:
>
> > > dput(x[11])
> > "et odYezko: 3                     \fas odYezku:   15 s"
>
> > gsub("^.*: (\\d+).*$", "\\1", x[11])
> > works for 3
>
> This regular expression only matches one space between the colon and the
> number, but you have more than one of them before "15".
>
> > gsub("^.*[^:] (\\d+).*$", "\\1", x[11]) works for 15
>
> Match succeeds because a space is not a colon:
>
>  ^.* matches "et odYezko: 3                     \fas odYezku: "
>  [^:] matches space " "
>  space " " matches another space " "
>  finally, (\\d+) matches "15"
>  and .*$ matches " s"
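
The breakdown above can be checked by wrapping each piece of the pattern in its own capture group and asking regexec() what every group matched. This is only a sketch: the extra parentheses and the names `x` and `parts` are added for illustration and are not part of the original pattern.

```r
# The string from the thread; "\f" is a form feed character.
x <- "et odYezko: 3                     \fas odYezku:   15 s"

# First attempt from the thread: exactly one space after the colon,
# so only the first number can be reached.
first_try <- gsub("^.*: (\\d+).*$", "\\1", x)

# Second pattern, with every piece wrapped in its own group.
parts <- regmatches(x, regexec("^(.*)([^:])( )(\\d+)(.*)$", x))[[1]]

parts[2]  # what ^.* consumed
parts[3]  # what [^:] matched (a space)
parts[5]  # the captured number
parts[6]  # what the trailing .* matched
```

parts[1] holds the whole match; the remaining elements follow the groups from left to right.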
>
> If you need just the numbers, you might have more success by extracting
> matches directly with gregexpr and regmatches:
>
> (
>   function(s) regmatches(
>     s,
>     gregexpr("\\d+(\\.\\d+)?", s)
>   )
> )("et odYezko: 3                     \fas odYezku:   15 s")
>
> [[1]]
> [1] "3"  "15"
>
> (I'm creating an anonymous function and evaluating it immediately because I
> need to pass the same string to both gregexpr and regmatches.)
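
If the strings already live in a variable, the anonymous function is not needed. A minimal sketch, assuming the data sit in a character vector called `x` (the names `num_pattern`, `found` and `values` are made up here); it also converts the matches to numbers:

```r
# One element of the data from the thread; "\f" is a form feed character.
x <- "et odYezko: 3                     \fas odYezku:   15 s"

# Digits, optionally followed by a dot and more digits.
num_pattern <- "\\d+(\\.\\d+)?"

# gregexpr() locates every match, regmatches() extracts the matched text.
found <- regmatches(x, gregexpr(num_pattern, x))

# One numeric vector per input string.
values <- lapply(found, as.numeric)
```

Both `found` and `values` are lists with one element per input string, so the same code should work unchanged on a longer vector.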
>
> If you need to capture numbers appearing in a specific context, a better
> regular expression suiting your needs might be
>
> ":\\s*(\\d+(?:\\.\\d+)?)"
>
> (A colon, followed by optional whitespace, followed by a number to capture,
> consisting of digits optionally followed by a non-capturing group holding a
> dot and more digits)
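
A small sketch of that pattern on a single string, using regexec() and regmatches(). The names `x`, `pat` and `m` are illustrative, and perl = TRUE is an assumption made here so that the (?: ... ) group is handled by the PCRE engine:

```r
# The string from the thread; "\f" is a form feed character.
x <- "et odYezko: 3                     \fas odYezku:   15 s"
pat <- ":\\s*(\\d+(?:\\.\\d+)?)"

# regexec() reports the first match only: the whole match, then each capture.
m <- regmatches(x, regexec(pat, x, perl = TRUE))[[1]]

m[1]  # whole match, including the colon and the whitespace
m[2]  # the captured number
```

The non-capturing group does not get an element of its own, so `m` has length 2.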
>
> but I couldn't find a way to extract captures from repeated matches using
> vanilla R pattern matching (it's either regexec, which returns captures for
> the first match only, or gregexpr, which returns all matches but without the
> captures). If you can load the stringr package, it's very easy, though:
>
> library(stringr)
> str_match_all(
>   c(
>     "PYedehYev:  300 s              Záva~í: 2.160 kg",
>     "et odYezko: 3               \fas odYezku:   15 s"
>   ),
>   ":\\s*(\\d+(?:\\.\\d+)?)"
> )
> [[1]]
>      [,1]      [,2]
> [1,] ":  300"  "300"
> [2,] ": 2.160" "2.160"
>
> [[2]]
>      [,1]     [,2]
> [1,] ": 3"    "3"
> [2,] ":   15" "15"
>
> Column 2 of each list item contains the requested captures.
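
For the case where stringr cannot be loaded, one possible base R workaround is to make two passes: find every full match with gregexpr(), then reduce each match to its capture with sub(). This is a sketch under assumptions, not something taken from the message above: the first test string is a made-up ASCII stand-in, the names `pat`, `full` and `caps` are illustrative, and perl = TRUE is used so the (?: ... ) group is handled by PCRE.

```r
x <- c(
  "delay:  300 s              weight: 2.160 kg",      # made-up ASCII line
  "et odYezko: 3               \fas odYezku:   15 s"  # string from the thread
)
pat <- ":\\s*(\\d+(?:\\.\\d+)?)"

# Pass 1: every full match per string (colon and whitespace included).
full <- regmatches(x, gregexpr(pat, x, perl = TRUE))

# Pass 2: replace each full match by its first capture group.
caps <- lapply(full, function(m) sub(pat, "\\1", m, perl = TRUE))
```

`caps` is a list with one character vector of captured numbers per input string. Newer R releases also seem to offer gregexec() for this purpose; ?regexec in your installation should say whether it is available.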
>
> --
> Best regards,
> Ivan
