Thank you very much for your help.

I'm using the second suggestion in my program and it works very well.

Jonas


-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Thu 11/10/2011 5:58 AM
To: Richter-Dumke, Jonas
Cc: r-help@r-project.org
Subject: Re: [R] match first consecutive list of capitalized words in string
 
On Tue, Nov 8, 2011 at 7:48 AM, Richter-Dumke, Jonas
<rich...@demogr.mpg.de> wrote:
> Dear R-Helpers,
>
> this is my first post ever to a mailing list, so please feel free to point 
> out any misunderstandings on my side regarding the conventions of this 
> mailing list.
>
> My problem:
>
> Assuming the following character vector is given:
>
> names <- c("filia Maria", "vidua Joh Dirck Kleve (oo 02.02.1732)",
>            "Bernardus Engelb Franciscus Linde j.u.Doktor referendarius sereniss Judex et gograven Rheinensis")
>
> Is there a regular expression matching the first consecutive list of 
> capitalized words in a single character string ("Maria", "Joh Dirck Kleve", 
> "Bernardus Engelb Franciscus Linde")?
> This expression would very reliably separate the person names from the 
> additional information in my historic church register transcription.
>

Try this. It matches a word boundary followed by zero or more repetitions
of the parenthesized expression. That expression is an upper case letter
followed by zero or more lower case letters followed by one or more
spaces. Finally we match the last word, which consists of an upper
case letter followed by zero or more lower case letters and a word
boundary. Note that it assumes R 2.14.0 or later, since regmatches()
was introduced in that release:

> re <- "\\b([[:upper:]][[:lower:]]* +)*[[:upper:]][[:lower:]]*\\b"
> regmatches(names, regexpr(re, names))
[1] "Maria"                             "Joh Dirck Kleve"
[3] "Bernardus Engelb Franciscus Linde"
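One caveat, added here as a hedged extension rather than part of the original reply: regmatches() used with regexpr() returns only the substrings that matched, so if some entries contain no capitalized word at all, the result is shorter than the input and no longer lines up with it. A minimal sketch that keeps the alignment by filling in NA for non-matching entries (the helper name first_caps and the example string "ohne namen" are hypothetical, not from the thread):

```r
# Hypothetical helper (not from the original thread): return the first run of
# capitalized words in each string, or NA when a string has no capitalized word,
# so the result stays aligned with the input. Needs R >= 2.14.0 for regmatches().
first_caps <- function(x) {
  re <- "\\b([[:upper:]][[:lower:]]* +)*[[:upper:]][[:lower:]]*\\b"
  m <- regexpr(re, x)                  # start of first match per element, -1 if none
  out <- rep(NA_character_, length(x)) # default: no match
  out[m != -1] <- regmatches(x, m)     # regmatches() returns matched elements only
  out
}

entries <- c("filia Maria",
             "vidua Joh Dirck Kleve (oo 02.02.1732)",
             "ohne namen")
first_caps(entries)
```

Using a separate variable name such as entries also avoids masking the base function names() in the workspace.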

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
