Replace ".*" (any number of any character) with
"[^]]*" (any number of any character except "]"),
or use the perl-specific non-greedy operator, ?, which
means to match the shortest amount of text matching
the pattern instead of the longest.
gsub("\\[.*?\\]", "", text, perl=TRUE)
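As a rough sketch of how either fix could be applied to a string like the one in the question quoted below (the names no_brackets, no_parens, no_brackets_perl and cleaned are only illustrative, and the string is written on a single line here), each gsub() call removes one kind of delimiter pair and a last pass tidies the leftover spaces:

```r
text <- "medicare [link 220.30.05] ssa (1-800-772-1213). 2008 [link 145.30.05] amounts (2d) gross income (magi) here. (2e)"

# Negated character class: the match stops at the first closing delimiter.
no_brackets <- gsub("\\[[^]]*\\]", "", text)
no_parens   <- gsub("\\([^)]*\\)", "", no_brackets)

# Non-greedy alternative for the square brackets; same result on this input.
no_brackets_perl <- gsub("\\[.*?\\]", "", text, perl = TRUE)

# Removing the pairs leaves doubled spaces behind, so collapse runs of
# spaces and drop any trailing space.
cleaned <- sub(" +$", "", gsub(" +", " ", no_parens))
cleaned
# [1] "medicare ssa . 2008 amounts gross income here."
```

The whitespace cleanup is an extra step that the patterns themselves do not provide; without it the result still contains the spaces that surrounded each removed pair.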
Dealing with nested bracket pairs is much more difficult.
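One possible workaround for nested square brackets, offered only as a sketch for simple, balanced input: remove the innermost pairs repeatedly until nothing changes. The helper name strip_nested is hypothetical, not an existing R function.

```r
# Hypothetical helper: repeatedly delete innermost [...] pairs.
# "[^][]" is a bracket expression matching any character except "]" and "[",
# so each pass only removes pairs that contain no other brackets.
strip_nested <- function(x) {
  repeat {
    y <- gsub("\\[[^][]*\\]", "", x)
    if (identical(y, x)) return(y)  # nothing left to remove
    x <- y
  }
}

strip_nested("a [b [c] d] e")
# [1] "a  e"
```

Unbalanced brackets are left in place by this approach, so it is not a general parser.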
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
________________________________
From: Mark Kimpel [mailto:[email protected]]
Sent: Thursday, August 20, 2009 9:28 AM
To: William Dunlap; [email protected]
Subject: Re: [R] help with regular expressions in R
Well, I guess I'm not quite there yet. What I gave earlier was a
simplified example, and did not accurately reflect the complexity of the
task.
This is my real world example. As you can see, what I need to do
is delete an arbitrary number of characters, including brackets and
parens enclosing them, multiple times within the same string. Help?
myCharVec <- "medicare [link 220.30.05] ssa (1-800-772-1213). 2008 [link 145.30.05] amounts (2d) gross income (magi) here. (2e)"
myCharVec
myCharVec <- gsub('\\[.*\\]', '', myCharVec)
myCharVec
myCharVec <- gsub('\\(.*\\)', '', myCharVec)
myCharVec
#what I want
# "medicare ssa . 2008 amounts gross income here."
> myCharVec <- "medicare [link 220.30.05] ssa (1-800-772-1213). 2008 [link 145.30.05] amounts (2d) gross income (magi) here. (2e)"
> myCharVec
[1] "medicare [link 220.30.05] ssa (1-800-772-1213). 2008 [link 145.30.05] amounts (2d) gross income (magi) here. (2e)"
> myCharVec <- gsub('\\[.*\\]', '', myCharVec)
> myCharVec
[1] "medicare amounts (2d) gross income (magi) here. (2e)"
> myCharVec <- gsub('\\(.*\\)', '', myCharVec)
> myCharVec
[1] "medicare amounts "
>
> #what I want
> # "medicare ssa . 2008 amounts gross income here."
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
"The real problem is not whether machines think but whether men
do." -- B. F. Skinner
******************************************************************
On Thu, Aug 20, 2009 at 11:39 AM, William Dunlap
<[email protected]> wrote:
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Mark Kimpel
> Sent: Thursday, August 20, 2009 8:31 AM
> To: [email protected]
> Subject: [R] help with regular expressions in R
> ...
> myCharVec <- c("[the rain in spain]", "(the rain in spain)")
> gsub('\\[*.\\]', '', myCharVec)
Change the '*.' to '.*'.
Your expression matches 0 or more left square brackets,
followed by 1 character, followed by a right square
bracket.
"\\[.*\\]" matches a left square bracket, followed by 0
or more characters, followed by a right square bracket.
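To see the difference between the two patterns side by side, a small check along these lines may help (x, misplaced and corrected are just stand-in names):

```r
x <- c("[the rain in spain]", "(the rain in spain)")

# '*' applies to the '[', so only one character plus ']' is removed.
misplaced <- gsub("\\[*.\\]", "", x)
misplaced
# [1] "[the rain in spai"   "(the rain in spain)"

# '*' applies to '.', so everything from '[' through ']' is removed.
corrected <- gsub("\\[.*\\]", "", x)
corrected
# [1] ""                    "(the rain in spain)"
```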
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
>
> #what I get
> # [1] "[the rain in spai" "(the rain in spain)"
>
> #what I want
> [1] "" "(the rain in spain)"
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-12 r49193)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RWeka_0.3-20 tm_0.4
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.0 rJava_0.6-3 slam_0.1-3
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>