The greedy ".*" matches from the first "[" all the way to the
last "]" in the string. Replace ".*" (any number of any character)
with "[^]]*" (any number of any character except "]"),
or use the Perl-specific non-greedy modifier, ?, which
makes ".*?" match the shortest text that satisfies
the pattern instead of the longest.
   gsub("\\[.*?\\]", "", text, perl=TRUE)
 
Dealing with nested bracket pairs is much more difficult.
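
A minimal sketch of the two fixes above, using a made-up sample string (x and y are illustrative, not from the original post). It assumes the bracket and paren pairs are not nested.

```r
# Hypothetical sample strings, chosen only for illustration.
x <- "a [b] c [d] e"
y <- "a [b] c (d) e"

# Greedy ".*": matches from the first "[" to the last "]".
greedy <- gsub("\\[.*\\]", "", x)

# Negated character class: each match stops at the first "]".
negated <- gsub("\\[[^]]*\\]", "", x)

# Perl non-greedy quantifier: same effect for non-nested brackets.
lazy <- gsub("\\[.*?\\]", "", x, perl = TRUE)

# Square brackets and parentheses handled in one pass via alternation.
both <- gsub("\\[[^]]*\\]|\\([^)]*\\)", "", y)

print(c(greedy = greedy, negated = negated, lazy = lazy, both = both))
```

The removed spans leave their surrounding spaces behind, so a follow-up such as gsub(" +", " ", both) may be wanted to collapse runs of spaces.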

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

 


________________________________

        From: Mark Kimpel [mailto:mwkim...@gmail.com] 
        Sent: Thursday, August 20, 2009 9:28 AM
        To: William Dunlap; r-help@r-project.org
        Subject: Re: [R] help with regular expressions in R
        
        
        Well, I guess I'm not quite there yet. What I gave earlier was a
simplified example, and did not accurately reflect the complexity of the
task.
        
        This is my real world example. As you can see, what I need to do
is delete an arbitrary number of characters, including brackets and
parens enclosing them, multiple times within the same string. Help?
        
        myCharVec <-  "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link  145.30.05] amounts  (2d) gross income (magi) here. (2e)"
        myCharVec
        myCharVec <- gsub('\\[.*\\]', '', myCharVec)
        myCharVec
        myCharVec <- gsub('\\(.*\\)', '', myCharVec)
        myCharVec
        
        #what I want
        # "medicare  ssa . 2008  amounts gross income here."
        
        > myCharVec <-  "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link  145.30.05] amounts  (2d) gross income (magi) here. (2e)"
        > myCharVec
        [1] "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link  145.30.05] amounts  (2d) gross income (magi) here. (2e)"
        > myCharVec <- gsub('\\[.*\\]', '', myCharVec)
        > myCharVec
        [1] "medicare  amounts  (2d) gross income (magi) here. (2e)"
        > myCharVec <- gsub('\\(.*\\)', '', myCharVec)
        > myCharVec
        [1] "medicare  amounts  "
        > 
        > #what I want
        > # "medicare  ssa . 2008  amounts gross income here."
        ------------------------------------------------------------
        Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
        Indiana University School of Medicine
        
        15032 Hunter Court, Westfield, IN  46074
        
        (317) 490-5129 Work, & Mobile & VoiceMail
        
        "The real problem is not whether machines think but whether men
do." -- B. F. Skinner
        
******************************************************************
        
        
        
        On Thu, Aug 20, 2009 at 11:39 AM, William Dunlap
<wdun...@tibco.com> wrote:
        


                > -----Original Message-----
                > From: r-help-boun...@r-project.org
                > [mailto:r-help-boun...@r-project.org] On Behalf Of
Mark Kimpel
                > Sent: Thursday, August 20, 2009 8:31 AM
                > To: r-help@r-project.org
                > Subject: [R] help with regular expressions in R
                > ...
                > myCharVec <- c("[the rain in spain]", "(the rain in spain)")
                > gsub('\\[*.\\]', '', myCharVec)
                
                Change the '*.' to '.*'.
                
                Your expression matches 0 or more left square brackets,
                followed by 1 character, followed by a right square
                bracket.
                
                "\\[.*\\]" matches a left square bracket, followed by 0
                or more characters, followed by a right square bracket.
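
The difference between the two patterns can be checked directly on the vector from the original question. This is only a sketch; the names wrong and fixed are introduced here for illustration.

```r
myCharVec <- c("[the rain in spain]", "(the rain in spain)")

# '\\[*.\\]': zero or more "[", then exactly one character, then "]".
# Only the final "n]" of the first element matches.
wrong <- gsub('\\[*.\\]', '', myCharVec)

# '\\[.*\\]': a "[", then any run of characters, then "]".
# The whole first element matches; the second has no brackets.
fixed <- gsub('\\[.*\\]', '', myCharVec)

print(wrong)
print(fixed)
```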
                
                Bill Dunlap
                TIBCO Software Inc - Spotfire Division
                wdunlap tibco.com
                
                >
                > #what I get
                > # [1] "[the rain in spai"   "(the rain in spain)"
                >
                > #what I want
                > [1] ""   "(the rain in spain)"
                >
                > > sessionInfo()
                > R version 2.10.0 Under development (unstable)
(2009-08-12 r49193)
                > x86_64-unknown-linux-gnu
                >
                > locale:
                >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
                >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
                >  [5] LC_MONETARY=C
LC_MESSAGES=en_US.UTF-8
                >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
                >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
                > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
                >
                > attached base packages:
                > [1] stats     graphics  grDevices datasets  utils
methods   base
                >
                > other attached packages:
                > [1] RWeka_0.3-20 tm_0.4
                >
                > loaded via a namespace (and not attached):
                > [1] grid_2.10.0 rJava_0.6-3 slam_0.1-3
                >
                >
                >
------------------------------------------------------------
                > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of
Psychiatry
                > Indiana University School of Medicine
                >
                > 15032 Hunter Court, Westfield, IN  46074
                >
                > (317) 490-5129 Work, & Mobile & VoiceMail
                >
                > "The real problem is not whether machines think but
whether
                > men do." -- B.
                > F. Skinner
                >
******************************************************************
                >
                >       [[alternative HTML version deleted]]
                >
                > ______________________________________________
                > R-help@r-project.org mailing list
                > https://stat.ethz.ch/mailman/listinfo/r-help
                > PLEASE do read the posting guide
                > http://www.R-project.org/posting-guide.html
                > and provide commented, minimal, self-contained,
reproducible code.
                >
                



        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
