Re: [R] help with regular expressions in R

Mark Kimpel Thu, 20 Aug 2009 10:12:54 -0700

Thanks guys. I've pulled my O'Reilly book and will begin reviewing it.
------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine


15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do." -- B.
F. Skinner
******************************************************************


On Thu, Aug 20, 2009 at 12:37 PM, Phil Spector <spec...@stat.berkeley.edu> wrote:

> Mark -
>   It looks like you're running into the greediness of regular expressions.
> When R sees ".*" it tries to find the longest match,  which also grabs
> some of the stuff you want.  You can either replace .* with something
> like [^\\])]* (i.e. zero or more of any character *except* "]" or ")" ),
> or use perl=TRUE, which allows the question mark ("?") to mean the shortest
> match instead of the longest.  Here's what I'd use:
>
>  gsub('[\\[(].*?[\\])]','',myCharVec,perl=TRUE)
>
> In English:  substitute the shortest string starting with "[" or "(" and
> ending with "]" or ")" with nothing.
>
>   Hope this helps.
>                                                     - Phil
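
For anyone reading this in the archive, the greedy, lazy and negated-class approaches described above can be sketched in R roughly as follows. This is only an illustration, not code posted in the thread: the names x, greedy, lazy and negated are made up here, the test string is Mark's example joined back onto one line, and the negated-class pattern is written in the POSIX style (with "]" placed first inside the class) on the assumption that this avoids any question about backslash handling inside a character class.

```r
# Mark's example string, joined onto one line (the mail client wrapped it).
x <- "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link 145.30.05] amounts  (2d) gross income (magi) here. (2e)"

# Greedy: ".*" runs from the first "[" to the LAST "]" in the string.
greedy <- gsub("\\[.*\\]", "", x)
print(greedy)
# [1] "medicare  amounts  (2d) gross income (magi) here. (2e)"

# Lazy: ".*?" stops at the first closing bracket or paren it meets.
lazy <- gsub("[\\[(].*?[\\])]", "", x, perl = TRUE)
print(lazy)

# Negated class: "[^])]*" cannot consume a closer, so the match
# cannot run past the first "]" or ")" even though "*" is greedy.
negated <- gsub("[[(][^])]*[])]", "", x)
print(identical(lazy, negated))
```

Both the lazy and the negated-class patterns pair any opener with any closer, so a span such as "[ ... )" would also be removed, and the runs of spaces left behind still need a separate cleanup step.
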
>
>
>
>
> On Thu, 20 Aug 2009, Mark Kimpel wrote:
>
>> Well, I guess I'm not quite there yet. What I gave earlier was a simplified
>> example, and did not accurately reflect the complexity of the task.
>>
>> This is my real world example. As you can see, what I need to do is delete
>> an arbitrary number of characters, including brackets and parens enclosing
>> them, multiple times within the same string. Help?
>>
>> myCharVec <- "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link 145.30.05] amounts  (2d) gross income (magi) here. (2e)"
>> myCharVec
>> myCharVec <- gsub('\\[.*\\]', '', myCharVec)
>> myCharVec
>> myCharVec <- gsub('\\(.*\\)', '', myCharVec)
>> myCharVec
>>
>> #what I want
>> # "medicare  ssa . 2008  amounts gross income here."
>>
>> > myCharVec <- "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link 145.30.05] amounts  (2d) gross income (magi) here. (2e)"
>> > myCharVec
>> [1] "medicare [link  220.30.05]  ssa (1-800-772-1213). 2008 [link 145.30.05] amounts  (2d) gross income (magi) here. (2e)"
>> > myCharVec <- gsub('\\[.*\\]', '', myCharVec)
>> > myCharVec
>> [1] "medicare  amounts  (2d) gross income (magi) here. (2e)"
>> > myCharVec <- gsub('\\(.*\\)', '', myCharVec)
>> > myCharVec
>> [1] "medicare  amounts  "
>> >
>> > #what I want
>> > # "medicare  ssa . 2008  amounts gross income here."
>>
>> On Thu, Aug 20, 2009 at 11:39 AM, William Dunlap <wdun...@tibco.com> wrote:
>>
>>
>>>  -----Original Message-----
>>>> From: r-help-boun...@r-project.org
>>>> [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Kimpel
>>>> Sent: Thursday, August 20, 2009 8:31 AM
>>>> To: r-help@r-project.org
>>>> Subject: [R] help with regular expressions in R
>>>> ...
>>>> myCharVec <- c("[the rain in spain]", "(the rain in spain)")
>>>> gsub('\\[*.\\]', '', myCharVec)
>>>>
>>>
>>> Change the '*.' to '.*'.
>>>
>>> Your expression matches 0 or more left square brackets,
>>> followed by 1 character, followed by a right square bracket.
>>>
>>> "\\[.*\\]" matches a left square bracket, followed by 0 or more
>>> characters, followed by a right square bracket.
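
As a quick check of the explanation above, the two patterns can be run side by side on the vector from the original post. This is a minimal sketch; the result names wrong and fixed are made up for illustration and do not appear in the thread.

```r
myCharVec <- c("[the rain in spain]", "(the rain in spain)")

# Original pattern: zero or more "[", then exactly one character, then "]".
# Only the trailing "n]" satisfies it, so the rest of the string survives.
wrong <- gsub("\\[*.\\]", "", myCharVec)
print(wrong)
# [1] "[the rain in spai"   "(the rain in spain)"

# Corrected pattern: "[", then zero or more characters, then "]".
fixed <- gsub("\\[.*\\]", "", myCharVec)
print(fixed)
# [1] ""                    "(the rain in spain)"
```

The second element is untouched in both cases because it contains no "]" for either pattern to match.
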
>>>
>>> Bill Dunlap
>>> TIBCO Software Inc - Spotfire Division
>>> wdunlap tibco.com
>>>
>>>
>>>> #what I get
>>>> # [1] "[the rain in spai"   "(the rain in spain)"
>>>>
>>>> #what I want
>>>> [1] ""   "(the rain in spain)"
>>>>
>>>> > sessionInfo()
>>>> R version 2.10.0 Under development (unstable) (2009-08-12 r49193)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>>>
>>>> other attached packages:
>>>> [1] RWeka_0.3-20 tm_0.4
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] grid_2.10.0 rJava_0.6-3 slam_0.1-3
>>>>
>>>>
>>>>      [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>

