On Dec 11, 2012, at 11:14 AM, Steven Ranney wrote:

> David and Jim -
>
> Thanks for your help. Your suggestions worked just fine. Now my task
> is to learn why the random-looking string of characters in the first
> part of Jim's sub() statement isn't really so random.
>

Jim's solution can be read as:

Pattern matching phase: Starting at the beginning of the string ("^"),
move along the characters with ".*?", which matches any character but
consumes as few as it can, until what remains is a run of one or more
characters in the range "0" to "9" ("[0-9]+") sitting immediately before
the end ("$"). The parentheses around "[0-9]+" store those in-range
characters as the first matched group, which the replacement refers to
as "\\1". The entire pattern will match the whole string.

Substitution phase: Replace what is matched (the whole string in this
case) with just the first numbered matched group, "\\1".

Notice that this could be thought of as a "positive replacement", in
contrast to my solution and Gabor Grothendieck's later and slightly more
compact version, which could be called "negative replacements".

--
David

> Thanks again -
>
> SR
> Steven H. Ranney
>
>
> On Tue, Dec 11, 2012 at 11:37 AM, David Winsemius
> <dwinsem...@comcast.net> wrote:
>>
>> On Dec 11, 2012, at 10:10 AM, jim holtman wrote:
>>
>>> try this:
>>>
>>>> x
>>>
>>> [1] "OYS-PIA2-FL-1" "OYS-PIA2-LA-1" "OYS-PI-LA-BB-1" "OYS-PIA2-LA-10"
>>>>
>>>> sub("^.*?([0-9]+)$", "\\1", x)
>>>
>>> [1] "1" "1" "1" "10"
>>>
>>
>> Steve;
>>
>> jim holtman is one of the jewels of the rhelp world. I generally assume that
>> his answers are going to be the most succinct and efficient ones possible
>> and avoid adding noise, but here I thought I would try to improve.
>> Thinking there might be a string-splitting approach, I first tried (and
>> discovered) a not-so-great solution:
>>
>> x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
>>        "OYS-PIA2-LA-10")
>> sapply( strsplit(x, "-") , "[", 4)
>> [1] "1" "1" "BB" "10"
>>
>> So then I asked myself if we could just "blank out" everything before the
>> last dash, finding what seemed to be a fairly economical solution and one
>> that does not require back-references:
>>
>> sub( "^.+-" , "", x)
>>
>> [1] "1" "1" "1" "10"
>>
>> If there were no digits after the last dash, these approaches would give
>> different results:
>>
>> x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1",
>>        "OYS-PIA2-LA-")
>>
>> sub( "^.+-" , "", x)
>>
>> [1] "1" "1" "1" ""
>>
>> sub("^.*?([0-9]+)$", "\\1", x)
>> [1] "1" "1" "1" "OYS-PIA2-LA-"
>>
>> When a pattern does not match, sub and gsub return that element of the
>> argument unchanged.
>>
>> --
>> David.
>>
>>>
>>> On Tue, Dec 11, 2012 at 12:46 PM, Steven Ranney <steven.ran...@gmail.com>
>>> wrote:
>>>>
>>>> OYS-PIA2-FL-1
>>>> OYS-PIA2-LA-1
>>>> OYS-PI-LA-BB-1
>>>> OYS-PIA2-LA-10
>>>
>>>
>>> --
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> David Winsemius, MD
>> Alameda, CA, USA
>>

David Winsemius
Alameda, CA, USA
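
P.S. The reading of Jim's pattern given earlier in this message can be checked piece by piece at the console. What follows is only a minimal sketch, not anything Jim posted: the object names (pattern, keep_digits, drop_prefix, marked, greedy, y) are mine and purely illustrative, and the greedy comparison uses perl = TRUE, which is an assumption made so that the backtracking behaviour is the familiar PCRE one.

```r
# IDs from the thread
x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10")

# Jim's pattern: lazy prefix, then a captured run of digits at the end
pattern <- "^.*?([0-9]+)$"

# "Positive" replacement: the whole string is matched, only group 1 is kept
keep_digits <- sub(pattern, "\\1", x)

# "Negative" replacement: delete everything up to and including the last dash
drop_prefix <- sub("^.+-", "", x)

# Bracketing the back-reference shows that nothing outside the group survives,
# i.e. the match really did cover the whole string
marked <- sub(pattern, "[\\1]", x)

# Without the "?" the prefix is greedy and leaves the group only one digit
# (perl = TRUE is an assumption here, not part of Jim's call)
greedy <- sub("^.*([0-9]+)$", "\\1", x, perl = TRUE)

# A string with no trailing digits: the pattern fails, so sub() changes nothing,
# while the dash-based version returns an empty string
y <- "OYS-PIA2-LA-"
matched_y <- grepl(pattern, y)
keep_y    <- sub(pattern, "\\1", y)
drop_y    <- sub("^.+-", "", y)

print(keep_digits)
print(marked)
print(greedy)
```

Comparing keep_digits with greedy on the last element ("10" versus "0") is the quickest way to see what the "?" is doing in the pattern.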