On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jan...@yahoo.de> wrote:
> Dear list members,
>
> I am stuck with using regular expressions.
>
> Imagine I have a vector of character strings like:
>
> test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
>
> How could I use regular expressions to extract only the 'def'/'abc' parts
> of these strings?
>
> Some try from my side yielded no results:
>
> testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE,
> value = TRUE)
>
> Somehow I seem to miss some important concept here. Until now I always used
> nested sub expressions like:
>
> testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test))
>
> but this tends to become cumbersome and I was wondering whether there is a
> more elegant way to do this?
>
Here are a couple of solutions:

# remove everything up to the last _ as well as everything from the first . onwards
gsub(".*_|[.].*", "", test)

# extract the run of characters other than _ that is immediately followed by a .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
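As a supplementary sketch in base R (not part of the reply above), it may help to see why the grep() attempt in the question does not return 'def'/'abc'. Besides the missing test argument, grep(..., value = TRUE) returns the whole elements that match, not the matched substrings. Assuming the file names always follow the pattern filename_<one digit>_<part>.pdf, the same lookaround pattern can be combined with regexpr() and regmatches() to pull out only the matched part, or a single sub() call with a capture group can replace the nested sub() calls. The variable names whole, via_lookaround and via_capture are only illustrative.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Lookaround pattern from the question, with the dot escaped as [.]
pat <- '(?<=filename_[[:digit:]]_).{1,3}(?=[.]pdf)'

# grep(value = TRUE) returns every element that contains a match, unchanged
whole <- grep(pat, test, perl = TRUE, value = TRUE)

# regexpr() locates the first match in each string; regmatches() extracts it
via_lookaround <- regmatches(test, regexpr(pat, test, perl = TRUE))

# Alternative: one sub() call that keeps only the captured group
via_capture <- sub('^filename_[[:digit:]]_(.*)[.]pdf$', '\\1', test)

print(via_lookaround)
print(via_capture)
```

Note that regmatches() silently drops elements without a match, so the result can be shorter than the input; the sub() variant instead returns non-matching strings unchanged.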