On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jan...@yahoo.de> wrote:
> Dear list members,
>
>
> I am stuck with using regular expressions.
>
>
> Imagine I have a vector of character strings like:
>
> test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
>
> How could I use regular expressions to extract only the 'def'/'abc' parts of
> these strings?
>
>
> My own attempt yielded no results:
>
> testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, 
> value = TRUE)
>
> I seem to be missing an important concept here. Until now I have always used
> nested sub() calls like:
>
> testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test))
>
>
> but this tends to become cumbersome and I was wondering whether there is a 
> more elegant way to do this?
>

Here are a couple of solutions:

# remove everything up to the last _ as well as everything from the first . onwards
gsub(".*_|[.].*", "", test)

# extract the run of non-_ characters that is immediately followed by a .
library(gsubfn)
strapply(test, "([^_]+)[.]", simplify = TRUE)
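
For completeness, here is a base R sketch that is not part of the two solutions above. grep() only reports which elements match (or, with value = TRUE, returns those whole elements), so it cannot return just the matched piece. regexpr() combined with regmatches() does return the matched substring, which lets the lookaround idea from the question work. The pattern below assumes the file names always look like filename_<one digit>_<1 to 3 characters>.pdf, as in the example. The variable names lookaround and captured are only illustrative.

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# Lookaround version: regexpr() locates the match in each string and
# regmatches() pulls out the matched text itself.
# Assumes exactly one digit between the underscores and that every
# element matches (non-matching elements would be dropped).
pattern <- '(?<=filename_[[:digit:]]_).{1,3}(?=[.]pdf)'
lookaround <- regmatches(test, regexpr(pattern, test, perl = TRUE))
lookaround
# [1] "def" "abc"

# Single sub() call with a capture group: match the whole string and
# replace it with the captured part, instead of nesting two sub() calls.
captured <- sub('^filename_[[:digit:]]_(.*)[.]pdf$', '\\1', test)
captured
# [1] "def" "abc"
```

Note that the dot before pdf is written as [.] so that it matches a literal dot rather than any character.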

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
