Re: [R] Split String in regex while Keeping Delimiter

Ivan Krylov Wed, 12 Apr 2023 10:54:25 -0700

On Wed, 12 Apr 2023 08:29:50 +0000
Emily Bakker <emilybak...@outlook.com> wrote:


> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>  
> I want to split the strings such that I get the following result:
> c(“leucocyten +”,  “gramnegatieve staven +++”,
>  “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>  
> I have tried strsplit with a regular expression with a positive
> lookahead, but I am not able to achieve the results that I want.

It sounds like you need positive look-behind, not look-ahead: split on
spaces only if they _follow_ one to three of '+' or '-'. Unfortunately,
variable-length repetition quantifiers like {n,m} or + are not
supported inside look-behind expressions in PCRE (nor, traditionally,
in Perl itself). As a special case, you can use \K: anything matched to
the left of \K must be present but is dropped from the reported match,
so it behaves like a zero-width positive look-behind:

x <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten - grampositieve coccen +'
)
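# perl = TRUE selects PCRE; \K keeps the run of '+'/'-' out of the
# separator, so only the space after it is removed.
strsplit(x, '[+-]{1,3}+\\K ', perl = TRUE)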
# [[1]]
# [1] "leucocyten +"             "gramnegatieve staven +++"
#     "grampositieve staven ++" 
# 
# [[2]]
# [1] "leucocyten -"           "grampositieve coccen +"
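
Since only the single character immediately before the space decides
whether to split, a fixed-width look-behind may be enough here. The
following is a minimal sketch under that assumption (every separator
space directly follows a '+' or '-'); the name res_lookbehind is only
illustrative:

```r
x <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten - grampositieve coccen +'
)
# Split on a space only when the one character before it is '+' or '-'.
# The look-behind is exactly one character wide, so PCRE accepts it.
res_lookbehind <- strsplit(x, '(?<=[+-]) ', perl = TRUE)
str(res_lookbehind)
```

This should yield the same pieces as the \K version above. The \K form
remains the more general choice when the part before the separator has
to be matched with a variable-length pattern.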

-- 
Best regards,
Ivan

P.S. It looks like your e-mail client has transformed every quote
character into typographically-correct Unicode quotes “” and every
minus into an en dash, which makes it slightly harder to work with your
code, since typographically correct Unicode quotes are not R string
delimiters. Is it really – that you'd like to split upon, or is it -?
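
If the real data do contain the en dash, one option is to normalise it
to an ASCII '-' before splitting. This is only a sketch: it assumes the
character is U+2013 and that replacing it is acceptable for your
analysis; y, y_ascii and res_ascii are illustrative names.

```r
y <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten \u2013 grampositieve coccen +'
)
# Replace every en dash (U+2013) with an ASCII hyphen-minus first.
y_ascii <- gsub('\u2013', '-', y, fixed = TRUE)
# Then split exactly as before: keep the '+'/'-' run, drop the space.
res_ascii <- strsplit(y_ascii, '[+-]{1,3}+\\K ', perl = TRUE)
str(res_ascii)
```

Normalising up front keeps the regular expression ASCII-only, which
avoids depending on how the pattern's encoding is handled.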

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
