On Wed, 12 Apr 2023 08:29:50 +0000 Emily Bakker <emilybak...@outlook.com> wrote:
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>
> I want to split the strings such that I get the following result:
> c(“leucocyten +”, “gramnegatieve staven +++”,
> “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>
> I have tried strsplit with a regular expression with a positive
> lookahead, but I am not able to achieve the results that I want.

It sounds like you need positive look-behind, not look-ahead: split on
spaces only if they _follow_ one to three of '+' or '-'. Unfortunately,
variable-length repetition such as {n,m} or + is not directly supported
in look-behind expressions (nor in Perl itself). As a special case, you
can use \K: whatever is matched to the left of \K must be present, but
it is dropped from the reported match, so it behaves like a
variable-length positive look-behind:

x <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten - grampositieve coccen +'
)
# one to three of '+' or '-' (kept, thanks to \K), then the space to split on
strsplit(x, '[+-]{1,3}+\\K ', perl = TRUE)
# [[1]]
# [1] "leucocyten +" "gramnegatieve staven +++" "grampositieve staven ++"
#
# [[2]]
# [1] "leucocyten -" "grampositieve coccen +"

-- 
Best regards,
Ivan

P.S. It looks like your e-mail client has transformed every quote
character into typographically correct Unicode quotes “” and every minus
into an en dash, which makes it slightly harder to work with your code,
since typographically correct Unicode quotes are not R string
delimiters. Is it really – that you'd like to split upon, or is it -?

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
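
For comparison, here is a minimal sketch of the plain look-behind variant
mentioned in the reply. It assumes that every split point is a single space
directly after a '+' or '-', so a fixed-width, one-character look-behind
should be enough. The names parts, raw and normalised are illustrative only,
and the en dash handling is a guess at what the original data may need, not
something confirmed in the thread.

```r
# Sketch only: split on a single space that directly follows '+' or '-'.
x <- c(
  'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
  'leucocyten - grampositieve coccen +'
)

# (?<=[+-]) looks back exactly one character, so its width is fixed.
parts <- strsplit(x, '(?<=[+-]) ', perl = TRUE)
print(parts)

# Assumption: if the real data contain en dashes (U+2013) instead of
# ASCII minus signs, they could be normalised before splitting.
raw <- 'leucocyten \u2013 grampositieve coccen +'
normalised <- gsub('\u2013', '-', raw, fixed = TRUE)
print(strsplit(normalised, '(?<=[+-]) ', perl = TRUE))
```

The look-behind only checks the last sign before the space, so it does not
enforce the "one to three" limit that the \K pattern does.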