Re: [R] Split String in regex while Keeping Delimiter

2023-04-13 Thread Leonard Mada via R-help
Dear Emily, I have written a more robust version of the function: extract.nonLetters = function(x, rm.space = TRUE, normalize=TRUE, sort=TRUE) {     if(normalize) x = stringi::stri_trans_nfc(x);     ch = strsplit(x, "", fixed = TRUE);     ch = unique(unlist(ch));     if(sort) ch = sort(ch

Re: [R] Split String in regex while Keeping Delimiter

2023-04-13 Thread Leonard Mada via R-help
Dear Emily, Using a look-behind solves the split problem in this case. (Note: Using Regex is in most/many cases the simplest solution.) str = c("leucocyten + gramnegatieve staven +++ grampositieve staven ++", "leucocyten – grampositieve coccen +") tokens = strsplit(str, "(?<=[-+])\\s++", perl

Re: [R] Split String in regex while Keeping Delimiter

2023-04-13 Thread Greg Snow
Since any space that follows 2 or 3 + signs (or - signs) also follows a single + (or -), this can be done with positive look behind, which may be a little simpler: x <- c( 'leucocyten + gramnegatieve staven +++ grampositieve staven ++', 'leucocyten - grampositieve coccen +' ) strsplit(x, "(?<=
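The look-behind idea described in this reply can be sketched as follows. This is a minimal illustration rather than the poster's exact code (the preview above is cut off): the pattern `(?<=[+-])\\s+` and the variable name `tokens` are assumptions, and the input uses ASCII hyphens.

```r
# Example strings from the thread, with an ASCII "-" in the second one.
x <- c(
  "leucocyten + gramnegatieve staven +++ grampositieve staven ++",
  "leucocyten - grampositieve coccen +"
)

# Split on whitespace that is immediately preceded by "+" or "-".
# The look-behind (?<=...) is zero-width, so the sign itself is kept;
# perl = TRUE is required because look-behind is a PCRE feature.
tokens <- strsplit(x, "(?<=[+-])\\s+", perl = TRUE)

print(tokens)
```

Because only the whitespace is consumed by the match, each piece keeps its trailing delimiter, and a run such as `+++` is handled by the same pattern as a single `+`.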

Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread Bert Gunter
I always find regex puzzles amusing, so after changing the unicode typo quotes and dashes to ascii, the following simple prescription, similar to those proffered by others, seems to produce what you requested with your example: x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++"
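The clean-up step mentioned here (changing typographic quotes and dashes to ASCII) might look like the sketch below. It is an assumption about how one could do it, not the poster's code; the helper name `to_ascii` is hypothetical, and only curly double quotes and the en dash are handled.

```r
# Hypothetical helper: map curly double quotes and the en dash to ASCII.
to_ascii <- function(s) {
  s <- gsub("[\u201C\u201D]", "\"", s)  # left/right curly double quotes
  s <- gsub("\u2013", "-", s)           # en dash
  s
}

# One of the example strings as it appeared in the original post,
# written with \u escapes so the source file itself stays ASCII.
raw <- "\u201Cleucocyten \u2013 grampositieve coccen +\u201D"
clean <- to_ascii(raw)
print(clean)
```

Normalizing first means a later split pattern only has to match `+` and `-`, not their typographic look-alikes.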

Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread avi.e.gross
I thought replacing the spaces following instances of +++,++,+,- with "\n" and then reading with scan should succeed. Like Ivan Krylov I was fairly sure that you meant the minus sign to be "-" rather than "

Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread David Winsemius
I thought replacing the spaces following instances of +++,++,+,- with "\n" and then reading with scan should succeed. Like Ivan Krylov I was fairly sure that you meant the minus sign to be "-" rather than "–", but perhaps you were using MS Word as an editor which is inconsistent with effective
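The replace-then-scan approach described in this message could be sketched roughly as below. The preview is truncated, so the exact pattern is an assumption, as are the names `marked` and `pieces`; the input again uses ASCII hyphens.

```r
x <- c(
  "leucocyten + gramnegatieve staven +++ grampositieve staven ++",
  "leucocyten - grampositieve coccen +"
)

# Replace the single space that follows a "+" or "-" with a newline.
# In a run such as "+++ " only the last sign takes part in the match,
# so the whole run stays attached to the text before it.
marked <- gsub("([+-]) ", "\\1\n", x)

# Read each string back as newline-separated character fields.
pieces <- lapply(marked, function(s) {
  scan(text = s, what = "", sep = "\n", quiet = TRUE)
})

print(pieces)
```

This avoids look-behind entirely, at the cost of a second pass over the data with `scan`.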

Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread Ivan Krylov
On Wed, 12 Apr 2023 08:29:50 + Emily Bakker wrote: > Some example data: > “leucocyten + gramnegatieve staven +++ grampositieve staven ++” > “leucocyten – grampositieve coccen +” >   > I want to split the strings such that I get the following result: > c(“leucocyten +”,  “gramnegatieve staven

Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread Eric Berger
This seems to do the job but there are probably more elegant solutions: f <- function(s) { sub("^ ","",unlist(strsplit(gsub("\\+ ","+@ ",s),"@"))) } g <- function(s) { sub("^ ","",unlist(strsplit(gsub("- ","-@ ",s),"@"))) } h <- function(s) { g(f(s)) } To try it out: s <- “leucocyten + gramnegati

[R] Split String in regex while Keeping Delimiter

2023-04-12 Thread Emily Bakker
Hello List,   I have a dataset consisting of strings that I want to split while saving the delimiter.   Some example data: “leucocyten + gramnegatieve staven +++ grampositieve staven ++” “leucocyten – grampositieve coccen +”   I want to split the strings such that I get the following result: c(“le