Marc Schwartz wrote:
> On Jun 9, 2009, at 6:44 AM, Mark Heckmann wrote:
>
>> Hey all,
>>
>> Thanks for your help. Your answers solved the problem I posted, and that is
>> just when I noticed that I had misspecified the problem ;)
>> My problem is to split German texts into sentences. Unfortunately I
>> haven't found an R package that does this kind of sentence splitting for
>> German, so I am trying to do it "manually".
>>
>> Just using the dot as a separator fails in cases like:
>> txt <- "One January 1. I saw Rick. He was born in the 19. century."
>>
>> Here I want the algorithm to split the string only at positions where
>> the dot is not preceded by a digit. The R snippets posted pick out "1." and
>> "19."
>>
>> txt <- "One January 1. I saw Rick. He was born in the 19. century."
>>> gregexpr('(?<=[0-9])[.]',txt, perl=T)
>> [[1]]
>> [1] 14 49
>> attr(,"match.length")
>> [1] 1 1
>>
>> But I need it the other way round. So I tried:
>>
>>> strsplit(txt, "[[:alpha:]]\\." , perl=T)
>> [[1]]
>> [1] "One January 1. I saw Ric" " He was born in the 19. centur"
>>
>> But this erases the last letter of each sentence. Does anyone know a
>> solution?
Try

strsplit(txt, '(?<![0-9])[.]', perl=TRUE)

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
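The suggested negative lookbehind `(?<![0-9])` is the inverse of the positive lookbehind in the question. It splits only at the right dots, but because the dot itself is the separator, each sentence loses its final period, and every sentence after the first keeps a leading blank. Below is a minimal sketch of a variant, assuming the goal is to keep the dot attached to its sentence. It splits on the whitespace that follows a dot preceded by a non-digit. `split_sentences` is a hypothetical helper name, and the pattern only handles the digit case from the question. German abbreviations such as "z. B." or "usw." followed by a space would still be treated as sentence ends.

```r
# Example text from the thread: "1." and "19." are ordinals, not sentence ends.
txt <- "One January 1. I saw Rick. He was born in the 19. century."

# Positions of dots NOT preceded by a digit, i.e. the inverse of the
# positive lookbehind (?<=[0-9]) used in the question.
ends <- gregexpr("(?<![0-9])[.]", txt, perl = TRUE)[[1]]
as.integer(ends)  # 26 58

# Hypothetical helper: split on the whitespace after a sentence-ending dot,
# so the dot stays with its sentence and no leading blanks remain.
# The lookbehind requires one non-digit character right before the dot.
split_sentences <- function(x) {
  strsplit(x, "(?<=[^0-9][.])\\s+", perl = TRUE)
}

sentences <- split_sentences(txt)[[1]]
# sentences holds "One January 1. I saw Rick." and "He was born in the 19. century."
```

`strsplit()` matches against the remainder of the string after each split (see the Details section of `?strsplit`), so a lookbehind cannot see text that was already consumed. Here that is harmless, because every match is the whitespace immediately after the dot it checks.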