This is just to you. You might want to read r-help through gmane.org or one of the other online readers, or get a Gmail or other online email account just for reading newsgroups, to work around your thread-challenged email client.
Regards.

On Tue, Jun 9, 2009 at 1:42 PM, Greg Snow<greg.s...@imail.org> wrote:
> Yes, I already apologized to Wacek for missing that and pointing out what he
> had already said.
>
> Given everything in this thread (though it is hard to keep track of all of
> it, my e-mail client does not keep all the parts of the thread together),
> this is probably one of those few tasks that R is not the best tool for.
> There is a Perl module called Lingua::DE::Sentence with the description:
> "Perl extension for tokenizing german texts into their sentences" which seems
> to be exactly what the original poster was looking for. So the best option
> may be to use Perl and the above module to preprocess his texts, then use R
> for later steps.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Gabor Grothendieck
>> Sent: Tuesday, June 09, 2009 11:27 AM
>> To: Greg Snow
>> Cc: Wacek Kusnierczyk; r-help@r-project.org; Mark Heckmann
>> Subject: Re: [R] using regular expressions to retrieve a digit-digit-dot
>> structure from a string
>>
>> Wacek already mentioned that; however, its still
>> arguably more complex to specify delimiters
>> than to specify content. Aside from having
>> to specify perl = TRUE and ungreedy matching
>> the content-based regexp is entirely straight forward
>> but for lookbehind (including \K) one has the added
>> complexity of distinguishing between matching and returned
>> values.
>>
>> On Tue, Jun 9, 2009 at 12:36 PM, Greg Snow<greg.s...@imail.org> wrote:
>> > You can sometimes fake variable width look behinds with Perl regexs
>> > using '\K':
>> >
>> > > gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1. 11.', perl=TRUE)
>> > [[1]]
>> > [1] 5 13
>> > attr(,"match.length")
>> > [1] 1 1
>> >
>> >
>> >> -----Original Message-----
>> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> >> On Behalf Of Wacek Kusnierczyk
>> >> Sent: Tuesday, June 09, 2009 1:05 AM
>> >> To: Gabor Grothendieck
>> >> Cc: r-help@r-project.org; Mark Heckmann
>> >> Subject: Re: [R] using regular expressions to retrieve a digit-digit-dot
>> >> structure from a string
>> >>
>> >> Gabor Grothendieck wrote:
>> >> > On Mon, Jun 8, 2009 at 7:18 PM, Wacek
>> >> > Kusnierczyk<waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>> >> >
>> >> >> Gabor Grothendieck wrote:
>> >> >>
>> >> >>> Try this. See ?regex for more.
>> >> >>>
>> >> >>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>> >> >>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>> >> >>>>
>> >> >>> [1] 24
>> >> >>> attr(,"match.length")
>> >> >>> [1] 1
>> >> >>>
>> >> >> yes, but
>> >> >>
>> >> >> gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> >> >> # 2 5 9
>> >> >>
>> >> >
>> >> > Yes, it should be:
>> >> >
>> >> >> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>> >> >>
>> >> > [[1]]
>> >> > [1] 5 9
>> >> > attr(,"match.length")
>> >> > [1] 1 1
>> >> >
>> >> > which displays the position of every dot that is preceded
>> >> > immediately by a digit. Or just replace gregexpr with regexpr
>> >> > if its intended that it match only one.
>> >> >
>> >>
>> >> i guess what was needed was something like
>> >>
>> >> gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> >> # 5
>> >>
>> >> which won't work, however, because pcre does not support variable-width
>> >> lookbehinds.
>> >>
>> >> >
>> >> >> which, i guess, is not what you want. if what you want is to match all
>> >> >> and only dots that follow at least one digit preceded by a word
>> >> >> boundary, then the following should do, as far as i can see:
>> >> >>
>> >> >> gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1.', perl=TRUE)
>> >> >> # 5
>> >> >>
>> >> >> vQ
>> >> >>
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
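
For reference, the two working patterns from the thread can be compared side by side. This is only a minimal sketch, not code that anyone above posted in this form: the variable names x, after_digit and after_number are made up for illustration, and the test string is the one Greg used in his '\K' example.

```r
# Test string from the thread; only the dots after "1" and "11" are wanted.
x <- "a. 1. a1. 11."

# Fixed-width lookbehind: every dot immediately preceded by a digit.
# This also picks up the dot in "a1.", which is not wanted.
after_digit <- gregexpr("(?<=[0-9])[.]", x, perl = TRUE)[[1]]

# \b requires the digit run to start at a word boundary (so "a1." is skipped),
# and \K drops the digits from the reported match, leaving only the dot.
after_number <- gregexpr("\\b[0-9]+\\K[.]", x, perl = TRUE)[[1]]

# as.integer() strips the match.length attribute and leaves the positions.
print(as.integer(after_digit))
print(as.integer(after_number))
```

The second pattern reports positions 5 and 13, as in Greg's output above; the first one additionally reports the dot in "a1.".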