Hello list members, I am looking for an implementation of Unicode text segmentation (word boundary detection) algorithms in R. You can find information about the algorithms here: http://www.unicode.org/reports/tr29/#Word_Boundaries
The help page for the function 'casefuns' from the excellent 'Unicode' package says: "Other methods will be added eventually (once the Unicode text segmentation algorithm is implemented for detecting word boundaries)." My simple question is: are these algorithms already implemented in an R package? I didn't find anything on the web, but I am counting on the power of this list. My Stata-using colleague is already picking on me… (in Stata, the function 'ustrword' does exactly what I want to do in R).

Thanks for your help, and have a good day, all!

Sascha W.
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.