Hello list members,

I am looking for an implementation of Unicode text segmentation (word boundary 
detection) algorithms in R. You can find information about the algorithms here: 
http://www.unicode.org/reports/tr29/#Word_Boundaries

The help page for the function 'casefuns' from the excellent 'Unicode' package 
says: "Other methods will be added eventually (once the Unicode text 
segmentation algorithm is implemented for detecting word boundaries)." My 
simple question is: Are these algorithms already implemented in an R package? I 
didn’t find anything on the web, but I am counting on the power of this list. 
My Stata-using colleague is already teasing me about it (in Stata, the function 
'ustrword' does exactly what I want to do in R).

Thanks for your help, have a good day, you all!
Sascha W.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.