The dot is handled differently depending on whether it has a digit on neither side, on one side, or on both sides:

> stri_extract_all_words("me.com", simplify = TRUE)
     [,1]
[1,] "me.com"
> stri_extract_all_words("me1.com", simplify = TRUE)
     [,1]  [,2]
[1,] "me1" "com"
> stri_extract_all_words("me1.2com", simplify = TRUE)
     [,1]
[1,] "me1.2com"

?stri_extract_all_words

sent me to

?"stringi-search-boundaries"

which suggests that you should spend some time with the user guide:

     _Boundary Analysis_ - ICU User Guide, <URL:
     http://userguide.icu-project.org/boundaryanalysis>


Depending on your objective, you might be better off with strsplit()
separating on whitespace.
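
For example, here is a minimal sketch in base R. The input string `x` is made up for illustration, and it assumes that your tokens are separated only by whitespace and that the string has no leading whitespace (otherwise strsplit() returns an empty first element):

```r
# Hypothetical input containing both kinds of domain name
x <- "visit me.com or watch32.com today"

# Split on runs of whitespace; the dots are left alone
tokens <- strsplit(x, "\\s+")[[1]]
print(tokens)
```

This keeps both "me.com" and "watch32.com" intact, but it also keeps any punctuation attached to a word, so it may or may not suit your data.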

Sarah

On Wed, Nov 30, 2016 at 3:51 PM, Dimitri Liakhovitski
<dimitri.liakhovit...@gmail.com> wrote:
> Hello!
>
> library(stringi)
>
> stri_extract_all_words("me.com", simplify = TRUE)         # returns with a dot
> stri_extract_all_words("watch32.com", simplify = TRUE)  # removes the dot
>
> Why is the dot removed only in the second case?
> How is it possible to ask it NOT to remove the dot in the second case?
>
> Thanks a lot!
>

-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
