\b matches a word boundary. Yet, unexpectedly, strsplit("dia ma", "\\b") splits the string character by character.
> strsplit("dia ma", "\\b")
[[1]]
[1] "d" "i" "a" " " "m" "a"

> strsplit("dia ma", "\\b", perl=TRUE)
[[1]]
[1] "d" "i" "a" " " "m" "a"

How can that be?

This is the output of 'gregexpr'.

> gregexpr("\\b", "dia ma")
[[1]]
[1] 1 2 3 4 5 6
attr(,"match.length")
[1] 0 0 0 0 0 0

> gregexpr("\\b", "dia ma", perl=TRUE)
[[1]]
[1] 1 4 5 7
attr(,"match.length")
[1] 0 0 0 0

The output from gregexpr("\\b", "dia ma", perl=TRUE) is what I expect. I expected 'strsplit' to split at those points.

This is on Windows. R was installed from the binary distribution.

> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

R 2.8.1 shows the same 'strsplit' behavior, but there the default 'gregexpr' (i.e. perl=FALSE) behaves differently: it reports the same positions as perl=TRUE.

> strsplit("dia ma", "\\b")
[[1]]
[1] "d" "i" "a" " " "m" "a"

> strsplit("dia ma", "\\b", perl=TRUE)
[[1]]
[1] "d" "i" "a" " " "m" "a"

> gregexpr("\\b", "dia ma")
[[1]]
[1] 1 4 5 7
attr(,"match.length")
[1] 0 0 0 0

> gregexpr("\\b", "dia ma", perl=TRUE)
[[1]]
[1] 1 4 5 7
attr(,"match.length")
[1] 0 0 0 0

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
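To make the expected result concrete, here is a minimal sketch of the split I have in mind. It assumes the positions reported by gregexpr("\\b", x, perl=TRUE) are the intended cut points. The name split_at_boundaries is a hypothetical helper of my own, not a base R function, and I have only considered single non-empty strings.

```r
# Hypothetical helper (not part of base R): cut a single string at the
# zero-length match positions reported by gregexpr(pattern, x, perl = TRUE).
split_at_boundaries <- function(x, pattern = "\\b") {
  # Start positions of the matches; -1 means "no match".
  pos <- as.integer(gregexpr(pattern, x, perl = TRUE)[[1]])
  n <- nchar(x)
  # Keep only cut points strictly inside the string, then add the
  # start (1) and one-past-the-end (n + 1) as sentinels.
  cuts <- unique(c(1L, pos[pos > 1L & pos <= n], n + 1L))
  # Each piece runs from one cut point up to just before the next one.
  substring(x, head(cuts, -1L), tail(cuts, -1L) - 1L)
}

split_at_boundaries("dia ma")
```

With the positions 1 4 5 7 shown in the gregexpr(perl=TRUE) output, this returns "dia" " " "ma", which is the result I was expecting from strsplit.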