On Jul 11, 2015, at 7:47 AM, Bert Gunter wrote:

> I noticed the following:
> 
>> strsplit("red green","\\b")
> [[1]]
> [1] "r" "e" "d" " " "g" "r" "e" "e" "n"

After reading the ?regex help page, I didn't understand why `\b` would split 
within sequences of "word"-characters, either. I expected this to be the result:

[[1]]
[1] "red"  " "  "green"

There is a warning in that paragraph: "(The interpretation of ‘word’ depends on 
the locale and implementation.)"

I got something close to the expected result with only one of "\\>" and "\\<":

> strsplit("red green","\\<")
[[1]]
[1] "r" "e" "d" " " "g" "r" "e" "e" "n"

> strsplit("red green","\\>")
[[1]]
[1] "red"    " green"

The result with "\\<" seems decidedly unexpected.
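
One way to get the three-piece result I expected, without using a zero-width assertion as the split pattern, is to extract the tokens rather than split on them. This is only a minimal sketch: the variable names `x` and `tokens` are mine, and it assumes that a "word" is a run of `[[:alnum:]_]` characters, as in the `\w` definition.

```r
x <- "red green"

# Extract maximal runs of word characters or of non-word characters,
# instead of splitting at zero-width word boundaries.
tokens <- regmatches(x, gregexpr("[[:alnum:]_]+|[^[:alnum:]_]+", x))[[1]]
print(tokens)
```

Because every match here has non-zero width, this avoids whatever strsplit() does with zero-length matches.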

I wondered whether the "original" regex documentation uses the same language as 
the R help page, so I went to the cited website and found:
=======
An assertion-character can be any of the following:

        • < – Beginning of word
        • > – End of word
        • b – Word boundary
        • B – Non-word boundary
        • d – Digit character (equivalent to [[:digit:]])
        • D – Non-digit character (equivalent to [^[:digit:]])
        • s – Space character (equivalent to [[:space:]])
        • S – Non-space character (equivalent to [^[:space:]])
        • w – Word character (equivalent to [[:alnum:]_])
        • W – Non-word character (equivalent to [^[:alnum:]_])
========

The word "word" appears nowhere else on that page.


>> strsplit("red green","\\W")
> [[1]]
> [1] "red"   "green"

`\W` matches a single non-word character. Unlike the zero-width assertions 
above, it consumes that character, so the " " is used as the delimiter and 
discarded.
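
A small sketch of what "consumed and discarded" means in practice. The input string with a comma is my own example, not from the original post: each single non-word character is treated as its own delimiter, so two adjacent ones leave an empty string between them, while `\W+` treats the whole run as one delimiter.

```r
y <- "red, green"

# Each non-word character is a separate delimiter: "," and " " are both
# removed, and the empty string between them is kept.
single <- strsplit(y, "\\W")[[1]]

# A run of non-word characters is treated as one delimiter.
run <- strsplit(y, "\\W+")[[1]]

print(single)
print(run)
```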

> 
> I would have thought that "\\b" should give what "\\W" did. Note that:
> 
>> grep("\\bred\\b","red green")
> [1] 1
> ## as expected
> 
> Does strsplit use a different regex engine than grep()? Or more
> likely, what am I misunderstanding?
> 
> Thanks.
> 
> Bert
> 
> 


David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
