Re: [Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015-07-20 Thread suimong
Thank you, Winston, for the solution! The only workaround I have come up with is to set options(encoding = "UTF-8"), which is generally undesirable. Is there any chance this patch will be included in a future version of R? I have been running into this problem from time to time and the latest R

Re: [Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015-03-03 Thread Winston Chang
After a bit more investigation, I think I've found the cause of the bug, and I have a patch.

This bug happens with grep(), when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.

=== Here's

[Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015-03-02 Thread Winston Chang
On Windows, grep(fixed=TRUE) throws errors with some UTF-8 strings. Here's an example (must be run on Windows to reproduce the error):

  Sys.setlocale("LC_CTYPE", "chinese")
  y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
  Encoding(y) <- "UTF-8"
  y
  # [1] "渗"
  grep("\n", y, fixed = TRUE)
  # Error in grep("\n