Thanks to Ista Zahn, I was able to find a workaround. The key seems to be that string1 needs to be encoded as UTF-8 before being passed to gsub. For whatever reason, Encoding(string1) <- "UTF-8" does not change the encoding on my Windows machine.

The workaround: I paste an obvious UTF-8 character, "\u00A0", onto the start of the string, send the string through gsub, then remove the "\u00A0" character from the output.

string1 <- "\u00A0text X"; string1
Encoding(string1)
new_string1 <- gsub("X","\u2265",string1); new_string1
new_string2 <- substring(new_string1,2); new_string2

If you know of a less hackish way to accomplish this, I'm interested to hear it. However, this workaround is sufficient for now.

Thanks,
-tgs

On Wed, May 28, 2014 at 10:25 PM, Thomas Stewart <tgs.public.m...@gmail.com> wrote:

> Can anyone help me understand the following behavior?
>
> I want to replace the letter 'X' in the string 'text X' with '≥' (\u2265).
> The output from gsub is not what I expect. It gives: "text â‰¥".
>
> Now, suppose I want to replace the character '≤' in the string 'text ≤'
> with '≥'. Then, gsub gives the expected, desired output.
>
> What am I missing?
>
> Thanks for any insight.
> -tgs
>
> Minimal Working Example:
>
> string1 <- "text X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1
>
> string2 <- "text \u2264"; string2
> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
>
> charToRaw(new_string1)
> charToRaw(new_string2)
>
> sessionInfo()
>
> ## OUTPUT
>
> > string1 <- "text X"; string1
> [1] "text X"
>
> > new_string1 <- gsub("X","\u2265",string1); new_string1
> [1] "text â‰¥"
>
> > string2 <- "text \u2264"; string2
> [1] "text ≤"
>
> > new_string2 <- gsub("\u2264","\u2265",string2); new_string2
> [1] "text ≥"
>
> > charToRaw(new_string1)
> [1] 74 65 78 74 20 e2 89 a5
>
> > charToRaw(new_string2)
> [1] 74 65 78 74 20 e2 89 a5
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C  LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
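
On the request for a less hackish approach, one possibility is sketched below. It rests on two observations. First, ?Encoding documents that ASCII strings are never marked with a declared encoding, which would explain why Encoding(string1) <- "UTF-8" has no effect on "text X". Second, the charToRaw() output in the quoted message shows that gsub already returns the correct UTF-8 bytes (e2 89 a5) in both cases, so the problem looks like a missing encoding mark on the result rather than wrong data. Assuming that reading is right, marking the output (instead of padding the input) may be enough. This has not been verified on R 3.0.2 under a Windows-1252 locale, so treat it as a sketch.

```r
# Assumption: gsub() returns valid UTF-8 bytes but, on the affected setup,
# leaves the result unmarked, so it is displayed using the native encoding.
string1 <- "text X"
new_string1 <- gsub("X", "\u2265", string1)

# Declare the encoding of the result. This changes only the mark,
# not the underlying bytes.
Encoding(new_string1) <- "UTF-8"

Encoding(new_string1)    # check the declared encoding
charToRaw(new_string1)   # bytes should be unchanged
new_string1
```

Unlike the "\u00A0" padding, this leaves the input string untouched and needs no substring() step afterwards.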
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.