Thanks to Ista Zahn, I was able to find a workaround.  The key
seems to be that string1 needs to be marked as UTF-8 before it is passed
to gsub.  For whatever reason,

Encoding(string1) <- "UTF-8"

does not change the encoding on my Windows machine.  The workaround:  I
prepend an obviously non-ASCII UTF-8 character, "\u00A0", to the start of the
string, send the string through gsub, then remove the "\u00A0" character from
the output.
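A likely explanation for the no-op, based on R's ?Encoding help page: R never attaches a declared encoding to a pure-ASCII string, because ASCII bytes are identical in every supported encoding. "text X" is all ASCII, so the assignment is silently ignored. A string that contains a non-ASCII character such as "\u00A0" can carry the "UTF-8" mark. The following is a minimal check; the variable names are illustrative only.

```r
# An all-ASCII string: R ignores any attempt to declare its encoding
ascii_str <- "text X"
Encoding(ascii_str) <- "UTF-8"
Encoding(ascii_str)   # stays "unknown"

# A string built with a \u escape contains a non-ASCII character
# (a no-break space), so it is eligible for the "UTF-8" mark
utf8_str <- "\u00A0text X"
Encoding(utf8_str)
nchar(utf8_str)       # 7: the no-break space plus the 6 characters of "text X"
```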

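string1 <- "\u00A0text X"; string1   # leading no-break space makes the string non-ASCII
Encoding(string1)                     # should now report "UTF-8"
new_string1 <- gsub("X","\u2265",string1); new_string1
new_string2 <- substring(new_string1,2); new_string2   # drop the leading "\u00A0"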
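A possibly less hackish alternative follows. It is a sketch that has not been tested on the Windows/CP1252 setup described here. The charToRaw() output in the original message shows that gsub() already returns the correct UTF-8 bytes (e2 89 a5 for "≥"), so only the encoding mark is missing. Because the result now contains non-ASCII bytes, Encoding<- can declare it directly, and no sentinel character has to be added or stripped. This is only safe when the bytes really are UTF-8, as the charToRaw() output indicates they are here.

```r
string1 <- "text X"
new_string1 <- gsub("X", "\u2265", string1)

# The replacement inserted the UTF-8 bytes e2 89 a5, so the result is no
# longer pure ASCII and Encoding<- is able to mark it as UTF-8
Encoding(new_string1) <- "UTF-8"

Encoding(new_string1)
charToRaw(new_string1)
new_string1
```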

If you know of a less hackish way to accomplish this, I'd be interested to
hear it.  However, this workaround is sufficient for now.

Thanks,
-tgs


On Wed, May 28, 2014 at 10:25 PM, Thomas Stewart <tgs.public.m...@gmail.com>
wrote:

> Can anyone help me understand the following behavior?
>
> I want to replace the letter 'X' in the string 'text X' with '≥' (\u2265).
> The output from gsub is not what I expect.  It gives: "text ≥".
>
> Now, suppose I want to replace the character '≤' in the string 'text ≤'
> with '≥'.  Then gsub gives the expected, desired output.
>
> What am I missing?
>
> Thanks for any insight.
> -tgs
>
> Minimal Working Example:
>
> string1 <- "text X"; string1
> new_string1 <- gsub("X","\u2265",string1); new_string1
>
> string2 <- "text \u2264"; string2
> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
>
> charToRaw(new_string1)
> charToRaw(new_string2)
>
> sessionInfo()
>
> ## OUTPUT
>
> > string1 <- "text X"; string1
> [1] "text X"
>
> > new_string1 <- gsub("X","\u2265",string1); new_string1
> [1] "text ≥"
>
> > string2 <- "text \u2264"; string2
> [1] "text ≤"
>
> > new_string2 <- gsub("\u2264","\u2265",string2); new_string2
> [1] "text ≥"
>
> > charToRaw(new_string1)
> [1] 74 65 78 74 20 e2 89 a5
>
> > charToRaw(new_string2)
> [1] 74 65 78 74 20 e2 89 a5
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
