Hi Duncan,

Thanks a lot for your quick reply pointing out the Re-encoding section that I missed!

Before trying out R's C-level interface to iconv's encoding conversion capabilities, I did some quick tests with Encoding() and iconv() on Windows with Rgui and Rterm. After Encoding(), non-ASCII characters print correctly in Rgui but are still wrong in Rterm. After iconv(), non-ASCII characters are still misprinted in both Rgui and Rterm.

Here is the code that I used:

(neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
(neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
Encoding(neg_inf_utf8)

Encoding(neg_inf_utf8) <- "UTF-8"
Encoding(neg_inf_utf8)
neg_inf_utf8

charToRaw(neg_inf_utf8)
iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)

Here is what I got with Rgui:

> (neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
[1] 2d e2 88 9e
> (neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
[1] "-∞"
> Encoding(neg_inf_utf8)
[1] "unknown"
>
> Encoding(neg_inf_utf8) <- "UTF-8"
> Encoding(neg_inf_utf8)
[1] "UTF-8"
> neg_inf_utf8
[1] "-∞"
>
> charToRaw(neg_inf_utf8)
[1] 2d e2 88 9e
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
[1] "-8"
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)
[[1]]
[1] 2d 38

>

Here is what I got with Rterm:

> (neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
[1] 2d e2 88 9e
> (neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
[1] "-â^z"
> Encoding(neg_inf_utf8)
[1] "unknown"
>
> Encoding(neg_inf_utf8) <- "UTF-8"
> Encoding(neg_inf_utf8)
[1] "UTF-8"
> neg_inf_utf8
[1] "-8"
>
> charToRaw(neg_inf_utf8)
[1] 2d e2 88 9e
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
[1] "-8"
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)
[[1]]
[1] 2d 38

>

Here is the sessionInfo:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>

Am I missing something obvious?

Thanks a lot for your help and your time!
Michael

On Mon, Sep 5, 2016 at 3:31 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 05/09/2016 12:40 AM, Lixin Gong wrote:
>
>> Dear R experts,
>>
>> It seems that Rprintf has to be used to print from a C routine to
>> guarantee to write to R’s output according to
>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Printing.
>>
>> However if a string is UTF-8 encoded, non-ASCII characters (e.g., the
>> infinity symbol
>> http://www.fileformat.info/info/unicode/char/221e/index.htm)
>> are misprinted.
>> Is this an unsupported feature or is there a workaround for this
>> limitation?
>>
>
> If you are working in a UTF-8 locale (as on most Unix-like systems), you
> should be fine. If not (as is normal on Windows), you'll need to translate
> the string to the local encoding. The Writing R Extensions manual section
> 6.11 tells you how to do the re-encoding.
>
> Duncan Murdoch
>

	[[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel