Dear developers,

I am not really sure what causes the difference in the encoding of Sweave Soutput environments between Rgui.exe and R.exe/Rterm.exe in R-2.10.0beta (now R-2.10.0rc). I suppose, however, that the different behaviour of R-2.9.2pat and R-2.10.0rc is caused by the changes concerning regular expressions documented in NEWS (RweaveLatexRuncode uses sub() in some places). AFAICS, sub() in R-2.10.0rc possibly converts its input to UTF-8, and a (conditional) back-conversion after the sub() calls seems to resolve the encoding problems (as well as the different behaviour of Rgui and Rterm in R-2.10.0rc).

It would be great if someone more involved in Sweave could take a look at (and maybe commit) the attached (untested!) patch (to r50160). Many thanks in advance!
Best wishes,

 Martin


Martin Becker wrote:
Dear developers,

I have come across a (somewhat strange) change in the encoding of Sweave output from R-2.9.2pat to R-2.10.0beta (apparently specific to Rgui) on Windows installations. Of course, the NEWS file contains quite a few changes concerning encoding, but I was not able to locate an entry which explains the observed behaviour. I am not very familiar with encodings/locales/codepages, but I will try to explain my observations as best I can.

In R-2.9.2pat, when invoking R via Rgui --vanilla (output of sessionInfo() below), the output of Sweave for .rnw files containing German umlauts (latin1-encoded) is again latin1-encoded (the resulting .tex file compiles with \usepackage[latin1]{inputenc} and \usepackage[german]{babel}). In R-2.10.0beta, however, when invoking R via Rgui --vanilla (output of sessionInfo() below), some of Sweave's output is UTF-8 encoded (with some extra characters at the start and the end, which could be BOMs). More precisely, this affects Soutput environments containing German umlauts; Sinput environments with German umlauts are still latin1. Surprisingly, when R is invoked from the (Windows) command line (R --vanilla or Rterm --vanilla), the encoding is completely latin1 again (as in R-2.9.2pat). So the change to UTF-8 encoding for parts of Sweave's output seems to be specific to Rgui.

Of course, I can work around this problem by using Rterm instead of Rgui when Sweaving, but I am not sure whether the current behaviour of R via Rgui is as intended. I will try to attach the .rnw file as well as the resulting .tex files (and hope that the attachments pass through).

Best wishes,

  Martin



sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
R version 2.9.2 Patched (2009-09-24 r50041)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
R version 2.10.0 beta (2009-10-11 r50037)
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
------------------------------------------------------------------------

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany

diff -u --recursive R-rc/src/library/utils/R/Sweave.R R-rc-patched/src/library/utils/R/Sweave.R
--- R-rc/src/library/utils/R/Sweave.R   2009-09-28 00:05:26.000000000 +0200
+++ R-rc-patched/src/library/utils/R/Sweave.R   2009-10-19 13:24:17.000000000 +0200
@@ -541,6 +541,7 @@
                         output <- sub("\n[[:space:]]*$", "", output)
                         if(options$strip.white=="all")
                           output <- sub("\n[[:space:]]*\n", "\n", output)
+                       if(Encoding(output)=="UTF-8") output <- iconv(output, from="utf-8")
                     }
                     cat(output, file=chunkout, append=TRUE)
                     count <- sum(strsplit(output, NULL)[[1L]] == "\n")
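
For illustration, the conditional back-conversion added by the patch can be tried in isolation. This is only a minimal sketch and not part of the patch: the example string and the names x, y and z are made up, the call to iconv() towards UTF-8 merely stands in for whatever sub() may do internally (an assumption), and the target encoding is given explicitly as latin1 so that the result does not depend on the locale (the patch itself converts to the native encoding).

```r
# Hypothetical example string: "Maerz" with a-umlaut as latin1 bytes (0xE4).
x <- "M\xe4rz"
Encoding(x) <- "latin1"

# Stand-in for the suspected conversion inside sub() (an assumption):
# re-encode to UTF-8; iconv() marks the result as UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

# Conditional back-conversion, mirroring the line added by the patch,
# but with an explicit target encoding instead of the native one.
z <- if (Encoding(y) == "UTF-8") iconv(y, from = "UTF-8", to = "latin1") else y

# Compare the underlying bytes of the three strings.
print(charToRaw(x))
print(charToRaw(y))
print(charToRaw(z))
```

If the back-conversion works as hoped, the bytes of z should match those of x, while y carries the two-byte UTF-8 sequence for the umlaut.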