Dear developers,
I am not really sure what causes the difference in the encoding of
Sweave Soutput environments between Rgui.exe and R.exe/Rterm.exe in
R-2.10.0beta (now R-2.10.0rc), but I suppose that the different
behaviour of R-2.9.2pat and R-2.10.0rc is caused by changes concerning
regular expressions (RweaveLatexRuncode uses sub() in some places) as
documented in NEWS.
AFAICS, sub() in R-2.10.0rc may convert its input to UTF-8, and a
(conditional) back-conversion after the sub() calls seems to resolve
the encoding problems (as well as the different behaviour of Rgui and
Rterm in R-2.10.0rc).
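To illustrate the idea of a conditional back-conversion, here is a minimal sketch. The helper name to_native_if_utf8 is hypothetical and not part of Sweave; it only assumes that Encoding() reports "UTF-8" for the affected strings and that iconv() with to = "" targets the encoding of the current locale.

```r
# Hypothetical helper (not part of Sweave): convert only those strings
# that are marked as UTF-8 back to the native encoding of the session.
to_native_if_utf8 <- function(x) {
    is_utf8 <- Encoding(x) == "UTF-8"
    # to = "" asks iconv() for the current locale's encoding
    x[is_utf8] <- iconv(x[is_utf8], from = "UTF-8", to = "")
    x
}

# "\u00e4" is a-umlaut, written as an escape so that this example does
# not depend on the encoding of the file it is stored in.
s <- "M\u00e4rz\n   \n"
s <- sub("\n[[:space:]]*$", "", s)  # same kind of call as in RweaveLatexRuncode
out <- to_native_if_utf8(s)
```

Strings that are not marked as UTF-8 pass through unchanged, so the helper should be harmless for input that was never converted. Note that iconv() returns NA for characters that cannot be represented in the native encoding.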
It would be great if someone more involved in Sweave could take a look
at (and maybe commit) the attached (untested!) patch (to r50160). Many
thanks in advance!
Best wishes,
Martin
Martin Becker wrote:
Dear developers,
I have come across a (somewhat strange) change in the encoding of
Sweave output from R-2.9.2pat to R-2.10.0beta (apparently specific to
Rgui) on Windows installations. Of course, the NEWS file contains
quite a few changes concerning encoding, but I was not able to locate
an entry which explains the observed behaviour. I am not very familiar
with encodings/locales/codepages, but I will try to explain my
observations as best I can.
In R-2.9.2pat, when invoking R via Rgui --vanilla (output of
sessionInfo() below), the output of Sweave for .rnw files containing
German umlauts (latin1-encoded) is again latin1-encoded (the resulting
.tex file compiles with \usepackage[latin1]{inputenc} and
\usepackage[german]{babel}).
In R-2.10.0beta, however, when invoking R via Rgui --vanilla (output
of sessionInfo() below), some of Sweave's output is UTF-8 encoded (with
some extra characters at the start and the end, which could be BOMs).
More precisely, Soutput environments containing German umlauts are
affected, while Sinput environments with German umlauts are still
latin1.
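One way to check which encoding a piece of output is in is to look at the raw bytes. The following is only a sketch: the helper starts_with_bom and the file name in the comment are hypothetical, and whether the extra characters really are BOMs is an assumption that such a check could confirm or refute.

```r
# a-umlaut in the two encodings, built from raw bytes so that the example
# does not depend on the encoding of this file.
latin1_bytes <- as.raw(0xe4)                 # latin1: single byte E4
utf8_bytes   <- as.raw(c(0xc3, 0xa4))        # UTF-8: two bytes C3 A4
utf8_bom     <- as.raw(c(0xef, 0xbb, 0xbf))  # UTF-8 byte order mark

# Hypothetical helper: does a raw vector start with the UTF-8 BOM?
starts_with_bom <- function(bytes) {
    length(bytes) >= 3L && identical(bytes[1:3], utf8_bom)
}

# For a real file, the bytes could be read with readBin(), e.g.
#   bytes <- readBin("example.tex", what = "raw",
#                    n = file.info("example.tex")$size)
# where "example.tex" is a placeholder file name.
starts_with_bom(c(utf8_bom, utf8_bytes))   # TRUE
starts_with_bom(latin1_bytes)              # FALSE
```

A lone byte E4 where an umlaut is expected points to latin1, while the pair C3 A4 points to UTF-8.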
Surprisingly, when R is invoked from the (Windows) command line (R
--vanilla or Rterm --vanilla), the encoding is completely latin1 again
(as in R-2.9.2pat). So the change to UTF-8 encoding for parts of
Sweave's output seems to be specific to Rgui.
Of course, I can work around this problem by using Rterm instead of
Rgui when running Sweave, but I am not sure whether the current
behaviour of R via Rgui is as intended.
I will try to attach the .rnw file as well as the resulting .tex
files (and hope that the attachments pass through).
Best wishes,
Martin
sessionInfo() for R-2.9.2pat (same for Rgui, R, Rterm):
R version 2.9.2 Patched (2009-09-24 r50041)
i386-pc-mingw32
locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
sessionInfo() for R-2.10.0beta (same for Rgui, R, Rterm):
R version 2.10.0 beta (2009-10-11 r50037)
i386-pc-mingw32
locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
------------------------------------------------------------------------
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany
diff -u --recursive R-rc/src/library/utils/R/Sweave.R R-rc-patched/src/library/utils/R/Sweave.R
--- R-rc/src/library/utils/R/Sweave.R 2009-09-28 00:05:26.000000000 +0200
+++ R-rc-patched/src/library/utils/R/Sweave.R 2009-10-19 13:24:17.000000000 +0200
@@ -541,6 +541,7 @@
 output <- sub("\n[[:space:]]*$", "", output)
 if(options$strip.white=="all")
 output <- sub("\n[[:space:]]*\n", "\n", output)
+ if(Encoding(output)=="UTF-8") output <- iconv(output, from="utf-8")
 }
 cat(output, file=chunkout, append=TRUE)
 count <- sum(strsplit(output, NULL)[[1L]] == "\n")