Unless Windows is running in CP1250 (the Slovenian encoding on Windows), this is not expected to work. I believe John tested in CP1252, and it just so happens that those characters are in the same place in CP1250 and CP1252.

I get something different in CP1250, as pasting into the script window also does not work. But if I use the Unicode escapes, the result is rendered correctly in the output window.
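As an illustration of what "the Unicode escapes" means here, the test string can be written with \u escapes, so that the script text is plain ASCII and does not depend on the code page used when pasting. This is only a sketch; the variable name x is chosen for illustration.

```r
# Build the six Slovenian test characters from Unicode escapes, so the
# source text itself is plain ASCII and independent of the code page.
x <- "\u010c\u0160\u017d\u010d\u0161\u017e"  # C, S, Z with caron, upper then lower case

Encoding(x)    # declared encoding of the string
utf8ToInt(x)   # code points: 268 352 381 269 353 382
```

Submitting such a string from the script window avoids any conversion of the typed characters, which may be why it renders correctly.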

I think Jaro has put his finger on this: Tcl/Tk output thinks it is in Latin-2 and not CP1250, and s and z caron have different positions in those two character sets. Here is something I can reproduce easily with XP set to Slovenian:

x <- "ČŠŽčšž"
x
[1] "ČŠŽčšž"
charToRaw(x)
[1] c8 8a 8e e8 9a 9e

which is correct for CP1250. Now if I submit 'x' in the Rcmdr script window, I get the wrong output in the output window.
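The difference between the two character sets can be checked from any locale by converting the same string to both encodings and comparing the bytes. This is a minimal sketch, assuming the iconv build in use knows the names "CP1250" and "ISO-8859-2" (the latter being Latin-2); the variable names are illustrative.

```r
x <- "\u010c\u0160\u017d\u010d\u0161\u017e"  # C, S, Z with caron, upper then lower case

# iconv(..., toRaw = TRUE) returns a list with one raw vector per input string.
cp1250 <- iconv(x, from = "UTF-8", to = "CP1250", toRaw = TRUE)[[1]]
latin2 <- iconv(x, from = "UTF-8", to = "ISO-8859-2", toRaw = TRUE)[[1]]

cp1250  # c8 8a 8e e8 9a 9e
latin2  # c8 a9 ae e8 b9 be

# Positions where the two encodings disagree: the S and Z carons.
which(cp1250 != latin2)  # 2 3 5 6
```

C with caron sits at the same byte in both sets, which is consistent with the report that only s and z with caron are affected.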

And I've tracked that down to a bug in iconv (something we take from libiconv on Windows): it does think the native encoding is Latin-2, not CP1250. I'll put a workaround in R-devel and R-patched shortly. That has other potential ramifications that will take me longer to investigate, and the correct thing may be to fix iconv.
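The effect of treating CP1250 bytes as Latin-2 can be simulated by decoding the same bytes under each assumption. This is a sketch of the symptom only, not of the actual code path inside R or libiconv, and the exact result of the wrong decoding may depend on the iconv implementation.

```r
# Bytes for the six test characters as stored in a CP1250 session.
bytes <- as.raw(c(0xc8, 0x8a, 0x8e, 0xe8, 0x9a, 0x9e))

# iconv() accepts a list of raw vectors as input.
right <- iconv(list(bytes), from = "CP1250", to = "UTF-8")
wrong <- iconv(list(bytes), from = "ISO-8859-2", to = "UTF-8")

utf8ToInt(right)  # the intended code points
utf8ToInt(wrong)  # expected to differ at the S and Z carons only
```

In Latin-2 the bytes 8a, 8e, 9a and 9e are not letters at all, so a display layer given the wrongly decoded string has nothing sensible to draw for them.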

On Sun, 7 Sep 2008, John Fox wrote:

Dear Brian,

Thank you for addressing the problem -- I was hoping that you would.

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: September-07-08 7:23 AM
To: John Fox
Cc: 'R-devel'; 'Jaro.Lajovic'
Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package

The issue appears to be the Rcmdr output window and menus.  They are done
using Tcl/Tk, not by R.  So this might be a problem in Tcl/Tk or the fonts
it uses, or it might be a problem with what Rcmdr passes to the tcltk
package.

We need the means to reproduce this (as per the posting guide):

Jaro provides an example in one of his messages in my posting (though it is
slightly in error): If one enters

cat("ČŠŽčšž\n")

in the Rcmdr Script window, the characters are rendered correctly. Executing
this command (via the Submit button) produces the following in the Output
window:

cat("??????\n")
??????

which actually appears as

cat("??\n")
??

This is under Windows Vista / R 2.7.2 / Rcmdr 1.4-0.


- what OSes are affected?  Does this occur in a UTF-8 locale on Linux, for
example?

I've now checked under Mac OS X and Linux Ubuntu, with the following
results:

Under Mac OS X 10.5.4 / R 2.7.2 / Rcmdr 1.4-0 / Tcl/Tk 8.4

cat("ČŠŽčšž\n") appears as cat("?????\n") in *both* the Script window and
the Output window.

Under Ubuntu Linux 8.04 / R 2.7.0 / Rcmdr 1.4-0/ Tcl/Tk 8.5

cat("ČŠŽčšž\n") appears *correctly* in *both* the Script window and the
Output window.


- in what locales?

I'm afraid that I don't know how to check this short of changing the locale
for my Windows machine. I do observe the problem in Windows when I start
Rgui with language=sl.
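One way to see which locale and character set a running R session believes it is using, without changing the machine's settings, is sketched below. The codepage component is an assumption about the R version in use: it is reported on Windows in recent versions and is absent elsewhere.

```r
# Character-type locale of the running session.
Sys.getlocale("LC_CTYPE")

# Summary of how R classifies the locale's encoding.
info <- l10n_info()
info[["UTF-8"]]    # TRUE when running in a UTF-8 locale
info[["Latin-1"]]  # TRUE when running in a Latin-1 locale
info$codepage      # Windows only; NULL on other platforms
```

Note that starting Rgui with language=sl changes the message language but not necessarily the character set reported here.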


- what versions of Tcl/Tk?  Note that the version shipped with Windows R
changed between 2.5.1 and 2.7.x.

Yes, and please see above, but if the problem were with Tcl/Tk, why does
this work in the Script window under Windows and in both Script and Output
under Ubuntu?


- Is this anything to do with translations?  I've not looked at how
translations are done in Rcmdr, but if gettext() is used, the string
passed to R for output is in the native encoding, so 'UTF-8 characters' is
incorrect.  It is possible that it is an iconv problem if the translations
are supplied in UTF-8 and not Latin-2.

Yes, the Rcmdr package uses gettext(). Could Jaro avoid the problem by using
Latin-2 in preference to UTF-8?


There are far too many layers involved here to guess at what is going on.
My guess is that it ought to be possible to give a simple example of a
string which can be output to the Rcmdr console and will be rendered
incorrectly (together with a screen shot of how it is rendered).

Indeed, please see above. I've also attached a screenshot under Windows,
having started R with language=sl.


I think the characters referred to are the Unicode glyphs 's and z with
caron', \u0161 and \u017E.  It seems that these will only be displayable
in Rcmdr on Windows in a Latin-2 locale, which I do not have set up on
Windows (but believe I could get installed).  However, examples using that
(and the menus) seem to be correct in both sl_SI.iso88592 and sl_SI.utf8
on Linux, which suggests that this is probably not an R issue but a Tcl/Tk
one.

I'm above my depth with respect to these issues, but I do find it curious
that under Windows the characters appear correctly in the Script window but
not the Output window.


On Fri, 5 Sep 2008, John Fox wrote:

Dear list members,

I've attached some email correspondence with Jaro Lajovic (with his
permission), detailing a problem with the Slovenian translation file for
the Rcmdr package.

Unfortunately, it is not 'detailed', and we do need the details.

I hope that the additional information in this message will supply at least
some of the necessary details.

Thank you for your help,
John


In brief, while certain UTF-8 characters used in Slovenian used to
appear properly in older versions of R, some characters do not display
properly in the Rcmdr menus and output window under R 2.7.x. I've
confirmed the problem with the current version of the Rcmdr package
(1.4-0) and R 2.7.2 under Windows Vista.

I've checked the R docs and NEWS file for changes to R, but wasn't able
to turn up anything that seemed relevant. Frankly, however, my
understanding of how various character sets are handled is only partial.

Any help would be appreciated.

John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


-----Original Message-----
From: Jaro.Lajovic [mailto:[EMAIL PROTECTED]
Sent: August-26-08 2:57 AM
To: John Fox
Subject: Re: Slovenian Rcmdr .po and .mo - and a problem

Dear John,

That seems to imply that there's a change in R rather than in the Rcmdr
that produced this problem. Do you notice the problem with any other
packages that use translation or with R itself?

As for other translated R packages, I am afraid I am not aware of any.
However, a quick test using cat with special characters:
cat "ČŠŽčšž\n"
reveals that the string prints OK in the R (2.7.1) console. The command
line also shows OK in the Rcmdr Script window, but does not display
right in the Output window. Special chars also fail in the Messages
window.

Input (Script window) thus seems not to be affected, while the menu
system and output do not work properly.

Thank you very much,
Jaro


On Mon, 25 Aug 2008 21:54:43 +0200
 "Jaro.Lajovic" <[EMAIL PROTECTED]> wrote:
Dear John,

One question though: I assume from your message that the previous
version of the Rcmdr worked OK with R 2.7.1. Is that right?

No, the version 1.3-5 (that I still have with R 2.5.1) does not work
with R 2.7.1 either. So:

Rcmdr 1.3-5 with R 2.5.1: works OK.
Rcmdr 1.3-5 with R 2.7.1: does not work properly.
Rcmdr 1.4-0 with R 2.7.1: does not work properly.

Thank you in advance,
Jaro



On Mon, 25 Aug 2008 18:52:32 +0200
 "Jaro.Lajovic" <[EMAIL PROTECTED]> wrote:
Dear John,

Please find attached zipped Slovenian versions of .po (plain text and
UTF-8 coded text) and .mo files.

However, there seems to be a problem I have not been able to resolve.
While special characters display properly under R version 2.5.1 with
Rcmdr 1.3-5, they fail to display (= are substituted by black blocks)
under R version 2.7.1 with the new Rcmdr 1.4-0. By the way: the .mo
file of the ver. 1.3-5 copied to 1.4-0 also failed to display properly.

(An additional detail: three special characters that are used in the
Slo version are c, s and z with hacek. c with hacek is not affected,
it is just s and z with hacek that are not displayed OK.)

Your advice will be much appreciated.

With best regards,
Jaro

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

