Dear Brian,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Prof Brian Ripley
> Sent: September-08-08 5:46 AM
> To: John Fox
> Cc: 'Jaro.Lajovic'; 'R-devel'
> Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package
>
> Unless Windows is running in CP1250 (the Slovenian encoding on Windows), this
> is not expected to work. I believe John tested in CP1252, and it just so
> happens that those characters are in the same place in CP1250 and CP1252.

Yes, that's right: My locale is English (Canada), which uses CP1252.

>
> I get something different in CP1250, as pasting into the script window also
> does not work. But if I use the Unicode escapes, the result is rendered
> correctly in the output window.
>
> I think Jaro has put his finger on this: Tcl/Tk output thinks it is in
> Latin-2 and not CP1250, and s and z caron have different positions in those
> two character sets. Here is something I can reproduce easily: with XP set to
> Slovenian:
>
> > x <-"ČŠŽčšž"
> > x
> [1] "ČŠŽčšž"
> > charToRaw(x)
> [1] c8 8a 8e e8 9a 9e
>
> which is correct for CP1250. Now if I submit 'x' in the Rcmdr script window,
> I get the wrong output in the output window.
>
> And I've tracked that down to a bug in iconv (something we take from libiconv
> on Windows): it does think the native encoding is Latin-2, not CP1250. I'll
> put a workaround in R-devel and R-patched shortly. That has other potential
> ramifications that will take me longer to investigate, and the correct thing
> may be to fix iconv.

Thank you very much for tracking this down. Recall that there is also
apparently a problem under Mac OS X, where the characters appeared
incorrectly in both the Script and Output windows.

Regards,
John

>
> On Sun, 7 Sep 2008, John Fox wrote:
>
> > Dear Brian,
> >
> > Thank you for addressing the problem -- I was hoping that you would.
> >
> >> -----Original Message-----
> >> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
> >> Sent: September-07-08 7:23 AM
> >> To: John Fox
> >> Cc: 'R-devel'; 'Jaro.Lajovic'
> >> Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package
> >>
> >> The issue appears to be the Rcmdr output window and menus. They are
> >> done using Tcl/Tk, not by R. So this might be a problem in Tcl/Tk or
> >> the fonts it uses, or it might be a problem with what Rcmdr passes to
> >> the tcltk package.
> >>
> >> We need the means to reproduce this (as per the posting guide):
> >
> > Jaro provides an example in one of his messages in my posting (though
> > it is slightly in error): If one enters
> >
> > cat("ČŠŽčšž\n")
> >
> > in the Rcmdr Script window, the characters are rendered correctly.
> > Executing this command (via the Submit button) produces the following
> > in the Output window:
> >
> >> cat("??????\n")
> > ??????
> >
> > which actually appears as
> >
> >> cat("??\n")
> > ??
> >
> > This is under Windows Vista / R 2.7.2 / Rcmdr 1.4-0.
> >
> >>
> >> - what OSes are affected? Does this occur in a UTF-8 locale on
> >> Linux, for example?
> >
> > I've now checked under Mac OS X and Linux Ubuntu, with the following
> > results:
> >
> > Under Mac OS X 10.5.4 / R 2.7.2 / Rcmdr 1.4-0 / Tcl/Tk 8.4
> >
> > cat("ČŠŽčšž\n") appears as cat("?????\n") in *both* the Script window
> > and the Output window.
> >
> > Under Ubuntu Linux 8.04 / R 2.7.0 / Rcmdr 1.4-0 / Tcl/Tk 8.5
> >
> > cat("ČŠŽčšž\n") appears *correctly* in *both* the Script window and
> > the Output window.
> >
> >>
> >> - in what locales?
> >
> > I'm afraid that I don't know how to check this short of changing the
> > locale for my Windows machine. I do observe the problem in Windows
> > when I start Rgui with language=sl.
> >
> >>
> >> - what versions of Tcl/Tk? Note that the version shipped with Windows
> >> R changed between 2.5.1 and 2.7.x.
> >
> > Yes, and please see above, but if the problem were with Tcl/Tk, why
> > does this work in the Script window under Windows and in both Script
> > and Output under Ubuntu?
> >
> >>
> >> - Is this anything to do with translations? I've not looked at how
> >> translations are done in Rcmdr, but if gettext() is used, the string
> >> passed to R for output is in the native encoding, so 'UTF-8
> >> characters' is incorrect. It is possible that it is an iconv problem
> >> if the translations are supplied in UTF-8 and not Latin-2.
> >
> > Yes, the Rcmdr package uses gettext(). Could Jaro avoid the problem by
> > using Latin-2 in preference to UTF-8?
> >
> >>
> >> There are far too many layers involved here to guess at what is going on.
> >> My guess is that it ought to be possible to give a simple example of
> >> a string which can be output to the Rcmdr console and will be
> >> rendered incorrectly (together with a screen shot of how it is rendered).
> >
> > Indeed, please see above. I've also attached a screenshot under
> > Windows, having started R with language=sl.
> >
> >>
> >> I think the characters referred to are the Unicode glyphs 's and z
> >> with caron', \u0161 and \u017E. It seems that these will only be
> >> displayable in Rcmdr on Windows in a Latin-2 locale, which I do not
> >> have set up on Windows (but believe I could get installed). However,
> >> examples using that (and the menus) seem to be correct in both
> >> sl_SI.iso88592 and sl_SI.utf8 on Linux, which suggests that this is
> >> probably not an R issue but a Tcl/Tk one.
> >
> > I'm above my depth with respect to these issues, but I do find it
> > curious that under Windows the characters appear correctly in the
> > Script window but not the Output window.
> >
> >>
> >> On Fri, 5 Sep 2008, John Fox wrote:
> >>
> >>> Dear list members,
> >>>
> >>> I've attached some email correspondence with Jaro Lajovic (with his
> >>> permission), detailing a problem with the Slovenian translation file
> >>> for the Rcmdr package.
> >>
> >> Unfortunately, it is not 'detailed', and we do need the details.
> >
> > I hope that the additional information in this message will supply at
> > least some of the necessary details.
> >
> > Thank you for your help,
> > John
> >
> >>
> >>> In brief, while certain UTF-8 characters used in Slovenian used to
> >>> appear properly in older versions of R, some characters do not
> >>> display properly in the Rcmdr menus and output window under R 2.7.x.
> >>> I've confirmed the problem with the current version of the Rcmdr
> >>> package (1.4-0) and R 2.7.2 under Windows Vista.
> >>>
> >>> I've checked the R docs and NEWS file for changes to R, but wasn't
> >>> able to turn up anything that seemed relevant. Frankly, however, my
> >>> understanding of how various character sets are handled is only partial.
> >>>
> >>> Any help would be appreciated.
> >>>
> >>> John
> >>>
> >>> ------------------------------
> >>> John Fox, Professor
> >>> Department of Sociology
> >>> McMaster University
> >>> Hamilton, Ontario, Canada
> >>> web: socserv.mcmaster.ca/jfox
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jaro.Lajovic [mailto:[EMAIL PROTECTED]
> >>> Sent: August-26-08 2:57 AM
> >>> To: John Fox
> >>> Subject: Re: Slovenian Rcmdr .po and .mo - and a problem
> >>>
> >>> Dear John,
> >>>
> >>>> That seems to imply that there's a change in R rather than in the
> >>>> Rcmdr that produced this problem. Do you notice the problem with
> >>>> any other packages that use translation or with R itself?
> >>>
> >>> As for other translated R packages, I am afraid I am not aware of any.
> >>> However, a quick test using cat with special characters:
> >>> cat "ČŠŽčšž\n"
> >>> reveals that the string prints OK in the R (2.7.1) console. The
> >>> command line also shows OK in the Rcmdr Script window, but does not
> >>> display right in the Output window. Special chars also fail in the
> >>> Messages window.
> >>>
> >>> Input (Script window) thus seems not to be affected, while the menu
> >>> system and output do not work properly.
> >>>
> >>> Thank you very much,
> >>> Jaro
> >>>
> >>>
> >>>> On Mon, 25 Aug 2008 21:54:43 +0200
> >>>> "Jaro.Lajovic" <[EMAIL PROTECTED]> wrote:
> >>>>> Dear John,
> >>>>>
> >>>>>> One question though: I assume from your message that the previous
> >>>>>> version of the Rcmdr worked OK with R 2.7.1. Is that right?
> >>>>> No, the version 1.3-5 (that I still have with R 2.5.1) does not
> >>>>> work with R 2.7.1 either. So:
> >>>>>
> >>>>> Rcmdr 1.3-5 with R 2.5.1: works OK.
> >>>>> Rcmdr 1.3-5 with R 2.7.1: does not work properly.
> >>>>> Rcmdr 1.4-0 with R 2.7.1: does not work properly.
> >>>>>
> >>>>> Thank you in advance,
> >>>>> Jaro
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Mon, 25 Aug 2008 18:52:32 +0200 "Jaro.Lajovic"
> >>>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>>> Dear John,
> >>>>>>>
> >>>>>>> Please find attached zipped Slovenian versions of .po (plain
> >>>>>>> text and UTF-8 coded text) and .mo files.
> >>>>>>>
> >>>>>>> However, there seems to be a problem I have not been able to
> >>>>>>> resolve. While special characters display properly under R
> >>>>>>> version 2.5.1 with Rcmdr 1.3-5, they fail to display (= are
> >>>>>>> substituted by black blocks) under R version 2.7.1 with the new
> >>>>>>> Rcmdr 1.4-0. By the way: the .mo file of the ver. 1.3-5 copied
> >>>>>>> to 1.4-0 also failed to display properly.
> >>>>>>>
> >>>>>>> (An additional detail: three special characters that are used in
> >>>>>>> the Slo version are c, s and z with hacek. c with hacek is not
> >>>>>>> affected, it is just s and z with hacek that are not displayed OK.)
> >>>>>>>
> >>>>>>> Your advice will be much appreciated.
> >>>>>>>
> >>>>>>> With best regards,
> >>>>>>> Jaro
> >>>>
> >>>> --------------------------------
> >>>> John Fox, Professor
> >>>> Department of Sociology
> >>>> McMaster University
> >>>> Hamilton, Ontario, Canada
> >>>> http://socserv.mcmaster.ca/jfox/
> >>>>
> >>>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >> --
> >> Brian D. Ripley, [EMAIL PROTECTED]
> >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford, Tel: +44 1865 272861 (self)
> >> 1 South Parks Road, +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK Fax: +44 1865 272595
> >
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
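
The thread's diagnosis is that s and z with caron sit at different byte positions in CP1250 and Latin-2, while c with caron does not. This can be checked with a minimal sketch in R. It is only an illustration, not code from any of the messages above: it assumes an R build whose iconv() recognises the encoding names "CP1250" and "ISO-8859-2", and the helper name bytes_in is hypothetical.

```r
# The six Slovenian letters from the thread, written as Unicode escapes so
# that the script does not depend on the encoding of the source file.
x <- enc2utf8("\u010c\u0160\u017d\u010d\u0161\u017e")   # C, S, Z, c, s, z with caron

# Hypothetical helper: convert a UTF-8 string to a single-byte encoding and
# return the resulting bytes as a raw vector.
bytes_in <- function(s, encoding) {
  iconv(s, from = "UTF-8", to = encoding, toRaw = TRUE)[[1]]
}

cp1250 <- bytes_in(x, "CP1250")       # the Windows encoding for Slovenian
latin2 <- bytes_in(x, "ISO-8859-2")   # Latin-2

print(cp1250)   # c8 8a 8e e8 9a 9e, matching the charToRaw() output quoted above
print(latin2)   # c8 a9 ae e8 b9 be

# TRUE marks the letters whose byte differs between the two encodings.
differs <- as.integer(cp1250) != as.integer(latin2)
print(differs)  # FALSE TRUE TRUE FALSE TRUE TRUE
```

Only the positions of s and z with caron differ, which is consistent with Jaro's observation that c with hacek displays correctly while s and z do not.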