Re: [Rd] Problem with UTF-8 text in the Rcmdr package
Dear John,

>> - Is this anything to do with translations? I've not looked at how
>> translations are done in Rcmdr, but if gettext() is used, the string
>> passed to R for output is in the native encoding, so 'UTF-8
>> characters' is incorrect. It is possible that it is an iconv problem
>> if the translations are supplied in UTF-8 and not Latin-2.
>
> Yes, the Rcmdr package uses gettext(). Could Jaro avoid the problem by
> using Latin-2 in preference to UTF-8?

As mentioned, I am testing this under Windows XP (R 2.7.1). Preparing
the .mo file with the Latin-2 encoding (or Win-1250, for that matter)
does not make any difference.

However, with the help of my son I have made a test, documented in the
attached screenshot. It seems that the output routines expect Latin-2,
but (as for the translation) get the native encoding.

Best regards,
Jaro

John Fox wrote:

> Dear Brian,
>
> Thank you for addressing the problem -- I was hoping that you would.
>
>> -----Original Message-----
>> From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
>> Sent: September-07-08 7:23 AM
>> To: John Fox
>> Cc: 'R-devel'; 'Jaro.Lajovic'
>> Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package
>>
>> The issue appears to be the Rcmdr output window and menus. They are
>> done using Tcl/Tk, not by R. So this might be a problem in Tcl/Tk or
>> the fonts it uses, or it might be a problem with what Rcmdr passes to
>> the tcltk package.
>>
>> We need the means to reproduce this (as per the posting guide):
>
> Jaro provides an example in one of his messages in my posting (though
> it is slightly in error): If one enters
>
>   cat("ČŠŽčšž\n")
>
> in the Rcmdr Script window, the characters are rendered correctly.
> Executing this command (via the Submit button) produces the following
> in the Output window:
>
>   > cat("??\n")
>   ??
>
> which actually appears as
>
>   > cat("??\n")
>   ??
>
> This is under Windows Vista / R 2.7.2 / Rcmdr 1.4-0.
>
>> - what OSes are affected? Does this occur in a UTF-8 locale on
>> Linux, for example?
>
> I've now checked under Mac OS X and Linux Ubuntu, with the following
> results:
>
> Under Mac OS X 10.5.4 / R 2.7.2 / Rcmdr 1.4-0 / Tcl/Tk 8.4
>
> cat("ČŠŽčšž\n") appears as cat("?\n") in *both* the Script window
> and the Output window.
>
> Under Ubuntu Linux 8.04 / R 2.7.0 / Rcmdr 1.4-0 / Tcl/Tk 8.5
>
> cat("ČŠŽčšž\n") appears *correctly* in *both* the Script window and
> the Output window.
>
>> - in what locales?
>
> I'm afraid that I don't know how to check this short of changing the
> locale for my Windows machine. I do observe the problem in Windows
> when I start Rgui with language=sl.
>
>> - what versions of Tcl/Tk? Note that the version shipped with
>> Windows R changed between 2.5.1 and 2.7.x.
>
> Yes, and please see above, but if the problem were with Tcl/Tk, why
> does this work in the Script window under Windows and in both Script
> and Output under Ubuntu?
>
>> - Is this anything to do with translations? I've not looked at how
>> translations are done in Rcmdr, but if gettext() is used, the string
>> passed to R for output is in the native encoding, so 'UTF-8
>> characters' is incorrect. It is possible that it is an iconv problem
>> if the translations are supplied in UTF-8 and not Latin-2.
>
> Yes, the Rcmdr package uses gettext(). Could Jaro avoid the problem by
> using Latin-2 in preference to UTF-8?
>
>> There are far too many layers involved here to guess at what is going
>> on. My guess is that it ought to be possible to give a simple example
>> of a string which can be output to the Rcmdr console and will be
>> rendered incorrectly (together with a screen shot of how it is
>> rendered).
>
> Indeed, please see above. I've also attached a screenshot under
> Windows, having started R with language=sl.
>
>> I think the characters referred to are the Unicode glyphs 's and z
>> with caron', \u0161 and \u017E. It seems that these will only be
>> displayable in Rcmdr on Windows in a Latin-2 locale, which I do not
>> have set up on Windows (but believe I could get installed). However,
>> examples using that (and the menus) seem to be correct in both
>> sl_SI.iso88592 and sl_SI.utf8 on Linux, which suggests that this is
>> probably not an R issue but a Tcl/Tk one.
>
> I'm above my depth with respect to these issues, but I do find it
> curious that under Windows the characters appear correctly in the
> Script window but not the Output window.
>
>> On Fri, 5 Sep 2008, John Fox wrote:
>>
>>> Dear list members,
>>>
>>> I've attached some email correspondence with Jaro Lajovic (with his
>>> permission), detailing a problem with the Slovenian translation file
>>> for the Rcmdr package.
>>
>> Unfortunately, it is not 'detailed', and we do need the details.
>
> I hope that the additional information in this message will supply at
> least some of the necessary details.
>
> Thank you for your help,
> John
>
>>> In brief, while certain UTF-8 characters used in Slovenian used to
>>> appear properly in older versions of R, some characters do not
>>> display properly in the Rcmdr menus and output window under R 2.7.x.
>>> I've confirmed the problem with the current version of the Rcmdr
>>> package (1.4-0) and R 2.7.2 under Windo
[Rd] Different result with different order of binding (PR#12742)
Full_Name: Kyun-Seop Bae
Version: 2.7.2
OS: MS-Windows XP SP2
Submission from: (NULL) (148.168.40.4)

# Script that I used

rm(list=objects())
objects()

WT <- 91
AGE <- 41
SCR <- 1.3

CCL1 <- (140-AGE) * WT / (72 * SCR)
CCL2 <- (140-AGE) * WT / 72 / SCR

CCL1
CCL2

identical(CCL1, CCL2)
identical(CCL1, 96.25)
identical(CCL2, 96.25)

CCL1*10 + 0.5
CCL2*10 + 0.5

floor(CCL1*10 + 0.5)
floor(CCL2*10 + 0.5)

as.integer(CCL1*10 + 0.5)
as.integer(CCL2*10 + 0.5)

# Same with multiplied WT
# Same in S-Plus Enterprise Developer Version 7.0.6 for Microsoft Windows : 2005
# But these are accurate in MS-Excel.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Different result with different order of binding (PR#12742)
[EMAIL PROTECTED] wrote:
> Full_Name: Kyun-Seop Bae
> Version: 2.7.2
> OS: MS-Windows XP SP2
> Submission from: (NULL) (148.168.40.4)
>

FAQ 7.31, not a bug!

> # Script that I used
>
> rm(list=objects())
> objects()
>
> WT <- 91
> AGE <- 41
> SCR <- 1.3
>
> CCL1 <- (140-AGE) * WT / (72 * SCR)
> CCL2 <- (140-AGE) * WT / 72 / SCR
>
> CCL1
> CCL2
>
> identical(CCL1, CCL2)
> identical(CCL1, 96.25)
> identical(CCL2, 96.25)
>
> CCL1*10 + 0.5
> CCL2*10 + 0.5
>
> floor(CCL1*10 + 0.5)
> floor(CCL2*10 + 0.5)
>
> as.integer(CCL1*10 + 0.5)
> as.integer(CCL2*10 + 0.5)
>
> # Same with multiplied WT
> # Same in S-Plus Enterprise Developer Version 7.0.6 for Microsoft Windows : 2005
> # But these are accurate in MS-Excel.
>

Excel is not the gold standard for scientific computing!!

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
Re: [Rd] Problem with UTF-8 text in the Rcmdr package
Unless Windows is running in CP1250 (the Slovenian encoding on
Windows), this is not expected to work. I believe John tested in
CP1252, and it just so happens that those characters are in the same
place in CP1250 and CP1252.

I get something different in CP1250, as pasting into the script window
also does not work. But if I use the Unicode escapes, the result is
rendered correctly in the output window.

I think Jaro has put his finger on this: Tcl/Tk output thinks it is in
Latin-2 and not CP1250, and s and z caron have different positions in
those two character sets. Here is something I can reproduce easily
with XP set to Slovenian:

> x <- "ČŠŽčšž"
> x
[1] "ČŠŽčšž"
> charToRaw(x)
[1] c8 8a 8e e8 9a 9e

which is correct for CP1250. Now if I submit 'x' in the Rcmdr script
window, I get the wrong output in the output window.

And I've tracked that down to a bug in iconv (something we take from
libiconv on Windows): it does think the native encoding is Latin-2,
not CP1250. I'll put a workaround in R-devel and R-patched shortly.
That has other potential ramifications that will take me longer to
investigate, and the correct thing may be to fix iconv.
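The byte-position mismatch between CP1250 and Latin-2 discussed in this message can be checked directly with iconv(). The following is a minimal sketch and not part of the original thread: it assumes an R build whose iconv recognises the encoding names "CP1250" and "ISO-8859-2" (true of GNU libiconv and glibc, but the accepted names are platform-dependent), and the variable names are purely illustrative.

```r
# Build the test string from Unicode escapes so that the encoding of the
# source file does not matter: C, S, Z with caron, upper then lower case.
x <- "\u010c\u0160\u017d\u010d\u0161\u017e"

# Encode the same six characters in the two 8-bit encodings under discussion.
cp1250 <- charToRaw(iconv(x, from = "UTF-8", to = "CP1250"))
latin2 <- charToRaw(iconv(x, from = "UTF-8", to = "ISO-8859-2"))

print(cp1250)  # c8 8a 8e e8 9a 9e, matching the charToRaw() output in the message
print(latin2)  # c8 a9 ae e8 b9 be

# Positions at which the two encodings disagree: the S and Z carons.
differs <- cp1250 != latin2
print(which(differs))  # 2 3 5 6

# Decoding CP1250 bytes as if they were Latin-2 does not give back x.
misread <- iconv(rawToChar(cp1250), from = "ISO-8859-2", to = "UTF-8")
print(identical(misread, x))  # FALSE
```

Only the C carons share a code point in both encodings, which is consistent with output that is partly right and partly wrong when one encoding is mistaken for the other.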
Re: [Rd] Problem with UTF-8 text in the Rcmdr package
Dear Brian,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Prof Brian Ripley
> Sent: September-08-08 5:46 AM
> To: John Fox
> Cc: 'Jaro.Lajovic'; 'R-devel'
> Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package
>
> Unless Windows is running in CP1250 (the Slovenian encoding on Windows), this
> is not expected to work. I believe John tested in CP1252, and it just so
> happens that those characters are in the same place in CP1250 and CP1252.

Yes, that's right: My locale is English (Canada), which uses CP1252.

> I get something different in CP1250, as pasting into the script window also
> does not work. But if I use the Unicode escapes, the result is rendered
> correctly in the output window.
>
> I think Jaro has put his finger on this: Tcl/Tk output thinks it is in
> Latin-2 and not CP1250, and s and z caron have different positions in those
> two character sets. Here is something I can reproduce easily with XP set to
> Slovenian:
>
> > x <-"ČŠŽčšž"
> > x
> [1] "ČŠŽčšž"
> > charToRaw(x)
> [1] c8 8a 8e e8 9a 9e
>
> which is correct for CP1250. Now if I submit 'x' in the Rcmdr script window,
> I get the wrong output in the output window.
>
> And I've tracked that down to a bug in iconv (something we take from libiconv
> on Windows): it does think the native encoding is Latin-2, not CP1250. I'll
> put a workaround in R-devel and R-patched shortly. That has other potential
> ramifications that will take me longer to investigate, and the correct thing
> may be to fix iconv.

Thank you very much for tracking this down. Recall that there is also
apparently a problem under Mac OS X, where the characters appeared
incorrectly in both the Script and Output windows.

Regards,
John
Re: [Rd] (PR#12742) Different result with different order of
FAQ 7.31 strikes again.

This is expected: you cannot do exact arithmetic on a binary computer if
some of the quantities involved are not binary fractions (e.g. 1.3).

See also the warning in ?`==`: identical() is equally inappropriate for
computed numerical quantities.

On Mon, 8 Sep 2008, [EMAIL PROTECTED] wrote:

> Full_Name: Kyun-Seop Bae
> Version: 2.7.2
> OS: MS-Windows XP SP2
> Submission from: (NULL) (148.168.40.4)
>
>
> # Script that I used
>
> rm(list=objects())
> objects()
>
> WT <- 91
> AGE <- 41
> SCR <- 1.3
>
> CCL1 <- (140-AGE) * WT / (72 * SCR)
> CCL2 <- (140-AGE) * WT / 72 / SCR
>
> CCL1
> CCL2
>
> identical(CCL1, CCL2)
> identical(CCL1, 96.25)
> identical(CCL2, 96.25)
>
> CCL1*10 + 0.5
> CCL2*10 + 0.5
>
> floor(CCL1*10 + 0.5)
> floor(CCL2*10 + 0.5)
>
> as.integer(CCL1*10 + 0.5)
> as.integer(CCL2*10 + 0.5)
>
>
> # Same with multiplied WT
> # Same in S-Plus Enterprise Developer Version 7.0.6 for Microsoft Windows : 2005
> # But these are accurate in MS-Excel.

Unlikely, more likely you don't have identical() to test bit-level
equality.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
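The point made via FAQ 7.31 can be seen by printing the two results at full precision and comparing them with a tolerance rather than with identical(). This is a minimal sketch using the values from the report and is not part of the original reply; the tolerance is simply the default of all.equal(), and which of the two results (if either) lands exactly on 96.25 depends on the platform's floating-point arithmetic.

```r
WT  <- 91
AGE <- 41
SCR <- 1.3

CCL1 <- (140 - AGE) * WT / (72 * SCR)
CCL2 <- (140 - AGE) * WT / 72 / SCR

# Default printing shows about 7 significant digits and hides any discrepancy.
print(c(CCL1, CCL2))

# 17 significant digits are enough to tell any two distinct doubles apart.
print(sprintf("%.17g", c(CCL1, CCL2)))

# Bit-level comparison: may be FALSE even though both are 96.25 on paper.
print(identical(CCL1, CCL2))

# Tolerance-based comparison, the approach recommended in ?`==`.
print(isTRUE(all.equal(CCL1, CCL2)))   # TRUE
print(isTRUE(all.equal(CCL1, 96.25)))  # TRUE
```

Any difference of a single unit in the last place is carried through CCL*10 + 0.5, so floor() and as.integer() can fall on either side of an integer boundary.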
Re: [Rd] (PR#12742) Different result with different order of
On Mon, 8 Sep 2008, ksbae wrote:

> Thank you for the prompt reply.
>
> Do you know why MS-Excel gives the result that I expected?

No, but I did comment on that.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [Rd] (PR#12742) Different result with different order of binding
Thank you for the prompt reply.

Do you know why MS-Excel gives the result that I expected?

Thanks,

Kyun-Seop BAE
Email: [EMAIL PROTECTED]