[Rd] use of UTF-8 \uxxxx escape sequences in function arguments
While preparing a function that contained non-ASCII characters for inclusion into a package, I replaced all non-ASCII characters with UTF-8 escape sequences (using \u) in order to make the package portable (and adhere to "R CMD check"). What I didn't expect: when one uses UTF-8 escape sequences in function arguments, one needs to use UTF-8 escape sequences when calling the function, too - even when working in a UTF-8 locale. Is this an intended behaviour?

Here's an example to illustrate the (putative) problem:

## function that uses non-ASCII characters in arguments
plain <- function(myarg = c("Basel", "Bern", "Zürich")) {
  myarg <- match.arg(myarg)
}

## function that uses UTF-8 escape sequences in arguments
escaped <- function(myarg = c("Basel", "Bern", "Z\u00BCrich")) {
  myarg <- match.arg(myarg)
}

## test
plain("Zürich")         ## works
plain("Z\u00BCrich")    ## fails
escaped("Zürich")       ## fails
escaped("Z\u00BCrich")  ## works

Thank you for your help.
Thomas Zumbrunn

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments
On Thursday 19 January 2012, peter dalgaard wrote:
> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> > plain("Zürich") ## works
> > plain("Z\u00BCrich") ## fails
> > escaped("Zürich") ## fails
> > escaped("Z\u00BCrich") ## works
>
> Using the correct UTF-8 code helps quite a bit:
>
> U+00BC  ¼  c2 bc  VULGAR FRACTION ONE QUARTER
> U+00FC  ü  c3 bc  LATIN SMALL LETTER U WITH DIAERESIS

Thank you for pointing that out. How embarrassing - I systematically used the wrong representations. Even worse, I didn't carefully read "Writing R Extensions" which speaks of "Unicode as \u escapes" rather than "UTF-8 as \u escapes", so e.g. looking up the UTF-16 byte representations would have done the trick.

I didn't find a recommended method of replacing non-ASCII characters with Unicode \u escape sequences and ended up using the Unix command line tool "iconv". However, the iconv version installed on my GNU/Linux machine (openSUSE 11.4) seems to be outdated and doesn't support the very useful "--unicode-subst" option yet. I installed "libiconv" from http://www.gnu.org/software/libiconv/, and now I can easily replace all non-ASCII characters in my UTF-8 encoded R files with:

iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R

Thomas Zumbrunn
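For readers following the thread, the distinction between a Unicode code point (what \u expects) and its UTF-8 byte sequence can be checked directly in R. The following is only a minimal sketch and not part of the original exchange; the helper names code_point and utf8_bytes are made up for illustration, and only escapes are used in the source so that the snippet itself stays ASCII.

```r
## Hypothetical helpers (not from the thread) to inspect a single character.
code_point <- function(ch) utf8ToInt(enc2utf8(ch))   # Unicode code point(s)
utf8_bytes <- function(ch) charToRaw(enc2utf8(ch))   # UTF-8 byte sequence

u_umlaut <- "\u00FC"  # LATIN SMALL LETTER U WITH DIAERESIS
quarter  <- "\u00BC"  # VULGAR FRACTION ONE QUARTER

sprintf("U+%04X", code_point(u_umlaut))  # "U+00FC"
utf8_bytes(u_umlaut)                     # c3 bc
utf8_bytes(quarter)                      # c2 bc

## The function from the first message, with the escape corrected to the
## code point of the intended character.
escaped <- function(myarg = c("Basel", "Bern", "Z\u00FCrich")) {
  match.arg(myarg)
}

ok <- escaped("Z\u00FCrich")  # matches the third choice

## With the mistaken escape the argument is a different string,
## so match.arg() signals an error.
bad <- tryCatch(escaped("Z\u00BCrich"), error = function(e) e)
inherits(bad, "error")  # TRUE
```

The trailing "bc" byte is shared by both UTF-8 sequences, which is presumably how the two characters got confused; \u takes the code point, so the intended escape is \u00FC.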
Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments
On Friday 20 January 2012, Simon Urbanek wrote:
> On Jan 19, 2012, at 6:39 PM, Thomas Zumbrunn wrote:
> > On Thursday 19 January 2012, peter dalgaard wrote:
> >> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> >>> plain("Zürich") ## works
> >>> plain("Z\u00BCrich") ## fails
> >>> escaped("Zürich") ## fails
> >>> escaped("Z\u00BCrich") ## works
> >>
> >> Using the correct UTF-8 code helps quite a bit:
> >>
> >> U+00BC  ¼  c2 bc  VULGAR FRACTION ONE QUARTER
> >> U+00FC  ü  c3 bc  LATIN SMALL LETTER U WITH DIAERESIS
> >
> > Thank you for pointing that out. How embarrassing - I systematically used
> > the wrong representations. Even worse, I didn't carefully read "Writing
> > R Extensions" which speaks of "Unicode as \u escapes" rather than
> > "UTF-8 as \u escapes", so e.g. looking up the UTF-16 byte
> > representations would have done the trick.
> >
> > I didn't find a recommended method of replacing non-ASCII characters with
> > Unicode \u escape sequences and ended up using the Unix command line
> > tool "iconv". However, the iconv version installed on my GNU/Linux
> > machine (openSUSE 11.4) seems to be outdated and doesn't support the
> > very useful "--unicode-subst" option yet. I installed "libiconv" from
> > http://www.gnu.org/software/libiconv/, and now I can easily replace all
> > non-ASCII characters in my UTF-8 encoded R files with:
> >
> > iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R
>
> You can actually do that with R alone:
>
> ## you'll have to make sure that you're in C locale so R does the
> ## conversion for you
> Sys.setlocale(,"C")
>
> ## in the C locale non-ASCII characters are printed as <U+XXXX>;
> ## rewrite those as \uXXXX
> utf8conv <- function(conn)
>   gsub("<U\\+([0-9A-F]{4})>", "\\\\u\\1",
>        capture.output(writeLines(readLines(conn, encoding = "UTF-8"))))
>
> > writeLines(utf8conv("test.txt"))
>
> M\u00F6gliche L\u00F6sung
> ne nebezpe\u010Dn\u00E9
>
> Cheers,
> Simon

Thanks for the above function (which I wouldn't have managed to construct, ever...).
Maybe this is worth mentioning in the "Writing R Extensions" manual (next to where the \u Unicode escape sequences are mentioned).

Thomas
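As an aside for readers of the archive, the same replacement can be sketched without relying on how the session locale prints non-ASCII characters, by working on code points directly. This is only an illustrative alternative, not the method proposed in the thread; the names escape_non_ascii and escape_one are hypothetical, NA handling is left out, and the sketch escapes every non-ASCII character on a line (comments included), not just those inside string literals.

```r
## Hypothetical helper: replace each non-ASCII character in a character
## vector with an R-style Unicode escape (\uXXXX, or \UXXXXXXXX beyond the BMP).
escape_non_ascii <- function(x) {
  escape_one <- function(s) {
    cp <- utf8ToInt(s)                          # integer code points
    if (length(cp) == 0L) return("")            # empty string stays empty
    chars <- intToUtf8(cp, multiple = TRUE)     # one element per character
    bmp    <- cp > 127L & cp <= 0xFFFF          # non-ASCII, four hex digits suffice
    astral <- cp > 0xFFFF                       # needs the eight-digit \U form
    chars[bmp]    <- sprintf("\\u%04X", cp[bmp])
    chars[astral] <- sprintf("\\U%08X", cp[astral])
    paste(chars, collapse = "")
  }
  vapply(enc2utf8(x), escape_one, character(1), USE.NAMES = FALSE)
}

## The two sample lines from the message above, written with escapes so
## that this snippet is itself pure ASCII.
lines_in  <- c("M\u00F6gliche L\u00F6sung", "ne nebezpe\u010Dn\u00E9")
lines_out <- escape_non_ascii(lines_in)
writeLines(lines_out)

## Possible use on a file (file name taken from the iconv example):
## writeLines(escape_non_ascii(readLines("my-utf-8-encoded-file.R",
##                                       encoding = "UTF-8")))
```

The design choice here is to avoid Sys.setlocale() altogether, at the cost of a few more lines; whether that is preferable depends on the workflow.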
Re: [Rd] X11 device distortion (PR#10666)
On Thursday 31 January 2008, Hin-Tak Leung wrote:
> My first thought was that you must be using Xinerama or TwinView -
> and you did mention Xinerama in your r-help message but not
> in your bug report - this detail is important.

Yes, I forgot to mention this.

> That said, I don't know enough about X11 to say anything - well, maybe
> I do, but you'll have to show your xorg.conf , and possibly the result
> of xdpyinfo for anybody to help you. I think your Xinerama setup is broken.

Yes, and as Prof. Ripley correctly guessed, the DisplaySize setting was wrong. SaX2, the X11 setup tool of openSUSE, adds up the width values of both Xinerama devices, which results in a wrong DisplaySize width value. I filed a bug at openSUSE and already got a reaction. It seems unclear whether it is correct to add up the values.

> for the time being, you could probably run the X11 device through Xnest
> to get around this.

Thanks for the advice. Correcting the DisplaySize value helped, and it didn't break the Xinerama setting.