from:"Thomas Zumbrunn"

[Rd] use of UTF-8 \uxxxx escape sequences in function arguments

2012-01-18 Thread Thomas Zumbrunn

While preparing a function that contained non-ASCII characters for inclusion 
into a package, I replaced all non-ASCII characters with UTF-8 escape 
sequences (using \u) in order to make the package portable (and adhere to 
"R CMD check"). What I didn't expect: when one uses UTF-8 escape sequences in 
function arguments, one needs to use UTF-8 escape sequences when calling the 
function, too - even when working in a UTF-8 locale. Is this an intended 
behaviour?

Here's an example to illustrate the (putative) problem:

   ## function that uses non-ASCII characters in arguments
   plain <- function(myarg = c("Basel", "Bern", "Zürich")) {
     myarg <- match.arg(myarg)
   }

   ## function that uses UTF-8 escape sequences in arguments
   escaped <- function(myarg = c("Basel", "Bern", "Z\u00BCrich")) {
     myarg <- match.arg(myarg)
   }

   ## test
   plain("Zürich")  ## works
   plain("Z\u00BCrich")  ## fails
   escaped("Zürich")  ## fails
   escaped("Z\u00BCrich")  ## works

Thank you for your help.
Thomas Zumbrunn


> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments

2012-01-19 Thread Thomas Zumbrunn

On Thursday 19 January 2012, peter dalgaard wrote:
> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> >   plain("Zürich")  ## works
> >   plain("Z\u00BCrich")  ## fails
> >   escaped("Zürich")  ## fails
> >   escaped("Z\u00BCrich")  ## works
> 
> Using the correct UTF-8 code helps quite a bit:
> 
> U+00BC ¼   c2 bc   VULGAR FRACTION ONE QUARTER
> U+00FCü   c3 bc   LATIN SMALL LETTER U WITH DIAERESIS

Thank you for pointing that out. How embarrassing - I systematically used the 
wrong representations. Even worse, I didn't carefully read "Writing R 
Extensions" which speaks of "Unicode as \u escapes" rather than "UTF-8 as 
\u escapes", so e.g. looking up the UTF-16 byte representations would have 
done the trick.
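
To make the distinction concrete, here is a minimal sketch in base R (not part of the original exchange). It assumes nothing beyond base functions: \u takes the Unicode code point, while "c3 bc" and "c2 bc" are the UTF-8 bytes those code points are stored as. The corrected escaped() below is only an illustration of the function from the first message with the right code point.

```r
## \u takes the Unicode code point (U+00FC), not the UTF-8 bytes (c3 bc)
u_uml   <- "\u00FC"   # LATIN SMALL LETTER U WITH DIAERESIS
quarter <- "\u00BC"   # VULGAR FRACTION ONE QUARTER

utf8ToInt(u_uml)      # code point as an integer (0xFC)
charToRaw(u_uml)      # UTF-8 bytes of U+00FC
charToRaw(quarter)    # UTF-8 bytes of U+00BC

## the example from the first message, with the corrected escape
escaped <- function(myarg = c("Basel", "Bern", "Z\u00FCrich")) {
    match.arg(myarg)
}
res <- escaped("Z\u00FCrich")
```

With the correct code point, a literally typed "Zürich" should match as well when the session runs in a UTF-8 locale.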

I didn't find a recommended method of replacing non-ASCII characters with 
Unicode \u escape sequences and ended up using the Unix command line tool 
"iconv". However, the iconv version installed on my GNU/Linux machine 
(openSUSE 11.4) seems to be outdated and doesn't support the very useful 
"--unicode-subst" option yet. I installed "libiconv" from 
http://www.gnu.org/software/libiconv/, and now I can easily replace all 
non-ASCII characters in my UTF-8 encoded R files with:

  iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R
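
After such a conversion it may be worth checking that nothing non-ASCII is left. A small sketch, assuming the "tools" package that ships with R; the two character vectors are made-up stand-ins for lines of a converted and an unconverted file:

```r
## Sketch: check source lines for leftover non-ASCII characters.
converted <- c('x <- "Z\\u00FCrich"', 'y <- "Basel"')  # escapes only, all ASCII
original  <- 'x <- "Z\u00FCrich"'                       # contains a real U+00FC

## showNonASCII() prints offending lines and returns them invisibly
left_over <- tools::showNonASCII(converted)
still_raw <- tools::showNonASCII(original)

length(left_over)   # number of lines that still need converting
length(still_raw)
```

For a file on disk, tools::showNonASCIIfile() does the same after reading the lines.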

Thomas Zumbrunn


Re: [Rd] use of UTF-8 \uxxxx escape sequences in function arguments

2012-01-20 Thread Thomas Zumbrunn

On Friday 20 January 2012, Simon Urbanek wrote:
> On Jan 19, 2012, at 6:39 PM, Thomas Zumbrunn wrote:
> > On Thursday 19 January 2012, peter dalgaard wrote:
> >> On Jan 18, 2012, at 23:54 , Thomas Zumbrunn wrote:
> >>>  plain("Zürich")  ## works
> >>>  plain("Z\u00BCrich")  ## fails
> >>>  escaped("Zürich")  ## fails
> >>>  escaped("Z\u00BCrich")  ## works
> >> 
> >> Using the correct UTF-8 code helps quite a bit:
> >> 
> >> U+00BC ¼   c2 bc   VULGAR FRACTION ONE QUARTER
> >> U+00FC ü   c3 bc   LATIN SMALL LETTER U WITH DIAERESIS
> > 
> > Thank you for pointing that out. How embarrassing - I systematically used
> > the wrong representations. Even worse, I didn't carefully read "Writing
> > R Extensions" which speaks of "Unicode as \u escapes" rather than
> > "UTF-8 as \u escapes", so e.g. looking up the UTF-16 byte
> > representations would have done the trick.
> > 
> > I didn't find a recommended method of replacing non-ASCII characters with
> > Unicode \u escape sequences and ended up using the Unix command line
> > tool "iconv". However, the iconv version installed on my GNU/Linux
> > machine (openSUSE 11.4) seems to be outdated and doesn't support the
> > very useful "--unicode-subst" option yet. I installed "libiconv" from
> > http://www.gnu.org/software/libiconv/, and now I can easily replace all
> > non-ASCII characters in my UTF-8 encoded R files with:
> > 
> >  iconv -f UTF-8 -t ASCII --unicode-subst="\u%04X" my-utf-8-encoded-file.R
> 
> You can actually do that with R alone:
> 
> ## you'll have to make sure that you're in C locale so R does the
> ## conversion for you
> Sys.setlocale(,"C")
> 
> utf8conv <- function(conn)
>   gsub("<U\\+([0-9A-F]{4})>", "\\\\u\\1",
>        capture.output(writeLines(readLines(conn, encoding = "UTF-8"))))
> 
> > writeLines(utf8conv("test.txt"))
> 
> M\u00F6gliche L\u00F6sung
> ne nebezpe\u010Dn\u00E9
> 
> Cheers,
> Simon

Thanks for the above function (which I wouldn't have managed to construct, 
ever...). Maybe this is worth mentioning in the 
"Writing R Extensions" manual (next to where the \u Unicode escape 
sequences are mentioned).
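
A variant that does not depend on switching to the C locale can be sketched with utf8ToInt(). The helper name escape_non_ascii() is hypothetical and not from the thread; it escapes every non-ASCII character it finds, so it is only appropriate for text that ends up inside string literals.

```r
## Hypothetical helper (not from the thread): escape each non-ASCII
## character as \uXXXX, or \UXXXXXXXX for code points above U+FFFF.
escape_non_ascii <- function(x) {
    vapply(x, function(s) {
        cp  <- utf8ToInt(enc2utf8(s))            # Unicode code points
        ch  <- intToUtf8(cp, multiple = TRUE)    # one string per character
        esc <- ifelse(cp > 0xFFFF,
                      sprintf("\\U%08X", cp),
                      sprintf("\\u%04X", cp))
        ## keep ASCII characters, replace everything else by its escape
        paste(ifelse(cp < 128L, ch, esc), collapse = "")
    }, character(1), USE.NAMES = FALSE)
}

escaped_line <- escape_non_ascii('c("Basel", "Bern", "Z\u00FCrich")')
writeLines(escaped_line)
```

Applied to a file, this would be something like writeLines(escape_non_ascii(readLines("test.txt", encoding = "UTF-8"))), with "test.txt" standing in for the file name used above.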

Thomas


Re: [Rd] X11 device distortion (PR#10666)

2008-01-31 Thread Thomas Zumbrunn

On Thursday 31 January 2008, Hin-Tak Leung wrote:
> My first thought was that you must be using Xinerama or TwinView -
> and you did mention Xinerama in your r-help message but not
> in your bug report - this detail is important.

Yes, I forgot to mention this.

> That said, I don't know enough about X11 to say anything - well, maybe
> I do, but you'll have to show your xorg.conf , and possibly the result
> of xdpyinfo for anybody to help you. I think your Xinerama setup is broken.

Yes, and Prof. Ripley correctly guessed that the DisplaySize setting was 
wrong. SaX2, the X11 setup tool of openSUSE, adds up the width values of both 
Xinerama devices, which results in a wrong DisplaySize width value. I filed a 
bug at openSUSE and already got a reaction. It seems unclear whether it is 
correct to add up the values.

> for the time being, you could probably run the X11 device through Xnest
> to get around this.

Thanks for the advice. Correcting the DisplaySize value helped, and it didn't 
break the Xinerama setting.

