Re: [R] Writing escaped unicode

Jan T Kim Tue, 11 Dec 2012 02:50:58 -0800

On Mon, Dec 10, 2012 at 11:46:40PM -0500, David Kulp wrote:
> I'd like to write unicode strings using the "\u" escape syntax.  According to 
> the documentation, print.default or encodeString will escape unicode using 
> the \u convention.  In practice, I can't make it work.
> 
> > b="Unicode character: \ufffd"
> > print.default(b)
> [1] "Unicode character: ???"
> > encodeString(b)
> [1] "Unicode character: ???"
> 
> I want to write the string back out in the same escape formatting as I read 
> it in.  This is because I'm interfacing with some Ruby code that requires 
> unicode to be in this escaped format.


As I read the documentation, encodeString escapes control characters,
but not "Unicode characters". The notion of a "Unicode character" is
not entirely well defined, considering that the very mission of the
Unicode Consortium is to make sure that there are no non-Unicode
characters...  ;-)
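
A quick way to check that reading, as a small sketch rather than a
statement about every locale: a control character such as a newline
comes back escaped. How a non-ASCII character is displayed depends on
the locale, so I make no claim about that part here.

```r
# A newline is a control character, so encodeString() escapes it
# into the two characters backslash and "n".
x <- encodeString("a\nb")
nchar(x)   # 4: "a", "\", "n", "b"
cat(x, "\n")
```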

From this it follows that replacing all characters with their \uxxxx
representation, e.g. by

    paste(sprintf("\\u%04x", utf8ToInt(b)), collapse = "")

should work with the Ruby client you are trying to talk to. Obviously,
this bloats the string more than necessary (particularly if most of
the characters are in the ASCII range), but if the volume you're
piping into the client is small, this may be good enough.
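
If the bloat does matter, one possible variation is to escape only the
characters outside the ASCII range. This is just a sketch:
escape_non_ascii is a name I made up, not a base R function. It assumes
a single string whose code points all fit in four hex digits (at most
U+FFFF). I don't know what your Ruby side expects for anything above
that, so the helper simply stops in that case.

```r
# Hypothetical helper (not part of base R): escape only non-ASCII characters.
# Assumes a single string whose code points are all <= U+FFFF.
escape_non_ascii <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))                  # integer code points
  stopifnot(all(cp <= 0xFFFF))                  # \uxxxx holds four hex digits
  chars <- vapply(cp, intToUtf8, character(1))  # the original characters
  esc <- sprintf("\\u%04x", cp)                 # their \uxxxx forms
  # Keep ASCII characters as they are, use the escape for everything else.
  paste(ifelse(cp < 128L, chars, esc), collapse = "")
}

cat(escape_non_ascii("Unicode character: \ufffd"), "\n")
```

Note that an ASCII backslash already in the string is passed through
unchanged; whether the client needs that doubled is something you would
have to check on the Ruby side.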

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jtt...@gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
