On Mon, Dec 10, 2012 at 11:46:40PM -0500, David Kulp wrote:
> I'd like to write unicode strings using the "\u" escape syntax. According to
> the documentation, print.default or encodeString will escape unicode using
> the \u convention. In practice, I can't make it work.
>
> > b="Unicode character: \ufffd"
> > print.default(b)
> [1] "Unicode character: ???"
> > encodeString(b)
> [1] "Unicode character: ???"
>
> I want to write the string back out in the same escape formatting as I read
> it in. This is because I'm interfacing with some Ruby code that requires
> unicode to be in this escaped format.

As I read the documentation, encodeString escapes control characters, but not
"unicode characters". The notion of a "unicode character" is not entirely well
defined, considering that the very mission of the Unicode Consortium is to make
sure that there are no non-unicode characters... ;-)

From this it follows that replacing all characters with their \uxxxx
representation, e.g. by

    paste(sprintf("\\u%04x", utf8ToInt(b)), collapse = "")

should work with the Ruby client you are trying to talk to. Obviously, this
bloats the string rather more than necessary (particularly if most of the
characters are in the ASCII range), but if the volume you're piping into the
client is small, this may be good enough.

Best regards, Jan

--
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jtt...@gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=< hierarchical systems are for files, not for humans >=-----*
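
If the bloat from escaping every character matters, a variant of the paste/sprintf one-liner can leave ASCII characters untouched and escape only the rest. The following is a minimal sketch, not a tested solution for the Ruby side: the helper name escape_non_ascii is hypothetical (it is not part of base R), and it assumes the consumer only needs four-digit \uxxxx escapes. Code points above U+FFFF would need more than four hex digits, so the sketch refuses them rather than guess what the client expects.

```r
# Hypothetical helper, not part of base R: escape only non-ASCII characters
# as \uxxxx and leave ASCII characters unchanged.
escape_non_ascii <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))   # integer code points of a single string

  # Assumption: the consumer only understands four-digit \uxxxx escapes,
  # so anything above U+FFFF is rejected instead of being written ambiguously.
  if (any(cp > 0xFFFF)) {
    stop("code points above U+FFFF are not handled by this sketch")
  }

  out <- ifelse(cp < 128L,
                intToUtf8(cp, multiple = TRUE),  # ASCII: keep the character
                sprintf("\\u%04x", cp))          # otherwise: \uxxxx escape
  paste(out, collapse = "")
}

b <- "Unicode character: \ufffd"
cat(escape_non_ascii(b), "\n")
# prints: Unicode character: \ufffd
```

Like the one-liner, this relies only on utf8ToInt, intToUtf8, sprintf and paste from base R. Whether the Ruby code accepts a mix of literal ASCII and \uxxxx escapes is an assumption that should be checked against the client.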