On Fri, 30 May 2008, Duncan Murdoch wrote:
On 5/30/2008 4:12 PM, Hans-Joerg Bibiko wrote:
Quoting Duncan Murdoch <[EMAIL PROTECTED]>:
On 5/30/2008 12:58 PM, Hans-Jörg Bibiko wrote:
To put it simply, Windows cannot handle utf-8 data. There is no utf-8
locale available.
Code page 65001 is utf-8. Most text editors (including Notepad)
include an option to save in the UTF-8 encoding.
Some programs don't fully support utf-8 (some don't even support the
native UCS-2), but most don't care. That's the nice thing about utf-8.
So in what sense can Windows not handle utf-8 data?
Of course, you're right. In that context I meant only R for Windows, not
Windows in general. Sorry for the inaccuracy.
But I think with Brian Ripley's work over the last while, R for Windows
actually handles utf-8 pretty well. (It might not guess at that encoding,
but if you tell it that's what you're using...)
UTF-8, please (only the capitalized form is correct).
R passes around, prints and plots UTF-8 character data pretty well, but it
translates to the native encoding for almost all character-level
manipulations (and not just on Windows). ?Encoding spells out the
exceptions (and I think the original poster had not read it). As time
goes on we may add more, but it is really tedious (and somewhat
error-prone) to have multiple paths through the code for different
encodings (and different OSes do handle these differently -- Windows' use
of UTF-16 means that one character may not be one wchar_t).
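
To illustrate that last point outside R, here is a minimal sketch in Python
(used only for illustration; the helper names utf16_code_units and utf8_bytes
are hypothetical and belong neither to R nor to this thread). It counts the
16-bit code units that UTF-16 needs for a single character. A character
outside the Basic Multilingual Plane needs two units (a surrogate pair),
so it cannot fit in one 16-bit wchar_t.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs to store `text`."""
    # "utf-16-le" is used so that no byte order mark is prepended;
    # every code unit is exactly two bytes.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Return how many bytes UTF-8 needs to store `text`."""
    return len(text.encode("utf-8"))


# Three single characters: ASCII, Latin-1 range, and outside the BMP.
samples = {
    "LATIN SMALL LETTER A (U+0061)": "\u0061",
    "LATIN SMALL LETTER E WITH ACUTE (U+00E9)": "\u00e9",
    "MUSICAL SYMBOL G CLEF (U+1D11E)": "\U0001D11E",
}

for label, ch in samples.items():
    print(f"{label}: {utf16_code_units(ch)} UTF-16 unit(s), "
          f"{utf8_bytes(ch)} UTF-8 byte(s)")
```

The first two characters each occupy one UTF-16 unit, while the G clef
occupies two, even though all three are a single character.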
A couple of the other points in the original posting were corrected in
R-patched just after release.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595