With respect to your comment (sorry, the e-mail you wrote that in didn't get to my inbox):
>> I don't think so. In general, functions that convert to the native
>> encoding break UTF-8 on Windows, because the native encoding is often
>> Latin1 or some other encoding that doesn't cover all the characters in
>> UTF-8.

As I understand it, the native encoding in Windows is UTF-16, not Latin1:

http://msdn.microsoft.com/en-us/library/dd374081.aspx

And UTF-16 is a superset of UTF-8, isn't it?

Sverre

On Sun, Nov 10, 2013 at 1:49 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 13-11-10 7:31 AM, Sverre Stausland wrote:
>>
>> My e-mail was intended as a typical "feature request", and I couldn't
>> find any more suitable place for that than the r-devel mailing list. I
>> am not a programmer, so I don't have the skills to write this into R's
>> source code myself.
>>
>> The incentive is nevertheless clear enough. I believe a software
>> program in 2013 which imports, manipulates, and exports text in
>> various formats (text files, picture files, postscript files, etc.)
>> would normally be expected to support UTF-8. It might not be trivial
>> to implement as R is written now, but the expectation will still be
>> there. So I still believe it would be a good idea if R soon would be
>> able to support UTF-8.
>
> R does support UTF-8. It all works smoothly in a UTF-8 locale, not so
> smoothly if you have your computer set up to use a different 8 bit encoding.
>
>>
>> I'm not quite able to piece together from the information you gave
>> what the underlying issues are. What I read is:
>> (1) Some R functions convert characters to the native encoding.
>> (2) Windows did not support UTF-8 when R was first written.
>> (3) Unix did not support UCS-2 when R was first written.
>>
>> I'm guessing here that the implications are:
>> (1) R's write.table() converts characters to a native encoding.
>> (2) The native encoding in Windows 7 is not UTF-8.
>> (3) The native encoding in Unix systems is UTF-8.
>
> You got it right for the first 4.
> Regarding (2) in your second list, that's
> right, and in fact UTF-8 is not supported as a native encoding.
> And point (3) is optional, though UTF-8 is the dominant encoding nowadays.
>
> The easiest solution is for you to switch to a Unix variant and set it up to
> use UTF-8 as the native encoding.
>
> Next easiest would be for Microsoft to add UTF-8 as a code page.
>
> Most difficult would be for R to handle UTF-8 properly on systems with
> limited support for it.
>
> We probably will add small changes that let you work around the Windows
> problems, but they won't be very satisfactory to anyone. I don't think we
> will make the big changes that would make R look like "a software program in
> 2013", since it would be so much work, and there's such an easy workaround.
>
> Duncan Murdoch
>
>> But this is just guesswork.
>
>>
>> PS. A related issue:
>>
>> http://stackoverflow.com/questions/19881553/using-unicode-inside-rs-expression-command
>>
>> Sverre
>>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
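
A small illustration of the encoding point discussed in this thread, offered as a minimal sketch and not as a description of what R does internally. Python is used purely as a stand-in; the sample string and the helper round_trips() are hypothetical names introduced here. The sketch suggests that UTF-8 and UTF-16 can both represent the same Unicode text, whereas an 8-bit encoding such as Latin1 cannot, so a conversion to it drops characters.

```python
# Illustrative only: Python stands in for the general idea; this is not R code.
# The sample text mixes a Latin-1 letter, a Greek letter and a CJK character.
text = "naïve Ω 語"


def round_trips(s: str, encoding: str) -> bool:
    """Return True if s survives encoding to `encoding` and decoding back."""
    try:
        return s.encode(encoding).decode(encoding) == s
    except UnicodeEncodeError:
        # The target encoding has no representation for some character.
        return False


# Check the two Unicode encodings and one 8-bit encoding.
results = {enc: round_trips(text, enc) for enc in ("utf-8", "utf-16", "latin-1")}

# A forced conversion to the 8-bit encoding substitutes "?" for anything
# it cannot represent, which is the kind of loss described in the thread.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

print(results)
print(lossy)
```

Under these assumptions the two Unicode encodings round-trip the string and Latin1 does not; only the "ï" survives the forced conversion.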