On 26.02.2015 at 23:44, Winston Chang wrote:
> On Thu, Feb 26, 2015 at 2:09 PM, maill...@tlink.de wrote:
>
>     When I send some outlandish characters through enc2native (or
>     format) in R 3.1.2 on Ubuntu trusty it works quite well:
>
>     > "®ØΔЊת"
>     [1] "®ØΔЊת"
>     > enc2native("®ØΔЊת")
>     [1] "®ØΔЊת"
>     > Encoding(enc2native("®ØΔЊת"))
>     [1] "UTF-8"
>
>     In Windows the result is different:
>
>     > "®ØΔЊת"
>     [1] "®ØΔЊת"
>     > enc2native("®ØΔЊת")
>     [1] "®Ø<U+0394><U+040A><U+05EA>"
>     > Encoding(enc2native("®ØΔЊת"))
>     [1] "latin1"
>
>     And this is wrong. The native character set of a Unicode
>     application under Windows is *Unicode*. enc2native should do the
>     same under Windows as it does on Ubuntu. Also the "unknown"
>     encoding should be changed to mean the same as "UTF-8" exactly as
>     it is on Linux.
>
> I think you're mixing up the term "character set" with the encoding
> for a character set. Unicode is a character set. UTF-8 is one of many
> encodings of Unicode.
>
> UTF-8 may be the native character encoding in Ubuntu, but it's not the
> native encoding in Windows. According to this, what counts as the
> native encoding in Windows depends on the code page:
> http://stackoverflow.com/a/4649507
>
> So you shouldn't expect enc2native to do the same thing on Linux and
> Windows. If you really want UTF-8, you can use enc2utf8.
>
> -Winston
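
For reference, the distinction between a character set and its encodings
can be checked directly in R. This is only a minimal sketch: the variable
names are made up for illustration, and the string is written with a \u
escape on the assumption that this keeps the result independent of the
session's locale. It also shows why a latin1-style native encoding cannot
hold the Delta, which is the gap that appears as "<U+0394>" in the Windows
output quoted above.

```r
# One Unicode code point (U+0394, GREEK CAPITAL LETTER DELTA), written as
# an escape so the string is marked as UTF-8 whatever the locale is.
delta <- "\u0394"

Encoding(delta)    # declared encoding of the string
utf8ToInt(delta)   # the code point as an integer (0x0394)

# The same code point as bytes in two different encodings of Unicode.
utf8_bytes  <- charToRaw(delta)
utf16_bytes <- iconv(delta, "UTF-8", "UTF-16BE", toRaw = TRUE)[[1]]

# latin1 has no Delta; with the default sub = NA the conversion is
# expected to fail and iconv() to return NA (assumption: the platform's
# iconv does not transliterate).
as_latin1 <- iconv(delta, "UTF-8", "latin1")

print(utf8_bytes)
print(utf16_bytes)
print(as_latin1)
```

The two raw vectors differ although they describe the same character, so
whether a string survives enc2native depends on what the native encoding
can represent, not on Unicode as such.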

Maybe I'm expecting too much, but I would rather R did not produce
garbage like "®Ø<U+0394><U+040A><U+05EA>". And while I can use enc2utf8
to convert from UTF-8 to UTF-8, this does not fix the many places, such
as "format", where enc2native is used and which are broken because of
this.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel