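Thomas,

I wasn't able to reproduce your finding. The last two characters in my 'out.txt' file were just as expected. However, I'm in a UTF-8 locale, and your locale determines the native encoding of characters on your platform. If you're not in a UTF-8 locale, then characters are converted from your native encoding to UTF-8 (when you specify encoding="UTF-8"). Characters that the native encoding cannot represent may be lost or substituted in that conversion. For example, a Western European code page such as CP1252 contains ä, ö, ü, ß and æ but has no ű or Ł, which would be consistent with what you describe (I don't know your locale, so this is only a guess). You can test whether there is a loss (or a change, rather) when R writes these characters like so: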

# what does űŁ look like in binary (hex)?
raw_before <- charToRaw("űŁ")

# write 'out.txt' as before
out <- file(description="out.txt", open="w", encoding="UTF-8")
write(x="űŁ", file=out)
close(con=out)

# read in the two characters
out <- file(description="out.txt", open="r", encoding="UTF-8")
raw_after <- charToRaw(readChar(con=out, nchars=2))
close(con=out)

# compare the raw representations
identical(raw_before, raw_after)

This test passes on my machine. There's also the question of whether these characters made it onto the R-help list unaltered. Also, please include the result of sessionInfo() in your subsequent messages.

Best,
Matt

> sessionInfo()
R version 2.11.1 (2010-05-31)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

On Thu, 2011-02-17 at 13:54 -0800, tpklein wrote:
> Hello,
>
> I am working with a data frame containing character strings with many special
> symbols from various European languages. When writing such character
> strings to a file using the UTF-8 encoding, some of them are converted in a
> strange way. See the following example, run in R 2.12.1 on Windows 7:
>
> out <- file( description="out.txt", open="w", encoding="UTF-8")
> write( x="äöüßæűŁ", file=out )
> close( con=out )
>
> The last two symbols in the character string are converted to "uL" while all
> other characters are not changed (which is what I want). How to explain
> this? Does it have something to do with my locale? And is there a way to
> work around this problem? -- Any help would be greatly appreciated.
>
> Thomas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
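
A possible workaround for the conversion problem discussed above, offered only as a sketch: it assumes the loss happens when the string passes through a non-UTF-8 native encoding on its way to the file. The idea is to build the string from Unicode escapes, make sure it is UTF-8 with enc2utf8(), and write the bytes unchanged through a binary connection with useBytes=TRUE. The temporary file and the variable names are only illustrative, and I have not tried this on the R 2.12.1 / Windows 7 setup from the original message.

```r
# Build the two characters from Unicode escapes so that the encoding of the
# script file itself does not matter.
x <- "\u0171\u0141"   # U+0171 (u with double acute), U+0141 (L with stroke)

# Bytes we expect to find in the file if nothing is altered
raw_before <- charToRaw(enc2utf8(x))

# Write through a binary connection; useBytes=TRUE asks writeLines() to
# emit the string's bytes as they are instead of re-encoding them.
path <- tempfile(fileext = ".txt")   # illustrative file name
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con = con, useBytes = TRUE)
close(con)

# Read the raw bytes back and drop the newline added by writeLines()
raw_after <- readBin(path, what = "raw", n = file.info(path)$size)
raw_after <- raw_after[raw_after != as.raw(0x0a)]

# TRUE means the characters reached the file as UTF-8, unaltered
identical(raw_before, raw_after)
```

If the comparison comes out FALSE, printing raw_after shows which bytes actually reached the file; a lone 75 4c would correspond to the plain ASCII "uL" reported in the original message.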