Hi list!

Short version: How do I convert a whole data.frame from latin1
encoding to utf8?

I get SPSS files with latin1 encoding. My OS is GNU/Linux and the
locale sv_SE.utf8, and I normally interface R with Emacs/ESS. I have
used the following hack to convert a data.frame in latin1 to utf8:

> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.iso88591")
> foo <- read.spss("foo.sav", to.data.frame=TRUE)
> write.table(foo, "foo.data")
$ recode lat1..utf8 foo.data
> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.utf8")
> foo <- read.table("foo.data")

I have now found two problems with this approach: 

a) variable.labels is dropped
b) the order of unordered factors is changed

I had just worked out a hack for a) when I noticed b). b) is a
problem when the factors really are ordered but are not recognized as
such by read.spss (and/or not defined as such in SPSS; since SPSS
respects the numeric values of the factors anyway, users don't need
to define them).
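
To illustrate both problems without my SPSS file, here is a minimal
sketch of the write.table/read.table round trip on made-up data. The
data frame, its label text and the temporary file are hypothetical,
and stringsAsFactors = TRUE is set explicitly so that the example
does not depend on the default of the R version in use:

```r
# Made-up stand-in for the data.frame returned by read.spss():
# one factor whose levels have a meaningful, non-alphabetical order,
# plus a "variable.labels" attribute like the one read.spss attaches.
foo <- data.frame(size = factor(c("small", "medium", "large"),
                                levels = c("small", "medium", "large")))
attr(foo, "variable.labels") <- c(size = "Size of household")

# Round trip through a text file, as in the hack above
# (the recode step is left out, since this data is plain ASCII).
tmp <- tempfile()
write.table(foo, tmp)
bar <- read.table(tmp, stringsAsFactors = TRUE)

levels(foo$size)               # original order: small, medium, large
levels(bar$size)               # rebuilt from the text, sorted alphabetically
attr(bar, "variable.labels")   # NULL: the labels did not survive
```

The text file only stores the values as strings, so neither the level
order nor the extra attribute can come back from it.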

Rather than hack around b) too, I wonder if anyone on the list knows
how to convert a whole data.frame from latin1 encoding to utf8?

TIA

-- 
Hans Ekbrand (http://sociologi.cjb.net) <h...@sociologi.cjb.net>
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
