Matthias Wendel wrote:
> Hello, Peter,
> I tried it out: iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8",
> "LATIN1", sub='byte') yielded
>
> [1] "<c4>rzte Chirurgie"
>
> and c4 corresponds in most encodings to Ä. What can I do next? I wonder
> whether there is a more comfortable way than replacing each
> occurrence of <..> with the appropriate character.
>
Not sure what you want here. Isn't it just the reverse conversion,
iconv(...., from="latin1", to="utf8") ???
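A minimal sketch of that reverse conversion, assuming the label really is Latin-1 (or Windows-1252, which agrees with Latin-1 on Ä). The names label_latin1 and label_utf8 are made up for the example, and the string is built from raw bytes only as a stand-in for what read.spss returned:

```r
# Hypothetical stand-in for the SPSS value label: the Latin-1 bytes of
# "Ärzte Chirurgie", where the single byte 0xC4 encodes "Ä".
label_latin1 <- rawToChar(as.raw(c(0xC4, utf8ToInt("rzte Chirurgie"))))

# The raw Latin-1 bytes do not form a valid UTF-8 sequence.
validUTF8(label_latin1)          # FALSE

# Reverse direction: the input is Latin-1, the desired output is UTF-8.
label_utf8 <- iconv(label_latin1, from = "latin1", to = "UTF-8")

# In UTF-8, "Ä" (U+00C4) is stored as the two bytes c3 84.
charToRaw(label_utf8)[1:2]       # c3 84

# The converted string is valid UTF-8, so utf8ToInt() works on it.
utf8ToInt(label_utf8)[1]         # 196, i.e. code point 0xC4
```

If this holds for the real data, the same iconv() call applied to the names of the "value.labels" attribute would convert all labels at once, rather than patching each <..> escape by hand.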
Notice that c4 is not Ä in UTF-8:

> iconv("Ä", to="ascii", sub="byte")
[1] "<c3><84>"

In fact, a lone c4 byte is not anything in UTF-8, hence the "invalid
string" message.

> Regards,
> Matthias
>
> -----Original Message-----
> From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 1 January 2008 20:21
> To: Matthias Wendel
> Subject: Re: AW: [R] Another problem with encoding
>
> Matthias Wendel wrote:
>> Happy new year and my apologies, Peter. Here are the missing facts:
>> I'm reading in an SPSS file, doing some calculations and putting the
>> results in an XML file. The XML file is UTF-8 encoded, and so should be the
>> results and their labels (e.g. Ärzte Chirurgie).
>> Here is part of the R session:
>>
> As a matter of principle: Requests for more information are not offers that I
> will solve your problems personally. Stay on the list!
>
> The characters seem to travel OK in email, so latin1 is a guess. Have you
> tried the sub="byte" argument to iconv()?
>
>> > Sys.getlocale()
>> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
>>
>> > spss[,'Y6']
>> [1] 6 3 8 11 8 9 6 8 3 5 10 15 NA 9 8 3 8 16 6 6 NA 10 5 2 7 7 6 16 7 15 7 10 12
>> [34] 8 7 12 12 16 7 6 8 8 15 6 NA 8 99 7 12 8 9 16 7 16 8 7 7 1 15 12 8 7 10 7 8 7
>> [67] 8 9 8 6 6 8 6 16 11 5 11 11 1 11 3 7 7 10 10 10 6 11 16 NA 1 3 2 10 99 10 3 3 9
>> [100] 7 16 99 16 1 10 2 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 NA 10 16 16 NA 6 10 5 11
>> [133] 11 1 1 1 1 16 1 16 1 1 1 1 6 6 6 16 8 16 16 16 16 5 6 10 99 11 11 10 6 6 1 1 6
>> [166] 1 11 11 16 9 11 16 6 8 8 16 16 8 6 16 16 12 12 12 12 12 12 12 16 9 16 15 12 12 15 10 16 15
>> [199] 4 1 2 14 4 4 2 5 NA 1 5 5 7 9 5 12 12 NA 16 12 12 12 12 12 12 12 12 12 99 NA 12 12 NA
>> [232] 1 16 1 7 11 5 6 7 1 13 6 8 16 2 1 5 16 16 9 8 8 8 7 16 8 8 2 8 5 4 6 14 5
>> [265] 14 8 8 14 4 4 8 14 8 14 6 2 3 14 3 16 5 15 15 15 15 15 15 15 15 15 15 15 13 13 13 13 13
>> [298] 13 13 13 13 13 13 13 13 15 6 NA 12 3 9 9 NA 10 16
>> attr(,"value.labels")
>>                           Verwaltung Servicegesellschaft Waldfriede (SKW)
>>                                   16                                   15
>>                       Kurzzeitpflege             Waldfriede Sozialstation
>>                                   14                                   13
>>                  Krankenpflegeschule              Med. Technischer Dienst
>>                                   12                                   11
>>                            Pflege OP                      Funktionsdienst
>>                                   10                                    9
>>                   Pflege Gynäkologie                     Pflege Chirurgie
>>                                    8                                    7
>>                        Pflege Innere            Ärzte Anästhesie, Röntgen
>>                                    6                                    5
>>                    Ärzte Gynäkologie                      Ärzte Chirurgie
>>                                    4                                    3
>>                         Ärzte Innere         Patientenberatung/-betreuung
>>                                    2                                    1
>>
>> > names(attributes(spss[,'Y6'])[[1]][14])
>> [1] "Ärzte Chirurgie"
>>
>> > iconv(names(attributes(spss[,'Y6'])[[1]][14]), "UTF-8", "LATIN1")
>> [1] NA
>>
>> > utf8ToInt(names(attributes(spss[,'Y6'])[[1]][14]))
>> Fehler in utf8ToInt(names(attributes(spss[, "Y6"])[[1]][14])) :
>>   invalid UTF-8 string
>>
>> Cheers,
>> Matthias
>>
>> -----Original Message-----
>> From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
>> Sent: Monday, 31 December 2007 10:45
>> To: Matthias Wendel
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: [R] Another problem with encoding
>>
>> Matthias Wendel wrote:
>>> Hi
>>> I've imported an SPSS file using read.spss. One variable has values
>>> like 'Ärzte'. I thought these were UTF-8 encoded, but they are not (as the
>>> results of iconv and utf8ToInt suggest). Is there any way to
>>> find out how these SPSS values are encoded?
>>>
>> You are assuming a bit much of your readers.
>>
>> What exactly are you doing? Is it a value, a value label, or perhaps a
>> variable name? How do the results of read.spss look on the R
>> side? How did you apply iconv and utf8ToInt? What is your locale?
>>
>> I mean, we could try and guess all those details, but you are the one with
>> the hard info, and the motivation...
>>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.