Dear Milan,

please see my results inline.

On 11 Dec 2012, at 16:58, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> On Tuesday 11 December 2012 at 16:41 +0100, Richard Zijdeman wrote:
>> Dear Milan,
>>
>> thank you for your kind suggestion. Converting the characters using:
>>> iconv(department, "ISO-8859-15", "UTF-8")
>> indeed improves the situation in that now all values (names of
>> departments) are displayed in the plot, although the specific special
>> characters are unfortunately appearing as empty boxes.
> I wouldn't call that an improvement... :-/
>
> What's the result of the other one, i.e.
> iconv(department, "UTF-16", "UTF-8")

That does not change the outcome, i.e. the names of departments with
special characters are not plotted at all.

>
>> I have tried to see whether I could 'save' my Stata file using UTF-8
>> format, and although this proves to be a popular request it does not
>> seem to have been incorporated in Stata.
> You should not need this. iconv() should be able to convert the strings
> to something usable. The problem is to determine what's the original
> encoding. Could you call
> lapply(department, charToRaw)
>
> and post the output?

Thanks for providing another suggestion. I have selected 3 cases from the
dataset I am working with that are problematic and have made new variables
based on the iconv conversion. The department variable is called 'liac',
and next to the original I now have three versions based on the UTF-16,
ISO-8859-1 and ISO-8859-15 conversions. I hope I executed it properly, but
there seems to be an error when executing your code on the original
variable.

## start results
> head(tra.s)
         liac        liac2        liac3  liac1
18 Ard\x8fche Ard\u008fche Ard\u008fche   <NA>
29 Corr\x8fze Corr\u008fze Corr\u008fze   <NA>
31  Vend\x8ee  Vend\u008ee  Vend\u008ee 噥湤蹥
> lapply(tra.s$liac,charToRaw) # original (stata import)
Error in FUN(X[[1L]], ...) :
  argument must be a character vector of length 1
> lapply(tra.s$liac1, charToRaw) # UTF16 -> UTF-8
[[1]]
[1] 4e 41

[[2]]
[1] 4e 41

[[3]]
[1] e5 99 a5 e6 b9 a4 e8 b9 a5

> lapply(tra.s$liac2, charToRaw) # ISO-8859-1 -> UTF-8
[[1]]
[1] 41 72 64 c2 8f 63 68 65

[[2]]
[1] 43 6f 72 72 c2 8f 7a 65

[[3]]
[1] 56 65 6e 64 c2 8e 65

> lapply(tra.s$liac3, charToRaw) # ISO-8859-15 -> UTF-8
[[1]]
[1] 41 72 64 c2 8f 63 68 65

[[2]]
[1] 43 6f 72 72 c2 8f 7a 65

[[3]]
[1] 56 65 6e 64 c2 8e 65

## end results

Best wishes and thanks,

Richard

>
>
> Regards
>
>> Best and thank you for your help,
>>
>> Richard
>>
>>
>> On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
>>
>>> On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
>>>> Dear all,
>>>>
>>>> I have imported a dataset from Stata using the foreign package. The
>>>> original data contain French characters such as è and é.
>>>> After importing, string variables containing names of French
>>>> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
>>>> to ask how I could plot these changed strings, since now the strings
>>>> with special characters fail to be printed in the plot (either using
>>>> plot() or ggplot2()).
>>>>
>>>> I have googled for solutions, but actually find it hard to determine
>>>> whether I should change my R setup or should read in the data in a
>>>> different way. Since I work on a Mac I changed my locale according to
>>>> the R for Mac OS X FAQ, chapter 9. Below is some info on my setup and
>>>> code and output on what works for me and what does not. Thank you in
>>>> advance for your comments.
>>> Accented characters should work fine on a machine using a UTF-8
>>> locale as yours. I think the problem is that the imported data uses
>>> ISO8859-15 or UTF-16, not UTF-8.
>>>
>>> I have no idea whether .dta files specify an encoding or not, but I
>>> think you can convert them in R by calling
>>> iconv(department, "ISO-8859-15", "UTF-8")
>>> or
>>> iconv(department, "UTF-16", "UTF-8")
>>>
>>>> Best,
>>>>
>>>> Richard
>>>>
>>>> #--------------
>>>> rm(list=ls())
>>>> sessionInfo()
>>>> # R version 2.15.2 (2012-10-26)
>>>> # Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> #
>>>> # locale:
>>>> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> # creating variables
>>>> department <- c("Nord","Paris","Ard\x8fche")
>>> \x8 does not correspond to "è" AFAIK. In ISO8859-1 and -15 and UTF-16,
>>> it's \xE8 ("\uE8" in R).
>>>
>>> In UTF-8, it's C3 A8, "\303\250" in R.
>>>
>>>> department2 <- c("Nord", "Paris", "Ardèche")
>>>> n <- c(2,4,1)
>>>>
>>>> # creating dataframes
>>>> df <- data.frame(department,n)
>>>> df2 <- data.frame(department2,n)
>>>>
>>>> department
>>>> # [1] "Nord" "Paris" "Ard\x8fche"
>>>> department2
>>>> # [1] "Nord" "Paris" "Ardèche"
>>>>
>>>> plot(df) # fails to show the text "Ardèche"
>>>> plot(df2) # shows text "Ardèche"
>>>>
>>>> # EOF
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
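
A note on the byte values reported in the results: 8f (in Ard\x8fche and
Corr\x8fze) and 8e (in Vend\x8ee) are the Mac OS Roman codes for è and é,
which might explain why the ISO-8859-1, ISO-8859-15 and UTF-16 conversions
do not help. The sketch below is only a possibility to try. It assumes that
Mac OS Roman really is the source encoding, that the local iconv accepts the
name "macintosh", and that the charToRaw() error arises because liac is a
factor (hence the as.character() calls). The vector is retyped here for
illustration and is not the imported data.

```r
# Three values retyped with the raw bytes shown in the thread
# (hypothetical stand-in for tra.s$liac).
liac <- c("Ard\x8fche", "Corr\x8fze", "Vend\x8ee")

# Inspect the bytes; as.character() guards against liac being a factor,
# which is assumed to be the cause of the charToRaw() error.
lapply(as.character(liac), charToRaw)

# Assumption: the source encoding is Mac OS Roman ("macintosh" in iconv).
liac_utf8 <- iconv(as.character(liac), from = "macintosh", to = "UTF-8")

# If the assumption holds, the accented letter in the first value is now
# the UTF-8 byte pair c3 a8.
charToRaw(liac_utf8[1])
```

If the converted strings print as Ardèche, Corrèze and Vendée, the same
iconv() call could be applied to the full liac column before plotting.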