I've found where the problem was and a way to solve it: one dataset was encoded (and read) as UTF-8 and the other was encoded (and read) as latin3. In this case, even if you see the same characters at the terminal, R states that the two elements are not equal.
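A minimal sketch of that mismatch, under stated assumptions: the byte values are hand-built for illustration rather than taken from the actual files, and "ISO-8859-3" is assumed to be the name the platform's iconv uses for latin3 (iconvlist() shows what is available). It only shows that the two encodings store "Ç" differently and that iconv() can bring one to the other.

```r
# Bytes of the same visible name under the two encodings (illustrative values).
# "Ç" is the two bytes C3 87 in UTF-8 but the single byte C7 in ISO-8859-3.
prefix <- charToRaw("SANT VICEN")
suffix <- charToRaw(" DELS HORTS")
utf8_bytes   <- c(prefix, as.raw(c(0xC3, 0x87)), suffix)
latin3_bytes <- c(prefix, as.raw(0xC7), suffix)

# The byte sequences differ, even though each one renders as the same
# text when it is decoded with its own encoding.
same_bytes <- identical(utf8_bytes, latin3_bytes)   # FALSE

# Re-encode the latin3 value to UTF-8 inside R.
y_latin3 <- rawToChar(latin3_bytes)
y_utf8   <- iconv(y_latin3, from = "ISO-8859-3", to = "UTF-8")

# After conversion the bytes agree with the UTF-8 version.
fixed <- identical(charToRaw(y_utf8), utf8_bytes)   # TRUE
```

On real data, running Encoding() and charToRaw() on the two suspicious values is one way to see whether they differ at the byte level.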
Don't know if this is the way it has to be, or this is a bug. Anyway, editing the second file (encoded as latin3) with OpenOffice calc, saving it as UTF-8 and reading it in R with encoding="UTF-8" solved the problem.

Agus

-------- Original Message --------
Subject: Matching failure in merge()
Date: Thu, 19 Mar 2009 20:04:48 +0100
From: Agustin Lobo <agustin.l...@ija.csic.es>
Reply-To: agustin.l...@ija.csic.es
To: r-help@r-project.org

Hi!

I've done a merging between 2 dataframes using merge():

delme <- merge(miDUNS50peqB,Bnomscodmunicipis,by.x="POBLACION",by.y="NOMMUNI",all.x=T,sort=F)

After realizing some problems in the resulting dataset, I've found that the problem was that, in some cases, there was no match between the by.x and the by.y elements, despite the fact that, apparently, such matching should exist. Specifically, I get no match for the cases in which both the by.x and the by.y variables are equal to "SANT VICENÇ DELS HORTS".

In the following example I select fields POBLACION and NOMMUNI in two cases for which both fields should be identical to "SANT VICENÇ DELS HORTS" and I get:
(082634 is the municipality code for that town in Bnomscodmunicipis and 08620 is the postal code for that town in delme)
x <- Bnomscodmunicipis[Bnomscodmunicipis$CODMUN=="082634",1][1]
y <- miDUNS50peqB[miDUNS50peqB$CODPOSTAL=="08620","POBLACION"][1]
str(x)
chr "SANT VICENÇ DELS HORTS"
str(y)
chr "SANT VICENÇ DELS HORTS"
x==y
[1] FALSE

which I cannot understand. If I just cut and paste those values and run the equivalent logical operation:
"SANT VICENÇ DELS HORTS" == "SANT VICENÇ DELS HORTS"
[1] TRUE

The problem is that the values for "SANT VICENÇ DELS HORTS" in the resulting merged dataframe are wrong. Any help with this issue would be greatly appreciated, I'm really astonished. I think it might involve an encoding problem with the non-ASCII characters, but I can't pin it down.

I'm using R 2.8.1 on Ubuntu 8.04 (in English; and R is in English too).

Agus

--
Dr. Agustin Lobo
Institut de Ciencies de la Terra "Jaume Almera" (CSIC)
LLuis Sole Sabaris s/n
08028 Barcelona
Spain
Tel. 34 934095410
Fax. 34 934110012
email: agustin.l...@ija.csic.es
http://www.ija.csic.es/gt/obster
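For the merge() itself, one possible approach is to bring both key columns to a single encoding before merging and then look for rows that found no partner. The sketch below is only an illustration: the two one-row data frames are hypothetical stand-ins for miDUNS50peqB and Bnomscodmunicipis, the key bytes are hand-built, and "ISO-8859-3" is assumed to be the iconv name for latin3 on the platform.

```r
# Hypothetical one-row stand-ins for the two tables; column names follow the post.
name_utf8 <- rawToChar(c(charToRaw("SANT VICEN"), as.raw(c(0xC3, 0x87)),
                         charToRaw(" DELS HORTS")))
Encoding(name_utf8) <- "UTF-8"
name_latin3 <- rawToChar(c(charToRaw("SANT VICEN"), as.raw(0xC7),
                           charToRaw(" DELS HORTS")))

left  <- data.frame(POBLACION = name_utf8, CODPOSTAL = "08620",
                    stringsAsFactors = FALSE)
right <- data.frame(NOMMUNI = name_latin3, CODMUN = "082634",
                    stringsAsFactors = FALSE)

# Normalize the latin3 key column to UTF-8 so both keys share one encoding.
right$NOMMUNI <- iconv(right$NOMMUNI, from = "ISO-8859-3", to = "UTF-8")

merged <- merge(left, right, by.x = "POBLACION", by.y = "NOMMUNI",
                all.x = TRUE, sort = FALSE)

# With all.x = TRUE, keys from 'left' that found no partner get NA in the
# columns coming from 'right', so they can be listed for inspection.
unmatched <- merged[is.na(merged$CODMUN), "POBLACION"]
```

If unmatched is not empty on real data, those are the keys worth inspecting with charToRaw().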