[R] Matching names with non-English characters

Spencer Graves Mon, 13 May 2013 09:08:31 -0700

Hello:

How can one match names containing non-English characters that appear differently in different but related data files? For example, I have data on Raúl Grijalva, who represents the third district of Arizona in the US House of Representatives. This first name appears as "RaÃºl" in data read from one file and "Raul" from another.

The ideal would convert both "RaÃºl" and "Raúl" to "Raul". A reasonable alternative would identify the non-English characters and match on everything else ("^Ra" and "l$" in this case). The files all contain state and district, so "AZ-3" could be part of the solution. However, the file also contains data on Grijalva's predecessor in that office, Ben Quayle, so "AZ-3" is not enough.
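For concreteness, the "ideal" conversion might be sketched in base R roughly as follows. This is only a sketch under assumptions: the function names fix_mojibake and strip_accents are made up for illustration, the damage is assumed to be UTF-8 text that was read as Latin-1, the session is assumed to use a UTF-8 locale, and the list of accented letters is just what the names here need.

```r
# Hypothetical helpers for illustration; neither is part of base R or Ecdat.

# Undo the common "UTF-8 bytes read as Latin-1" damage ("RaÃºl" -> "Raúl").
fix_mojibake <- function(x) {
  # Mapping back to Latin-1 recovers the original byte sequence ...
  y <- iconv(x, from = "UTF-8", to = "latin1")
  # ... which is then declared to be UTF-8.
  Encoding(y) <- "UTF-8"
  # Keep the repaired string only where the result is valid UTF-8;
  # otherwise the input was not damaged this way, so leave it alone.
  ok <- !is.na(y) & validUTF8(y)
  x[ok] <- y[ok]
  x
}

# Replace a fixed set of accented letters with unaccented ones.
# The letter list is an assumption covering the names here; extend as needed.
strip_accents <- function(x) {
  chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00d1",
         "aeiounAEIOUN", x)
}

first_names <- c("Ra\u00c3\u00bal",  # damaged form, "RaÃºl"
                 "Ra\u00fal",        # correctly encoded "Raúl"
                 "Raul")             # plain ASCII
cleaned <- strip_accents(fix_mojibake(first_names))
print(cleaned)
```

Under those assumptions all three inputs should come out as "Raul". Something like iconv(x, to = "ASCII//TRANSLIT") is shorter, but its output depends on the platform, which is why the sketch uses an explicit chartr() table.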



      Thanks,
      Spencer


p.s.  My current data contains other similar cases, e.g.:


Recipient        District
RaÃºl Grijalva   AZ House 3
Tony CÃ¡rdenas   CA House 29
Linda SÃ¡nchez   CA House 38
RaÃºl Labrador   ID House 1
AndrÃ© Carson    IN House 7
Bob MenÃ©ndez    NJ Senate
Ben Ray LujÃ¡n   NM House 3
JosÃ© Serrano    NY House 15
Nydia VelÃ¡zquez NY House 7
RubÃ©n Hinojosa  TX House 15

These names all appear differently in another file I have. I've written an ugly function that can identify "nonstandard characters". I'm confident I can solve this problem. However, I'm adding things like this to the Ecdat package, and it would be more useful for others if I made better use of other capabilities in R.
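To illustrate the fallback of matching on everything except the nonstandard characters, combined with the district, here is a rough sketch. It is not the function mentioned above. The second data frame (good), the helper names name_pattern and match_row, and the assumption that both files spell the district the same way are all invented for the example. It also assumes the names contain no regular-expression metacharacters.

```r
# Two rows from the table above, written with \u escapes for the odd characters.
bad <- data.frame(
  Recipient = c("Ra\u00c3\u00bal Grijalva", "Andr\u00c3\u00a9 Carson"),
  District  = c("AZ House 3", "IN House 7"),
  stringsAsFactors = FALSE
)

# Hypothetical second file with plain-ASCII names (made up for this sketch).
good <- data.frame(
  Recipient = c("Raul Grijalva", "Ben Quayle", "Andre Carson"),
  District  = c("AZ House 3", "AZ House 3", "IN House 7"),
  stringsAsFactors = FALSE
)

# Turn each run of non-ASCII characters into a wildcard and anchor the rest,
# e.g. "RaÃºl Grijalva" becomes "^Ra.+l Grijalva$".
name_pattern <- function(x) {
  paste0("^", gsub("[^\\x01-\\x7F]+", ".+", x, perl = TRUE), "$")
}

# For row i of 'bad', find the single row of 'good' with the same district
# and a name fitting the pattern; return NA if there is no unique hit.
match_row <- function(i) {
  hit <- which(good$District == bad$District[i] &
               grepl(name_pattern(bad$Recipient[i]), good$Recipient))
  if (length(hit) == 1L) hit else NA_integer_
}

idx <- vapply(seq_len(nrow(bad)), match_row, integer(1))
print(cbind(bad, Matched = good$Recipient[idx]))
```

Requiring both the district and the name pattern is what keeps Ben Quayle from being matched to the AZ House 3 row.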


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
