Hi Jim,

I ended up collaborating with someone and, after looking at your code and talking it through, we came up with this:

library(stringr)

# Extract the first number found in each string (here, the zip code)
numextract <- function(string){
  str_extract(string, "\\-*\\d+\\,*\\d*")
}

myDataSet$zip <- numextract(myDataSet$state)
combineddata <- merge(zipcode, myDataSet, by.x="zip", by.y="zip")

So, as I understand it, we built a function whose purpose is to extract the numerical value from a string, stored the result in a new column, and then merged the two data frames together. It worked!

Now I just need to figure out this thing called shape data... basically I need to figure out how to place a shape of the United States underneath my data points so that I can see them over the locations to which they correspond.

Nicola

On Mon, May 13, 2019 at 9:09 PM Jim Lemon <drjimle...@gmail.com> wrote:
>
> Hi Nicola,
> Getting the blank rows will be a bit more difficult and I don't see
> why they should be in the final data frame, so:
>
> townzip<-read.table(text="waltham, Massachusetts 02451
> Columbia, SC 29209
>
> Wheat Ridge , Colorado 80033
> Charlottesville, Virginia 22902
> Fairbanks, AK 99709
> Montpelier, VT 05602
> Dobbs Ferry, New York 10522
>
> Henderson , Kentucky 42420",
> sep="\t",stringsAsFactors=FALSE)
> zip_split<-function(x) {
>  commasplit<-unlist(strsplit(x,","))
>  state<-trimws(gsub("[[:digit:]]","",commasplit[2]))
>  zip<-trimws(gsub("[[:alpha:]]","",commasplit[2]))
>  return(c(commasplit[1],state,zip))
> }
> townzipsplit<-as.data.frame(t(sapply(townzip$V1,zip_split)))
> rownames(townzipsplit)<-NULL
> names(townzipsplit)<-c("town","state","zip")
> townzipsplit$latlon<-NA
> # I don't know the name of the zipcode column in the "zipcode" data frame
> newzipdf<-merge(townzipsplit,zipcodedf,by.x="zip",by.y="zip")
>
> Jim
>
> On Tue, May 14, 2019 at 5:57 AM Nicola Ruggiero
> <nicola.ruggiero....@gmail.com> wrote:
> >
> > Hello everyone,
> >
> > I've downloaded Jeffrey Breen's R package "zipcode," which has the
> > latitude and longitude for all of the US zip codes. So, this is a
> > data.frame with 43,191 observations. That's one data frame in my
> > environment.
> >
> > Then, I have another data.frame with over 100,000 observations that
> > look like this:
> >
> > waltham, Massachusetts 02451
> > Columbia, SC 29209
> >
> > Wheat Ridge , Colorado 80033
> > Charlottesville, Virginia 22902
> > Fairbanks, AK 99709
> > Montpelier, VT 05602
> > Dobbs Ferry, New York 10522
> >
> > Henderson , Kentucky 42420
> >
> > The spaces represent absences in the column. Regardless,
> > I need to figure out how to write code that would, presumably, match
> > the zipcodes and add another column to the data frame with the
> > latitude and longitude. So, for example, the code would recognize
> > 02451 above and, in the column next to it, write
> > 42.3765° N, 71.2356° W, since that's the
> > latitude and longitude for Waltham, Massachusetts.
> >
> > Any idea of how to begin code that would perform such an operation?
> >
> > Again, I have a data.frame with the zipcodes linked to the
> > latitudes and longitudes, on the one hand, and another data.frame with
> > only zipcodes (and some holes). I need to produce the corresponding
> > latitude/longitudes in the latter data.frame.
> >
> > Nicola
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
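
For anyone following the thread, here is a minimal, self-contained sketch of the extract-then-merge idea in base R, so it runs without stringr or the "zipcode" package. The names zip_lookup, extract_zip and my_data are hypothetical and used only for illustration. The Waltham coordinates are the ones quoted in the thread (west longitude written as a negative number, which is an assumption about the sign convention); the other coordinates are placeholders, not real values. Keeping the zip code as a character string preserves leading zeros such as "02451".

```r
# Hypothetical stand-in for the "zipcode" lookup table.
# Only the 02451 row uses coordinates from the thread; the rest are placeholders.
zip_lookup <- data.frame(
  zip       = c("02451", "29209", "80033"),
  latitude  = c(42.3765, 1, 2),
  longitude = c(-71.2356, -1, -2),
  stringsAsFactors = FALSE
)

# Addresses in the shape shown in the thread; "" stands in for a blank row.
addresses <- c("waltham, Massachusetts 02451",
               "Columbia, SC 29209",
               "",
               "Wheat Ridge , Colorado 80033")

# Return the first five consecutive digits in each string, or NA if there are none.
extract_zip <- function(x) {
  out <- rep(NA_character_, length(x))
  m <- regexpr("[0-9]{5}", x)
  out[m > 0] <- regmatches(x, m)
  out
}

my_data <- data.frame(address = addresses, stringsAsFactors = FALSE)
my_data$zip <- extract_zip(my_data$address)

# all.x = TRUE keeps rows with no matching zip; their coordinates become NA.
combined <- merge(my_data, zip_lookup, by = "zip", all.x = TRUE)
print(combined)
```

With all.x = TRUE the blank rows survive the merge with NA coordinates; dropping that argument gives the behaviour of a plain merge(), which discards unmatched rows.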