I'm working with some text data and am trying to generate output in the
following format.

                                  state  city  zipcode
I like pizza                        0     0       0
I live in Denver                    0     1       0
All the fun stuff is in Alaska      1     0       0
he lives in 66062                   0     0       1

So basically, I'm generating a 1 if a phrase contains a state, city, or zip
code, and 0 if it doesn't.

Using the stringr package, I developed the following code:

 library(stringr)
 inscompany_match <- str_c(inscompany, collapse = "|")
 state_match <- str_c(state, collapse = "|")
 city_match <- str_c(city, collapse = "|")
 agency_match <- str_c(agency, collapse = "|")
 zipcode_match <- str_c(zipcode, collapse = "|")
 mydf$inscompany <- as.numeric(str_detect(mydf$keyword, inscompany_match))
 mydf$state <- as.numeric(str_detect(mydf$keyword, state_match))
 mydf$city <- as.numeric(str_detect(mydf$keyword, city_match))
 mydf$agency <- as.numeric(str_detect(mydf$keyword, agency_match))
 mydf$zipcode <- as.numeric(str_detect(mydf$keyword, zipcode_match))
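For reference, here is a minimal, self-contained version of the same idea that reproduces the table above. The lookup vectors are made-up toy values (my real ones are much longer), the helper name `flag` is only for illustration, and it uses base R's grepl() instead of str_detect() so it runs without any packages:

```r
# Hypothetical toy data; the real lookup vectors are much longer.
mydf <- data.frame(
  keyword = c("I like pizza", "I live in Denver",
              "All the fun stuff is in Alaska", "he lives in 66062"),
  stringsAsFactors = FALSE
)
state   <- c("Alaska", "Colorado")
city    <- c("Denver", "Olathe")
zipcode <- c("66062", "80202")

# Collapse a lookup vector into one "a|b|c" pattern and flag matches as 0/1.
flag <- function(text, terms) {
  as.numeric(grepl(paste(terms, collapse = "|"), text))
}

mydf$state   <- flag(mydf$keyword, state)
mydf$city    <- flag(mydf$keyword, city)
mydf$zipcode <- flag(mydf$keyword, zipcode)
print(mydf)
```

With short lookup vectors like these, the pattern compiles without trouble; the problem only shows up with my full zip code list.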


However, when I try to create the 0/1 values for zip codes, which I've
entered as character strings, I get the following error:

Error: invalid regular expression
'35004|35005|35006|35007|35010|35014|35016|35019|35020|


How can I generate binary 0/1 values for the zip codes?
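One workaround I could imagine, assuming the error comes from the sheer size of the "|" pattern (I have not confirmed that this is the cause), is to skip the big regular expression entirely: split each phrase into its runs of digits and check those against the zip code vector with %in%. This is only a rough sketch; `has_zip` is a name I made up, and the data below are toy values:

```r
# Sketch only: flag phrases that contain a known zip code without
# building one huge alternation pattern.
has_zip <- function(text, zips) {
  # Splitting on runs of non-digits leaves only the digit runs as tokens.
  digit_runs <- strsplit(text, "[^0-9]+")
  # A phrase gets 1 if any of its digit runs is exactly a known zip code.
  as.numeric(vapply(digit_runs, function(run) any(run %in% zips), logical(1)))
}

# Toy example (hypothetical values).
zipcode <- c("35004", "35005", "66062")
keyword <- c("I like pizza", "he lives in 66062", "call 6606212345")
zip_flag <- has_zip(keyword, zipcode)
print(zip_flag)
```

Because whole digit runs are compared, a longer number that merely contains a zip code as a substring is not flagged, which the "|" pattern would not guarantee.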


I'm using R 2.13 on Ubuntu 10.10.

Abraham


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
