I'm coming to R from Python, so I coded a Python3 solution:

#####################
# Split on lines, not on whitespace, so "little rock" stays one city.
data = """alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
""".splitlines()

state_list = ["alabama", "arkansas", "alaska"]  # etc.

return_list = []
for word in data:
    if word in state_list:
        # A state line: remember it for the cities that follow.
        current_state = word
    else:
        # A city line: pair it with the most recent state.
        return_list.append([current_state, word])

print(return_list)
#####################

... and then translated it to R:

#####################
data = "alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
"
data = strsplit(data, split="\n")[[1]]

states = vector()
cities = vector()
for (word in data) {
    if (word %in% tolower(state.name)) {
        # A state line: remember it for the cities that follow.
        current_state = word
    } else {
        # A city line: record it alongside the most recent state.
        states = c(states, current_state)
        cities = c(cities, word)
    }
}

print(data.frame(V1=states, V2=cities))
#####################

-John

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> Sent: Sunday, January 04, 2015 2:48 AM
> To: npretnar
> Cc: R-help@r-project.org
> Subject: Re: [R] Separating a Complicated String Vector
>
>
> On Jan 3, 2015, at 9:20 PM, npretnar wrote:
>
> > Sorry. Bad example on my part. Try this. V1 is ...
> >
> > V1
> > alabama
> > bates
> > tuscaloosa
> > smith
> > arkansas
> > fayette
> > little rock
> > alaska
> > juneau
> > nome
> >
> > And I want:
> >
> > V1 V2
> > alabama bates
> > alabama tuscaloosa
> > alabama smith
> > arkansas fayette
> > arkansas little rock
> > alaska juneau
> > alaska nome
> >
>
> dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)
>
> dat$thisstate <- cumsum(rownames(dat) %in% which(dat$is_state) )
> dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state] ],
>                    V2 = dat$V1[ !dat$is_state] )
>
>
> > dat2
>         V1         V2
> 1  alabama      bates
> 2  alabama tuscaloosa
> 3  alabama      smith
> 4 arkansas    fayette
> 5 arkansas     little
> 6 arkansas       rock
> 7   alaska     juneau
> 8   alaska       nome
>
> --
> David.
>
> >
> > This is more representative of the problem, extended to all 50 states.
> >
> > - Nick
> >
> >
> > On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
> >
> >> I'm not sure what's so complicated about that (am I missing
> >> something?). You can search using grep, and replace using gsub, so
> >>
> >> tmpDF <- read.table(text="V1 V2
> >> A 5
> >> a1 1
> >> a2 1
> >> a3 1
> >> a4 1
> >> a5 1
> >> B 4
> >> b1 1
> >> b2 1
> >> b3 1
> >> b4 1",
> >> header=TRUE)
> >> tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
> >> data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
> >>
> >> Seems to do the trick.
> >>
> >> Best,
> >> Ista
> >>
> >> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npret...@gmail.com> wrote:
> >>> I have a string variable (V1) in a data frame structured as follows:
> >>>
> >>> V1 V2
> >>> A 5
> >>> a1 1
> >>> a2 1
> >>> a3 1
> >>> a4 1
> >>> a5 1
> >>> B 4
> >>> b1 1
> >>> b2 1
> >>> b3 1
> >>> b4 1
> >>>
> >>> I want the following:
> >>>
> >>> V1 V2 V3
> >>> a1 1 A
> >>> a2 1 A
> >>> a3 1 A
> >>> a4 1 A
> >>> a5 1 A
> >>> b1 1 B
> >>> b2 1 B
> >>> b3 1 B
> >>> b4 1 B
> >>>
> >>> I am not sure how to go about making this transformation besides
> >>> writing a long vector that contains each of the categorical string
> >>> names (these are state names, so it would be a really long vector).
> >>> Any help would be greatly appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Nicholas Pretnar
> >>> Mizzou Economics Grad Assistant
> >>> npret...@gmail.com
>
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
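
P.S. For what it's worth, the "carry the last state forward" idea used in this thread can be wrapped in a small function. This is only a sketch under a few assumptions: the function name pair_cities_with_states is made up for illustration, the three-state list stands in for all 50, and the input is assumed to have one name per line with each state listed before its cities. Unlike a loop that silently reuses an unset variable, it raises an error if a city shows up before any state.

```python
def pair_cities_with_states(lines, states):
    """Pair each non-state line with the most recent state line above it.

    `lines` is an iterable of strings, one name per line.
    `states` is an iterable of lower-case state names (illustrative input).
    """
    states = set(states)
    pairs = []
    current_state = None
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        if name in states:
            current_state = name  # new state: later cities belong to it
        elif current_state is None:
            raise ValueError(f"city {name!r} appears before any state")
        else:
            pairs.append((current_state, name))
    return pairs


# Sample data from the thread; splitting on lines keeps "little rock" whole.
lines = """alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
""".splitlines()

pairs = pair_cities_with_states(lines, ["alabama", "arkansas", "alaska"])
for state, city in pairs:
    print(state, city)
```

Splitting on newlines rather than on whitespace is the design choice that matters here: it is what keeps multi-word cities such as "little rock" on a single row.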