On Jan 3, 2015, at 9:20 PM, npretnar wrote:

> Sorry. Bad example on my part. Try this. V1 is ...
>
> V1
> alabama
> bates
> tuscaloosa
> smith
> arkansas
> fayette
> little rock
> alaska
> juneau
> nome
>
> And I want:
>
> V1 V2
> alabama bates
> alabama tuscaloosa
> alabama smith
> arkansas fayette
> arkansas little rock
> alaska juneau
> alaska nome
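
A minimal sketch of the input, for reproducibility. It assumes the values sit in a one-column data frame called dat (the name used in the code further down) and that V1 is kept as character rather than factor; neither detail is stated in the post.

```r
# Sample input from the post: state names followed by their sub-entries.
# Assumption: one value per row, so "little rock" stays a single entry.
dat <- data.frame(
  V1 = c("alabama", "bates", "tuscaloosa", "smith",
         "arkansas", "fayette", "little rock",
         "alaska", "juneau", "nome"),
  stringsAsFactors = FALSE
)
str(dat)
```

If the values were instead split on whitespace (as scan() does by default), "little rock" would become two entries, which may be why "little" and "rock" show up as separate rows in the printed result.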

# Flag the rows that hold a state name (state.name ships with R).
dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)
# Running count of state rows, so each row carries the index of its state.
dat$thisstate <- cumsum(rownames(dat) %in% which(dat$is_state) )
# Pair every non-state row with the state it falls under.
dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state] ] ,
                   V2 = dat$V1[ !dat$is_state] )

> dat2
        V1         V2
1  alabama      bates
2  alabama tuscaloosa
3  alabama      smith
4 arkansas    fayette
5 arkansas     little
6 arkansas       rock
7   alaska     juneau
8   alaska       nome

--
David.

>
> This is more representative of the problem, extended to all 50 states.
>
> - Nick
>
>
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
>
>> I'm not sure what's so complicated about that (am I missing
>> something?). You can search using grep, and replace using gsub, so
>>
>> tmpDF <- read.table(text="V1 V2
>> A 5
>> a1 1
>> a2 1
>> a3 1
>> a4 1
>> a5 1
>> B 4
>> b1 1
>> b2 1
>> b3 1
>> b4 1",
>> header=TRUE)
>> tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
>> data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
>>
>> Seems to do the trick.
>>
>> Best,
>> Ista
>>
>> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npret...@gmail.com> wrote:
>>> I have a string variable (V1) in a data frame structured as follows:
>>>
>>> V1 V2
>>> A 5
>>> a1 1
>>> a2 1
>>> a3 1
>>> a4 1
>>> a5 1
>>> B 4
>>> b1 1
>>> b2 1
>>> b3 1
>>> b4 1
>>>
>>> I want the following:
>>>
>>> V1 V2 V3
>>> a1 1 A
>>> a2 1 A
>>> a3 1 A
>>> a4 1 A
>>> a5 1 A
>>> b1 1 B
>>> b2 1 B
>>> b3 1 B
>>> b4 1 B
>>>
>>> I am not sure how to go about making this transformation besides writing a
>>> long vector that contains each of the categorical string names (these are
>>> state names, so it would be a really long vector). Any help would be
>>> greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Nicholas Pretnar
>>> Mizzou Economics Grad Assistant
>>> npret...@gmail.com

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
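
A possible variant of the state-lookup approach in this thread, sketched under assumptions that are not in the posts: the sample input is rebuilt inline as dat with one value per row, the first row is a state, and every non-state row belongs to the nearest state row above it. State rows are identified by exact match against tolower(state.name) rather than by an unanchored regular expression, so a hypothetical entry such as "kansas city" would not be flagged as a state.

```r
# Hypothetical rebuild of the sample input (one value per row).
dat <- data.frame(
  V1 = c("alabama", "bates", "tuscaloosa", "smith",
         "arkansas", "fayette", "little rock",
         "alaska", "juneau", "nome"),
  stringsAsFactors = FALSE
)

# Exact match against the built-in state names, lower-cased.
is_state <- dat$V1 %in% tolower(state.name)

# cumsum() numbers each block of rows by the state row that opens it.
block <- cumsum(is_state)

# Look up each sub-entry's state by block number, then drop the state rows.
dat2 <- data.frame(
  V1 = dat$V1[is_state][block[!is_state]],
  V2 = dat$V1[!is_state],
  stringsAsFactors = FALSE
)
dat2
```

The trade-off is that exact matching misses state names with stray whitespace or alternate spellings, so the input may need trimming first.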