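Make current_location and previous_location factors with the same set of levels. xtabs() keeps every factor level in the table, including levels with no observations (its drop.unused.levels argument defaults to FALSE), so all combinations then show up. The levels could be the union of the values in the two columns or a predetermined list. E.g.,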
> x <- data.frame(previous_location=c("Mount Vernon","Burlington"), current_location=c("Sedro Woolley","Burlington"))
> allCities <- levels(factor(unlist(x))) # union of observed values
> allCities
[1] "Burlington"    "Mount Vernon"  "Sedro Woolley"
> x[] <- lapply(x, factor, levels=allCities)
> xtabs(~previous_location + current_location,data=x)
                 current_location
previous_location Burlington Mount Vernon Sedro Woolley
  Burlington               1            0             0
  Mount Vernon             0            0             1
  Sedro Woolley            0            0             0

or, using an externally determined set of cities

> allCities <- c("Anacortes","Burlington","Concrete","Mount Vernon","Sedro Woolley")
> x[] <- lapply(x, factor, levels=allCities)
> xtabs(~previous_location + current_location,data=x)
                 current_location
previous_location Anacortes Burlington Concrete Mount Vernon Sedro Woolley
  Anacortes               0          0        0            0             0
  Burlington              0          1        0            0             0
  Concrete                0          0        0            0             0
  Mount Vernon            0          0        0            0             1
  Sedro Woolley           0          0        0            0             0

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, May 16, 2018 at 7:49 AM, Miluji Sb <miluj...@gmail.com> wrote:

> Dear Bert and Huzefa,
>
> Apologies for the late reply, my account got hacked and I have just managed
> to recover it.
>
> Thank you very much for your replies and the solutions. Both work well.
>
> I was wondering if there was any way to ensure (force) that all possible
> combinations show up in the output. The full dataset has 25 cities but of
> course people have not moved from Boston to all the other 24 cities. I
> would like all the combinations if possible.
>
> Thank you again!
>
> Sincerely,
>
> Milu
>
> On Tue, May 8, 2018 at 6:28 PM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
>
> > or in base R : ?xtabs ??
> >
> > as in:
> > xtabs(~previous_location + current_location,data=x)
> >
> > (You can convert the 0s to NA's if you like)
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Tue, May 8, 2018 at 9:21 AM, Huzefa Khalil <huzefa.kha...@umich.edu>
> > wrote:
> >
> >> Dear Miluji,
> >>
> >> If I understand correctly, this should get you what you need.
> >>
> >> temp1 <-
> >> structure(list(id = 101:115, current_location = structure(c(2L,
> >> 8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label = c("Austin",
> >> "Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
> >> "New York"), class = "factor"), previous_location = structure(c(6L,
> >> 2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label = c("Atlanta",
> >> "Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
> >> ), class = "factor")), class = "data.frame", row.names = c(NA,
> >> -15L))
> >>
> >> dcast(temp1, previous_location ~ current_location)
> >>
> >> On Tue, May 8, 2018 at 12:10 PM, Miluji Sb <miluj...@gmail.com> wrote:
> >> > I have data on current and previous location of individuals. I would like
> >> > to have a matrix with bilateral movement between locations. I would like
> >> > the final output to look like the second table below.
> >> >
> >> > I have tried using crosstab() from the ecodist but I do not have another
> >> > variable to measure the flow. Ultimately I would like to compute the
> >> > probability of movement between cities (movement to city_i/total movement
> >> > from city_j).
> >> >
> >> > Is it possible to aggregate the data in this way? Any guidance would be
> >> > highly appreciated. Thank you!
> >> >
> >> > # Original data
> >> > structure(list(id = 101:115, current_location = structure(c(2L,
> >> > 8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label = c("Austin",
> >> > "Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
> >> > "New York"), class = "factor"), previous_location = structure(c(6L,
> >> > 2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label = c("Atlanta",
> >> > "Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
> >> > ), class = "factor")), class = "data.frame", row.names = c(NA,
> >> > -15L))
> >> >
> >> > # Expected output
> >> > structure(list(X = structure(c(3L, 1L, 2L), .Label = c("Austin",
> >> > "Houston", "OKC"), class = "factor"), Boston = c(2L, NA, NA),
> >> > New.York = c(NA, 2L, 2L), Cambridge = c(2L, NA, NA)), class = "data.frame",
> >> > row.names = c(NA, -3L))
> >> >
> >> > Sincerely,
> >> >
> >> > Milu
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
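On the probability of movement mentioned in the original question (moves to city_i divided by all moves from city_j): one possible way to get there from the xtabs() table is prop.table() with margin = 1, which divides each row by its row total. The following is only a sketch; the three-row data frame is made-up toy data, not the poster's dataset, and the names counts and probs are illustrative.

```r
# Hypothetical toy data (not the original poster's 15-row data set)
x <- data.frame(
  previous_location = c("Mount Vernon", "Burlington", "Burlington"),
  current_location  = c("Sedro Woolley", "Burlington", "Mount Vernon"),
  stringsAsFactors  = FALSE
)

# Give both columns the same levels: the union of all observed cities
allCities <- sort(unique(unlist(x)))
x[] <- lapply(x, factor, levels = allCities)

# Square table of counts: rows are origins, columns are destinations
counts <- xtabs(~ previous_location + current_location, data = x)

# Row-wise proportions: each row is divided by the total moves from that origin.
# Origins with no observed moves have a row total of 0, so their row is NaN.
probs <- prop.table(counts, margin = 1)
print(probs)
```

Rows for cities nobody moved from come out as NaN (0/0); whether to leave them, set them to 0, or to NA depends on how the probabilities will be used downstream.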