Dear List: I have a question related to a previous discussion How to sum one column in a data frame keyed on other columns https://stat.ethz.ch/pipermail/r-help/2006-December/122141.html (George Nachman, Bill Venables)
The original query was to calculate the sum of visits for each unique tuple of (url, time) from the data frame dat: > dat url time somethingirrelevant visits 1 www.foo.com 1:00 xxx 100 2 www.foo.com 1:00 yyy 50 3 www.foo.com 2:00 xyz 25 4 www.bar.com 1:00 xxx 200 5 www.bar.com 1:00 zzz 200 6 www.foo.com 2:00 xxx 500 The response gave: > tdat url time total_visits 4 www.bar.com 1:00 400 1 www.foo.com 1:00 150 3 www.foo.com 2:00 525 My question is how can I build a similar data frame but having also rows for combinations of (url, time) that were not in dat? In this example, the (url, time) without record of visit would be (www.bar.com, 2:00): > ndat url time total_visits www.bar.com 1:00 400 www.bar.com 2:00 0 www.foo.com 1:00 150 www.foo.com 2:00 525 I have tried to build a data frame with > adat <- data.frame (url = rep(unique(dat$url), each=length(unique(dat$time))), time=unique(dat$time), alt_visits=0) > ndat <- merge (adat, tdat, by=c("url", "time"), all = TRUE) then replace the NA and remove the alt_visits column. But this appears clumsy and quite slow with 10000 rows and 4 columns, I would be most grateful for other suggestions, Vincent ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.