Hi all,

Suppose that I have two data frames, say a and b, both containing a column 'id'. While data frame 'a' contains multiple rows sharing the same id, data frame 'b' contains just one entry per id (i.e., a one-to-many relationship). For ease of modeling, I now want to generate a new data frame c, which is basically a copy of data frame 'a' augmented by the values of b. If I have
  a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
  b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

the resulting data frame should look like:

  c <- data.frame(id = rep(1:3, each = 3), val = a$val,
                  set1 = rep(LETTERS[1:3], each = 3),
                  set2 = rep(5:7, each = 3))

While this task is just an application of a few rep()s and c()s for structured data frames, constructing 'c' explicitly is cumbersome (and error prone) for less structured data. Thus, I was thinking of making use of R's smart indexing to generate an index vector, i.e.:

  ind <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
  c.prime <- cbind(a, b[ind, -1])
  rownames(c.prime) <- NULL
  all.equal(c.prime, c)  # TRUE

The way I currently generate the index vector ind is

  tmp <- seq_along(b$id)
  names(tmp) <- b$id
  ind <- tmp[as.character(a$id)]  # index by name, not by position

However, I think there should be a smarter way of doing this without defining a temporary variable. Some combination of match, which, %in% maybe? Any hints?

While writing these lines, I thought that

  ind <- pmatch(a$id, b$id, duplicates.ok = TRUE)

might do the job. Or do I run into trouble because of the partial matching involved in pmatch?

BTW, is there a way to prevent R from assigning [row|col]names? In the example above I had to remove the row names generated by cbind explicitly; is there a one-liner?

Thanks for your input + BR
Thorn

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
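A minimal base-R sketch of one possible answer to the three questions above (a suggestion, not the poster's method). match(a$id, b$id) returns, for each element of a$id, the position of its first exact match in b$id, which is exactly the index vector ind, with no temporary variable. The pmatch() comparison uses hypothetical ids (x.id, tab.id) chosen only to expose partial matching, and the row-name one-liner relies on data.frame() ignoring component row names when row.names = NULL is supplied explicitly.

```r
## Example data from the post
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)
c <- data.frame(id = rep(1:3, each = 3), val = a$val,
                set1 = rep(LETTERS[1:3], each = 3),
                set2 = rep(5:7, each = 3))

## 1. Index vector without a temporary variable:
##    position of each a$id in b$id (exact matching, NA if absent)
ind <- match(a$id, b$id)
ind  # 1 1 1 2 2 2 3 3 3

## 2. Why pmatch() is risky here: it coerces both arguments to
##    character and accepts unique *partial* matches.
##    x.id and tab.id are hypothetical ids chosen to show this.
x.id   <- c(1, 10, 2)
tab.id <- c(10, 20)
match(x.id, tab.id)                         # NA  1 NA  (exact only)
pmatch(x.id, tab.id, duplicates.ok = TRUE)  #  1  1  2  ("1" -> "10", "2" -> "20")

## 3. One-liner without inherited row names: passing row.names = NULL
##    explicitly makes data.frame() use the automatic sequence 1..n
c.prime <- data.frame(a, b[ind, -1], row.names = NULL)
all.equal(c.prime, c)  # TRUE
```

merge(a, b, by = "id") is another base-R route for this one-to-many join, but it sorts the result on the by columns by default, so check the row order before relying on it.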