I think you just need merge(), e.g.

a <- data.frame(id = rep(1:3, each=3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

merge(a, b, by = "id")
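
For less structured data, a rough sketch of how merge() behaves may help. The data frames a2 and b2 below are made up for illustration: the ids are unordered and id 4 has no entry in b2. By default merge() keeps only ids present in both data frames; all.x = TRUE keeps every row of the first one.

```r
# Hypothetical data: unordered ids, and id 4 is missing from b2
a2 <- data.frame(id = c(3, 1, 4, 1, 3), val = c(0.5, 1.2, -0.3, 2.0, 0.1))
b2 <- data.frame(id = c(1, 3), set1 = c("A", "C"), set2 = c(5L, 7L))

# Default: only ids found in both data frames survive
inner <- merge(a2, b2, by = "id")

# all.x = TRUE: every row of a2 is kept, with NA where b2 has no match
left <- merge(a2, b2, by = "id", all.x = TRUE)
```

Note that the rows of the result need not come back in the original order of a2, so do not rely on row positions lining up with a2 afterwards.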


I hope it helps.

Best,
Dimitris


On 8/9/2010 11:01 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote:
Hi all,

Suppose I have two data frames, 'a' and 'b', both containing a column
'id'. Data frame 'a' contains multiple rows sharing the same id, while
data frame 'b' contains exactly one row per id (i.e. a 1-to-n
relationship). For ease of modeling I now want to generate a new
data frame 'c', which is basically a copy of data frame 'a' augmented by
the matching values of 'b'. If I have

a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

the resulting data frame should look like:

c <- data.frame(id = rep(1:3, each = 3), val = a$val,
                set1 = rep(LETTERS[1:3], each = 3), set2 = rep(5:7, each = 3))

While this task takes just a few calls to rep() and c() for
structured data frames, it is somewhat cumbersome (and error-prone) to
construct 'c' explicitly for less structured data. Thus, I was thinking
of using R's indexing to pick the rows of 'b' through an index
vector, i.e.:

ind <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
c.prime <- cbind(a, b[ind, -1])
rownames(c.prime) <- NULL
all.equal(c.prime, c)  # TRUE

At the moment I generate the index vector ind as follows:

tmp <- seq_along(b$id)
names(tmp) <- b$id
ind <- tmp[as.character(a$id)]  # index by name, not by position

However, I think there should be a smarter way of doing this
without having to define a temporary variable. Some combination of
match, which, %in% maybe? Any hints?
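
A minimal sketch of the match() route, assuming every id in 'a' occurs exactly once in 'b'. The second pair of vectors (a3_id, b3_id) is made up to show that the ids need not be 1, 2, 3, ...

```r
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

# For each a$id, match() returns the position of that id in b$id
ind <- match(a$id, b$id)
c.prime <- cbind(a, b[ind, -1])

# Hypothetical ids that do not coincide with row positions
a3_id <- c(20, 20, 10, 30)
b3_id <- c(10, 20, 30)
ind3 <- match(a3_id, b3_id)
```

No temporary lookup vector is needed, and an id that is missing from 'b' gives NA rather than a wrong row.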

While writing these lines, it occurred to me that

ind <- pmatch(a$id, b$id, duplicates.ok = TRUE)

could do the job. Or would I run into trouble with the partial
matching that pmatch performs?
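
A small sketch of where the partial matching can bite, using made-up id vectors (a_id and b_id are hypothetical). pmatch() compares the values as character strings, so the id 1 can be taken as an abbreviation of 10, whereas match() only accepts exact matches.

```r
# Hypothetical ids: 1 and 2 do not occur in b_id, but are prefixes of 10 and 20
a_id <- c(1, 2, 10)
b_id <- c(10, 20)

# "1" partially matches "10" and "2" partially matches "20"
p <- pmatch(a_id, b_id, duplicates.ok = TRUE)

# match() requires exact equality, so the first two ids are unmatched
m <- match(a_id, b_id)
```

Here p points at rows of 'b' for ids that do not exist there, while m holds NA for them, which is usually what one wants for a lookup by id.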

BTW, is there a way to prevent R from assigning [row|col]names? In the
example above I had to remove the rownames generated by cbind
explicitly; is there a one-liner?
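
One possible one-liner, sketched here on the example data: build the result with data.frame() and pass row.names = NULL explicitly, which asks for automatic row names instead of the ones inherited from b[ind, -1]. The use of match() to build ind is an assumption carried over for illustration.

```r
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)
ind <- match(a$id, b$id)

# An explicit row.names = NULL gives automatic row names 1, 2, ..., 9
c.prime <- data.frame(a, b[ind, -1], row.names = NULL)
rownames(c.prime)
```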

Thanks for your input + BR

Thorn

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
