Subject: Re: [R] merge data frames with same column names of different lengths and missing values To: "Phil Spector" <spec...@stat.berkeley.edu> Date: Saturday, March 7, 2009, 5:01 PM
Phil, Thank you - this is very helpful. However I realized that with my real data sets (not the example I have here), I also have different numbers of columns in each data frame. rbind doesn't seem to like this. Here's a modified example: x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), item3=c(NA,2,NA,4,NA), id=1:5) y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6) rbind(x,y) Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match Any ideas? Thanks, Steve --- On Sat, 3/7/09, Phil Spector <spec...@stat.berkeley.edu> wrote: From: Phil Spector <spec...@stat.berkeley.edu> Subject: Re: [R] merge data frames with same column names of different lengths and missing values To: "Steven Lubitz" <slubi...@yahoo.com> Date: Saturday, March 7, 2009, 1:56 AM Steven - I believe this gives the output that you desire: > xy = rbind(x,y) > aggregate(subset(xy,select=-id),xy['id'],function(x)rev(x[!is.na(x)])[1]) id item1 item2 1 1 NA 1 2 2 2 NA 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 NA But I think what merge x y; by id; would give you is > aggregate(subset(xy,select=-id),xy['id'],function(x)x[length(x)]) id item1 item2 1 1 NA NA 2 2 2 NA 3 3 NA 3 4 4 4 4 5 5 5 5 6 6 6 NA - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Fri, 6 Mar 2009, Steven Lubitz wrote: > > Hello, I'm switching over from SAS to R and am having trouble merging data frames. The data frames have several columns with the same name, and each has a different number of rows. Some of the values are missing from cells with the same column names in each data frame. I had hoped that when I merged the dataframes, every column with the same name would be merged, with the value in a complete cell overwriting the value in an empty cell from the other data frame. I cannot seem to achieve this result, though I've tried several merge adaptations: > > x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5) > y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6) > > > merge(x,y,by="id") #I lose observations here (n=1 in this example), and my items are duplicated - I do not want this result > id item1.x item2.x item1.y item2.y > 1 1 NA 1 NA NA > 2 2 NA NA 2 NA > 3 3 3 NA NA 3 > 4 4 4 4 4 4 > 5 5 5 5 5 5 > > > merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and do not want this result > id item1 item2 > 1 4 4 4 > 2 5 5 5 > > > merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated and the NA values are retained - I instead want one row per ID > id item1 item2 > 1 1 NA 1 > 2 1 NA NA > 3 2 2 NA > 4 2 NA NA > 5 3 3 NA > 6 3 NA 3 > 7 4 4 4 > 8 5 5 5 > 9 6 6 NA > > In reality I have multiple data frames with numerous columns, all with this problem. I can do the merge seamlessly in SAS, but am trying to learn and stick with R for my analyses. Any help would be greatly appreciated. > > Steve Lubitz > Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts General Hospital > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.