Steven Lubitz wrote:
Hello, I'm switching over from SAS to R and am having trouble merging data
frames. The data frames have several columns with the same name, and each has a
different number of rows. Some of the values are missing from cells with the
same column names in each data frame. I had hoped that when I merged the
dataframes, every column with the same name would be merged, with the value in
a complete cell overwriting the value in an empty cell from the other data
frame. I cannot seem to achieve this result, though I've tried several merge
adaptations:
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
merge(x,y,by="id") #I lose observations here (n=1 in this example), and my
items are duplicated - I do not want this result
id item1.x item2.x item1.y item2.y
1 1 NA 1 NA NA
2 2 NA NA 2 NA
3 3 3 NA NA 3
4 4 4 4 4 4
5 5 5 5 5 5
merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here) and
do not want this result
id item1 item2
1 4 4 4
2 5 5 5
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated
and the NA values are retained - I instead want one row per ID
id item1 item2
1 1 NA 1
2 1 NA NA
3 2 2 NA
4 2 NA NA
5 3 3 NA
6 3 NA 3
7 4 4 4
8 5 5 5
9 6 6 NA
You should obtain the desired solution using:
merge(y, x, by=c("id","item1","item2"), all=TRUE)
In database terminology all=TRUE corresponds to the full outer join,
all.x to the left outer join and all.y to the right outer join.
Ciao,
domenico
In reality I have multiple data frames with numerous columns, all with this
problem. I can do the merge seamlessly in SAS, but am trying to learn and stick
with R for my analyses. Any help would be greatly appreciated.
Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts
General Hospital
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.