Subject: Re: [R] merge data frames with same column names of different lengths 
and missing values
To: "Phil Spector" <spec...@stat.berkeley.edu>
Date: Saturday, March 7, 2009, 5:01 PM

Phil,
Thank you - this is very helpful. However I realized that with my real data 
sets (not the example I have here), I also have different numbers of columns in 
each data frame. rbind doesn't seem to like this. Here's a modified example:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), 
item3=c(NA,2,NA,4,NA), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

rbind(x,y)
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Any ideas?
Thanks,
Steve


--- On Sat, 3/7/09, Phil Spector <spec...@stat.berkeley.edu> wrote:
From: Phil Spector <spec...@stat.berkeley.edu>
Subject: Re: [R] merge data frames with
 same column names of different lengths and missing values
To: "Steven Lubitz" <slubi...@yahoo.com>
Date: Saturday, March 7, 2009, 1:56 AM

Steven -
  I believe this gives the output that you desire:

> xy = rbind(x,y)
>
aggregate(subset(xy,select=-id),xy['id'],function(x)rev(x[!is.na(x)])[1])
   id item1 item2
1  1    NA     1
2  2     2    NA
3  3     3     3
4  4     4     4
5  5     5     5
6  6     6    NA

  But I think what merge x y; by id; would give you is

> aggregate(subset(xy,select=-id),xy['id'],function(x)x[length(x)])
   id item1 item2
1  1    NA    NA
2  2     2    NA
3  3    NA     3
4  4     4     4
5  5     5     5
6  6     6    NA

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                        
 spec...@stat.berkeley.edu





On Fri, 6 Mar 2009, Steven Lubitz wrote:

>
> Hello, I'm switching over from SAS to R and am having trouble merging
data frames. The data frames have several columns with the same name, and each
has a different number of rows. Some of the values are missing from cells with
the same column names in each data frame. I had hoped that when I merged the
dataframes, every column with the same name would be merged, with the value in a
complete cell overwriting the value in an empty cell from the other data frame.
I cannot seem to achieve this result, though I've tried several merge
adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA),
id=1:6)
>
>
> merge(x,y,by="id") #I lose observations here (n=1 in this
example), and my items are
 duplicated - I do not want this result
>  id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #again
I lose observations (n=4 here) and do not want this result
>  id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
>
merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T)
#my rows are duplicated and the NA values are retained - I instead want one row
per ID
>  id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>
> In reality I
 have multiple data frames with numerous columns, all with
this problem. I can do the merge seamlessly in SAS, but am trying to learn and
stick with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and
Massachusetts General Hospital
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



      


      
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to