Re: [R] merge data frames with same column names of different lengths and missing values

Jun Shen Sat, 07 Mar 2009 13:53:18 -0800

Steve,

I don't know of a built-in R function that does what you are asking, so I
wrote one myself. Try the following and see if it works for you. The new
function "merge.new" passes its arguments on to merge() and takes one
additional argument, col.ID, which is the column number of the ID column in
the merged result. Define the function below first, then, using your x and y
as examples, type:


merge.new(x,y,all=TRUE,col.ID=3)
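
To see where the 3 comes from, here is a small check using the x and y from your message. With no "by" argument, merge() joins on all shared column names, so the ID column stays in third position in the merged result. The variable name "merged" is only for illustration.

```r
# Example data from the original question
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

# Same merge call that merge.new performs internally
merged <- merge(x, y, all=TRUE)

names(merged)                  # column order of the merged result
which(names(merged) == "id")   # position of the ID column, to pass as col.ID
```

If your real data frames have the ID column elsewhere, the same which() call should give the number to use.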

#################################################

merge.new<-function(...,col.ID){
    inter<-merge(...)
    inter<-inter[order(inter[[col.ID]]),] #merged data sorted by ID

    #total columns and rows for the target dataframe
    row.ID<-unique(inter[[col.ID]])
    total.row<-length(row.ID)
    total.col<-ncol(inter)
    target<-as.data.frame(matrix(NA,total.row,total.col))
    names(target)<-names(inter)

    for (i in seq_len(total.row)){
        #select all rows with the same ID
        inter.part<-inter[inter[[col.ID]]==row.ID[i],]
        for (j in seq_len(total.col)){
            #keep the first non-missing value for this ID; stays NA if none
            vals<-inter.part[[j]]
            vals<-vals[!is.na(vals)]
            if (length(vals)>0) {target[i,j]<-vals[1]}
        }
    }
    print(paste("total rows=",total.row))
    print(paste("total columns=",total.col))
    return(target)
}
#################################################
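
Since you mention having multiple data frames, a more compact variant of the same idea might be to stack them with rbind() and collapse each ID with aggregate(). This is only a sketch, assuming every data frame has exactly the same column names; first.non.na and coalesce.by.id are hypothetical helper names, not existing R functions.

```r
# Example data from the original question
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

# Hypothetical helper: first non-missing value of a vector, or NA if none
first.non.na <- function(v) {
    keep <- v[!is.na(v)]
    if (length(keep) > 0) keep[1] else v[1]   # here v[1] is an NA of v's own type
}

# Hypothetical helper: assumes all data frames share identical column names
coalesce.by.id <- function(..., id = "id") {
    stacked <- do.call(rbind, list(...))      # all rows stacked on top of each other
    value.cols <- setdiff(names(stacked), id)
    # one row per ID; each value column reduced to its first non-missing entry
    aggregate(stacked[value.cols], by = stacked[id], FUN = first.non.na)
}

result <- coalesce.by.id(x, y)
print(result)
```

Note that aggregate() puts the ID column first and drops rows whose ID is itself NA, so this is not a drop-in replacement for merge.new if your IDs can be missing.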
-- 
Jun Shen PhD
PK/PD Scientist
BioPharma Services
Millipore Corporation
15 Research Park Dr.
St Charles, MO 63304
Direct: 636-720-1589

On Fri, Mar 6, 2009 at 11:02 PM, Steven Lubitz <slubi...@yahoo.com> wrote:

>
> Hello, I'm switching over from SAS to R and am having trouble merging data
> frames. The data frames have several columns with the same name, and each
> has a different number of rows. Some of the values are missing from cells
> with the same column names in each data frame. I had hoped that when I
> merged the dataframes, every column with the same name would be merged, with
> the value in a complete cell overwriting the value in an empty cell from the
> other data frame. I cannot seem to achieve this result, though I've tried
> several merge adaptations:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
>
> merge(x,y,by="id") #I lose observations here (n=1 in this example), and my
> items are duplicated - I do not want this result
>  id item1.x item2.x item1.y item2.y
> 1  1      NA       1      NA      NA
> 2  2      NA      NA       2      NA
> 3  3       3      NA      NA       3
> 4  4       4       4       4       4
> 5  5       5       5       5       5
>
>
> merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here)
> and do not want this result
>  id item1 item2
> 1  4     4     4
> 2  5     5     5
>
>
> merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are
> duplicated and the NA values are retained - I instead want one row per ID
>  id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>
> In reality I have multiple data frames with numerous columns, all with this
> problem. I can do the merge seamlessly in SAS, but am trying to learn and
> stick with R for my analyses. Any help would be greatly appreciated.
>
> Steve Lubitz
> Cardiovascular Research Fellow, Brigham and Women's Hospital and
> Massachusetts General Hospital
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
