Re: [R] merge: right set overwrite left set

Jeff Newmiller Sun, 12 Jul 2015 12:39:06 -0700

I get confused by your use of the position map table. If I follow yourdescription toward your desired result, I take a different route thatmakes sense to me. Perhaps it will make sense to you as well. The key ideais to make individual comparisons of the values for each combination ofHHid and indv, regardless of where they are in the original data frames.

Below are two different syntactic representations of my understanding ofyour problem. They differ only in the approach taken to strip away syntaxclutter. I start by making the HHid identifiers character values in theoriginal data frames, because their respective factor levels in the twodata frames would not necessarily correspond.


x.HHu <- data.frame(
    HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10' )
  , indv1 = c( 2, 0, 2 , 0, 2, 0 )
  , indv2 = c( 0, NA, 2, 2, 2, 2 )
  , ind3 = c( 0, 0, 0, 0, 0, 0 )
  , stringsAsFactors = FALSE # avoid creating HHid as a factor
)
y.HHo <- data.frame(
    HHid=c( 'HH1', 'HH2','HH5', 'HH3', 'HH10' )
  , indv1 = c( 2, 0, 2, 0, NA )
  , indv2 = c( 0, 2, 2, 1, 2 )
  , stringsAsFactors = FALSE
)
z.map <- data.frame(
    HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5'
            , 'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11' )
  , position= c( 10, 20, 30, 42, 55, 66, 81, 75, 92, 101, 111 )
  , stringsAsFactors = FALSE
)


# reshape2 solution
library(reshape2)

x.HHu.long <- melt( x.HHu, "HHid", variable.name = "indv" )
x.HHu.long$indv <- as.character( x.HHu.long$indv )
y.HHo.long <- melt( y.HHo, "HHid", variable.name = "indv" )
y.HHo.long$indv <- as.character( y.HHo.long$indv )
xy.HH.long <- merge( x.HHu.long
                   , y.HHo.long
                   , by = c( "HHid", "indv" )
                   , all = TRUE )
xy.HH.long$value <- with( xy.HH.long
                        , ifelse( is.na( value.y )
                                , value.x
                                , value.y ) )
xy.HH0 <- dcast( xy.HH.long, HHid ~ indv )
xy.HH <- merge( xy.HH0, z.map, all=TRUE )
xy.HH <- xy.HH[ order( xy.HH$HHid ), ]

# compare xy.HH with zzz ... I think there is an error in zzz for# HH10/indv1, because NA should not be considered more informative

# than 0...

# tidyr/dplyr solution
library(tidyr)
library(dplyr)

# define a common processing sequence
lengthen <- (   . # period is placeholder for data frame
            %>% gather( indv, value, -HHid )
            %>% mutate( indv = as.character( indv ) )
            )
x.HHu.dlong <- x.HHu %>% lengthen
y.HHo.dlong <- y.HHo %>% lengthen
xy.HH.d <- (   x.HHu.dlong
           %>% full_join( y.HHo.dlong, by= c( "HHid", "indv" ) )
           %>% transmute( HHid = HHid
                        , indv = indv
                        , value = ifelse( is.na( value.y )
                                        , value.x
                                        , value.y )
                        )
           %>% spread( indv, value )
           %>% full_join( z.map, by="HHid" )
           %>% arrange( HHid )
           %>% as.data.frame
           )

On Sun, 12 Jul 2015, aldi wrote:

Hi,
I have two sets of data x.HHu and y.HHo, rows are IDs and columns are
individuals. I do not know in advance indv or HHid, both of them will be
captured from the data. As the y.HHo set updates, y.HHo set has better
information then x.HHu set. Thus I want a merge where right set
overwrites left set info based on HHid, i.e. to overwrite x.HHu set with
y.HHo set but keep any extra info from the x.HHu set that is not present
in y.HHo set.
HHids will be complete based on z.map, with the corresponding positions.
I am having trouble with the part after this line: ###
============================================+++++++++++++++++++++++++++
I am thinking that I am creating new columns "position" "indv1" and
"indv2", but R is interpreting them as row information.
See the expected final table at the end. HHid is common, indv3 is from
x.HHu, and the rest position and indv1 and indv2 are from y.HHo
Any suggestions are appreciated.
Thank you in advance,
Aldi

x.HHu<- data.frame(
           HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10')
         , indv1 = c( 2, 0, 2 , 0, 2, 0)
         , indv2 = c( 0, NA, 2, 2, 2, 2)
         , ind3 = c( 0, 0, 0, 0, 0, 0)
         )
### the HHo data will be the top set to overwrite any HHu data, when
they exist, thinking that HHo are better than HHu results
### when they are available

y.HHo<-data.frame(HHid=c('HH1', 'HH2','HH5', 'HH3', 'HH10')
         , indv1 = c(2, 0, 2, 0, NA)
         , indv2 = c(0, 2, 2, 1, 2)
         )

z.map<-data.frame(HHid = c('HH1', 'HH2', 'HH3', 'HH4', 'HH5',
'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11')
                , position= c(10,20,30,42,55,66,81,75,92,101,111)
                )
### see objects
x.HHu
y.HHo
z.map
### now sort the map by position, this sorted map will be used to sort
finally all data
z.map<-z.map[with(z.map, order(position)), ]
z.map

### First I introduce position to both sets so I can sort them in
advance by position.
x.HHu.map <-merge( z.map, x.HHu, by='HHid', all=T)
x.HHu.map<-x.HHu.map[with(x.HHu.map, order(position)), ]
x.HHu.map

y.HHo.map <-merge( z.map, y.HHo, by='HHid', all= T)
y.HHo.map<-y.HHo.map[with(y.HHo.map, order(position)), ]
y.HHo.map

### now merge HHu  and HHo  with the hope to overwrite the HHu set with
HHo wherever they overlap by column names.
zzz <- merge(x.HHu.map, y.HHo.map, by='HHid', all=T)
zzz
### find common variable names in two sets

commonNames <- names(x.impu.map)[which(colnames(x.impu.map) %in%
colnames(y.geno.map))]

## remove HHid wich is common for x and y, but work with the rest of columns
commonNames<-commonNames[-c(1)]

### ============================================+++++++++++++++++++++++++++
for(i in 1:length(commonNames)){

print(commonNames[i])
zzz$commonNames[i] <- NA

print(paste("zzz","$",commonNames[i],".y",sep=""))

zzz$commonNames[i] <- zzz[,paste(commonNames[i],".y",sep="")]

### paste(zzz$commonNames[i],".x",sep='') <- NULL;
### paste(zzz$commonNames[i],".y",sep='') <- NULL;

}
zzz

The final expected set has to be: HHid is common, indv3 is from x.HHu,
and the rest position and indv1 and indv2 are from y.HHo
   HHid     position     ind3  indv1 indv2
1   HH1         10          0     2       0
2  HH10        101          0    NA       2
3  HH11        111         NA    NA      NA
4   HH2         20          0     0       2
5   HH3         30          0     0       1
6   HH4         42          0    NA      NA
7   HH5         55          0     2       2
8   HH6         66         NA    NA      NA
9   HH7         75         NA    NA      NA
10  HH8         81         NA    NA      NA
11  HH9         92         NA    NA      NA

--


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] merge: right set overwrite left set

Reply via email to