Re: [R] Populate one data frame with values from another dataframe for rows that match

Kevin Wamae Sat, 14 Oct 2017 03:27:35 -0700

Dear @Eric<mailto:[email protected]>, thank you so very much for noticing 
that. When I tested @Rui<mailto:[email protected]>’s solution, it was on a 
smaller dataset that had purely matching rows. I had considered including 
non-matching rows to evaluate what the alternative would be.

Also, I hadn’t even tested it on the larger dataset. I have now and noticed 
that it went further to omit rows that did not match, just like you said.

Your proposed solution works well.

Much appreciated…I’ll get in touch in case I encounter any problems

From: Eric Berger <[email protected]>
Date: Saturday, 14 October 2017 at 12:43
To: Kevin Wamae <[email protected]>
Cc: Rui Barradas <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: [R] Populate one data frame with values from another dataframe for 
rows that match

Hi Kevin,
I think there are issues with Rui's proposed solution. For example, if there 
are rows in myDF1 which have a studyno
which does not match any row in myDF2, then you will lose those rows. In your 
original request you said that you wanted to keep those rows.

To demonstrate my point I need to modify your sample data. Specifically, I 
changed some studyno settings in myDF1, and also the entries of pf_mcl in myDF1.

myDF1 <- structure(list(studyno = c("J1000/8", "J1000/9", "J1000/9", "J1000/9",
"J1000/5", "J1000/6"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(1:6
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

#Rui's proposal gives the following result
# studyno       date year pf_mcl
# 1 J1000/9 2016-11-22 2016      0
# 2 J1000/9 2016-11-30 2016      0
# 3 J1000/9 2016-12-09 2016      0

My proposal

library(dplyr)

myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
myDF3 <- merge(myDF1, myDF2, by="studyno", all.x=TRUE ) %>%
                dplyr::mutate( pf_mcl = ifelse( is.na<http://is.na>(pf_mcl.y), 
pf_mcl.x, pf_mcl.y ) ) %>%
                dplyr::select( studyno, date, pf_mcl )

# The results of this approach
#  studyno       date pf_mcl
# 1 J1000/5 2016-12-13      5
# 2 J1000/6 2016-12-20      6
# 3 J1000/8 2016-11-18      1
# 4 J1000/9 2016-11-22      0
# 5 J1000/9 2016-11-30      0
# 6 J1000/9 2016-12-09      0

Comparing the two results you see that no rows have been dropped in my approach.

HTH,

Eric

On Sat, Oct 14, 2017 at 8:49 AM, Kevin Wamae 
<[email protected]<mailto:[email protected]>> wrote:
Dear @Rui Barradas, thank you for the solution. It works perfectly.

On 13/10/2017, 23:35, "Rui Barradas" 
<[email protected]<mailto:[email protected]>> wrote:

    Hello,

    Try the following.

    myDF1$studyno <- as.character(myDF1$studyno)
    myDF2$studyno <- as.character(myDF2$studyno)
    i1 <- which(names(myDF1) == "pf_mcl")

    merge(myDF1[-i1], myDF2, by = "studyno")

    Hope this helps,

    Rui Barradas

    Em 13-10-2017 20:09, Kevin Wamae escreveu:
    > I'm trying to populate the column “pf_mcl” in myDF1 with values from 
myDF2, where rows match based on column "studyno" but the solutions I have 
found so far don't seem to be giving me the desired output.
    >
    > Below is a snapshot of the data.frames.
    >
    > myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", 
"J1000/9",
    > "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
    > 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
    > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    > ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
    > "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", 
"J931/6",
    > "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = 
c("studyno",
    > "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 is a well curated subset of myDF1. Some rows in the two datasets 
match based on "studyno", one may find that values are missing in myDF1$pf_mcl 
or the values are wrong.
    >
    > All I want to do is identify a matching row in myDF2 and populate 
myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on 
“studyno”, the value should remain the same.
    >
    > It's probably worth mentioning, the two data frames have other 
columns...I have selected a few for example purposes.
    >
    >
    >
    > ______________________________________________________________________
    >
    > This e-mail contains information which is confidential. It is intended 
only for the use of the named recipient. If you have received this e-mail in 
error, please let us know by replying to the sender, and immediately delete it 
from your system.  Please note, that in these circumstances, the use, 
disclosure, distribution or copying of this information is strictly prohibited. 
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  
accuracy or completeness of this message as it has been transmitted over a 
public network. Although the Programme has taken reasonable precautions to 
ensure no viruses are present in emails, it cannot accept responsibility for 
any loss or damage arising from the use of the email or attachments. Any views 
expressed in this message are those of the individual sender, except where the 
sender specifically states them to be the views of KEMRI-Wellcome Trust 
Programme.
    > ______________________________________________________________________
    >
    >   [[alternative HTML version deleted]]
    >
    > ______________________________________________
    > [email protected]<mailto:[email protected]> mailing list -- To 
UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
    >

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for 
the use of the named recipient. If you have received this e-mail in error, 
please let us know by replying to the sender, and immediately delete it from 
your system.  Please note, that in these circumstances, the use, disclosure, 
distribution or copying of this information is strictly prohibited. 
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  
accuracy or completeness of this message as it has been transmitted over a 
public network. Although the Programme has taken reasonable precautions to 
ensure no viruses are present in emails, it cannot accept responsibility for 
any loss or damage arising from the use of the email or attachments. Any views 
expressed in this message are those of the individual sender, except where the 
sender specifically states them to be the views of KEMRI-Wellcome Trust 
Programme.
______________________________________________________________________
______________________________________________
[email protected]<mailto:[email protected]> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for 
the use of the named recipient. If you have received this e-mail in error, 
please let us know by replying to the sender, and immediately delete it from 
your system.  Please note, that in these circumstances, the use, disclosure, 
distribution or copying of this information is strictly prohibited. 
KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  
accuracy or completeness of this message as it has been transmitted over a 
public network. Although the Programme has taken reasonable precautions to 
ensure no viruses are present in emails, it cannot accept responsibility for 
any loss or damage arising from the use of the email or attachments. Any views 
expressed in this message are those of the individual sender, except where the 
sender specifically states them to be the views of KEMRI-Wellcome Trust 
Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Populate one data frame with values from another dataframe for rows that match

Reply via email to