Re: [Rd] Duplicate column names created by base::merge() when by.x has the same name as a column in y

Scott Ritchie Sat, 17 Feb 2018 06:57:47 -0800

The attached patch.diff makes merge.data.frame() append the suffixes to any
by.x column whose name also appears in names(y).
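
For anyone who wants similar behaviour without patching R, here is a rough
user-level sketch of the same idea. Note that merge_suffix_keys() is a
hypothetical helper written only for illustration; it is not part of base R
or of the patch, and it assumes by.x and by.y are given as character column
names rather than indices.

```r
# Example data from the original report
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex  = c("F", "M", "F", "M"),
                      age  = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name   = c("Oliver", "Sebastian", "Kai-lee"),
                       sex    = c("M", "M", "F"),
                       age    = c(5, 8, 7))

# Hypothetical helper (assumption: by.x and by.y are character vectors).
# Renames any key column of x whose name clashes with a non-key column of y,
# applying the suffixes to both sides before calling merge().
merge_suffix_keys <- function(x, y, by.x, by.y,
                              suffixes = c(".x", ".y"), ...) {
  clash <- intersect(by.x, setdiff(names(y), by.y))
  if (length(clash)) {
    if (nzchar(suffixes[1L])) {
      new.x <- paste0(clash, suffixes[1L])
      names(x)[match(clash, names(x))] <- new.x
      by.x[match(clash, by.x)] <- new.x   # keep the key in sync with the rename
    }
    if (nzchar(suffixes[2L]))
      names(y)[match(clash, names(y))] <- paste0(clash, suffixes[2L])
  }
  merge(x, y, by.x = by.x, by.y = by.y, suffixes = suffixes, ...)
}

res <- merge_suffix_keys(parents, children, by.x = "name", by.y = "parent")
names(res)
```

Renaming up front keeps the call to merge() itself unchanged, so the usual
suffix handling for the non-key columns (sex, age) still applies.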


Best,

Scott Ritchie

On 17 February 2018 at 11:15, Scott Ritchie <s.ritchi...@gmail.com> wrote:

> Hi Frederick,
>
> I would expect that any duplicate names in the resulting data.frame would
> have the suffixes appended to them, regardless of whether or not they are
> used as the join key. So in my example I would expect "name.x" and
> "name.y" to indicate their source data.frame.
>
> While careful reading of the documentation reveals this is not the case, I
> would argue the intent of the suffixes functionality should equally be
> applied to this type of case.
>
> If you agree this would be useful, I'm happy to write a patch for
> merge.data.frame that will add suffixes in this case - I intend to do the
> same for merge.data.table in the data.table package where I initially
> encountered the edge case.
>
> Best,
>
> Scott
>
> On 17 February 2018 at 03:53, <frede...@ofb.net> wrote:
>
>> Hi Scott,
>>
>> It seems like reasonable behavior to me. What result would you expect?
>> That the second "name" should be called "name.y"?
>>
>> The "merge" documentation says:
>>
>>     If the columns in the data frames not used in merging have any
>>     common names, these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default)
>>     appended to try to make the names of the result unique.
>>
>> Since the first "name" column was used in merging, leaving both
>> without a suffix seems consistent with the documentation...
>>
>> Frederick
>>
>> On Fri, Feb 16, 2018 at 09:08:29AM +1100, Scott Ritchie wrote:
>> > Hi,
>> >
>> > I was unable to find a bug report for this with a cursory search, but would
>> > like clarification if this is intended or unavoidable behaviour:
>> >
>> > ```{r}
>> > # Create example data.frames
>> > parents <- data.frame(name=c("Sarah", "Max", "Qin", "Lex"),
>> >                       sex=c("F", "M", "F", "M"),
>> >                       age=c(41, 43, 36, 51))
>> > children <- data.frame(parent=c("Sarah", "Max", "Qin"),
>> >                        name=c("Oliver", "Sebastian", "Kai-lee"),
>> >                        sex=c("M", "M", "F"),
>> >                        age=c(5,8,7))
>> >
>> > # merge() creates a duplicated "name" column:
>> > merge(parents, children, by.x = "name", by.y = "parent")
>> > ```
>> >
>> > Output:
>> > ```
>> >    name sex.x age.x      name sex.y age.y
>> > 1   Max     M    43 Sebastian     M     8
>> > 2   Qin     F    36   Kai-lee     F     7
>> > 3 Sarah     F    41    Oliver     M     5
>> > Warning message:
>> > In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
>> >   column name ‘name’ is duplicated in the result
>> > ```
>> >
>> > Kind Regards,
>> >
>> > Scott Ritchie
>> >
>> > ______________________________________________
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>>
>
>

Index: src/library/base/R/merge.R
===================================================================
--- src/library/base/R/merge.R  (revision 74264)
+++ src/library/base/R/merge.R  (working copy)
@@ -157,6 +157,15 @@
         }
 
         if(has.common.nms) names(y) <- nm.y
+        ## If by.x %in% names(y) then duplicate column names still arise,
+        ## apply suffixes to these
+        dupe.keyx <- intersect(nm.by, names(y))
+        if(length(dupe.keyx)) {
+          if(nzchar(suffixes[1L]))
+            names(x)[match(dupe.keyx, names(x), 0L)] <- paste(dupe.keyx, suffixes[1L], sep="")
+          if(nzchar(suffixes[2L]))
+            names(y)[match(dupe.keyx, names(y), 0L)] <- paste(dupe.keyx, suffixes[2L], sep="")
+        }
         nm <- c(names(x), names(y))
         if(any(d <- duplicated(nm)))
             if(sum(d) > 1L)
