The attached patch.diff makes merge.data.frame() append the suffixes to by.x
key columns whose names also occur among the non-key columns of y.
Best,
Scott Ritchie
On 17 February 2018 at 11:15, Scott Ritchie <[email protected]> wrote:
> Hi Frederick,
>
> I would expect that any duplicate names in the resulting data.frame would
> have the suffixes appended to them, regardless of whether or not they are
> used as the join key. So in my example I would expect "name.x" and
> "name.y" to indicate their source data.frame.
>
> While a careful reading of the documentation shows this is not the case, I
> would argue that the intent of the suffixes, making the result names unique,
> applies equally to this type of case.
>
> If you agree this would be useful, I'm happy to write a patch for
> merge.data.frame that will add suffixes in this case; I intend to do the
> same for merge.data.table in the data.table package, where I initially
> encountered the edge case.
>
> Best,
>
> Scott
>
> On 17 February 2018 at 03:53, <[email protected]> wrote:
>
>> Hi Scott,
>>
>> It seems like reasonable behavior to me. What result would you expect?
>> That the second "name" should be called "name.y"?
>>
>> The "merge" documentation says:
>>
>> If the columns in the data frames not used in merging have any
>> common names, these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default)
>> appended to try to make the names of the result unique.
>>
>> Since the first "name" column was used in merging, leaving both
>> without a suffix seems consistent with the documentation...
>>
>> Frederick
>>
>> On Fri, Feb 16, 2018 at 09:08:29AM +1100, Scott Ritchie wrote:
>> > Hi,
>> >
>> > I was unable to find a bug report for this with a cursory search, but would
>> > like clarification if this is intended or unavoidable behaviour:
>> >
>> > ```{r}
>> > # Create example data.frames
>> > parents <- data.frame(name=c("Sarah", "Max", "Qin", "Lex"),
>> >                       sex=c("F", "M", "F", "M"),
>> >                       age=c(41, 43, 36, 51))
>> > children <- data.frame(parent=c("Sarah", "Max", "Qin"),
>> >                        name=c("Oliver", "Sebastian", "Kai-lee"),
>> >                        sex=c("M", "M", "F"),
>> >                        age=c(5, 8, 7))
>> >
>> > # Merge() creates a duplicated "name" column:
>> > merge(parents, children, by.x = "name", by.y = "parent")
>> > ```
>> >
>> > Output:
>> > ```
>> >    name sex.x age.x      name sex.y age.y
>> > 1   Max     M    43 Sebastian     M     8
>> > 2   Qin     F    36   Kai-lee     F     7
>> > 3 Sarah     F    41    Oliver     M     5
>> > Warning message:
>> > In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
>> >   column name ‘name’ is duplicated in the result
>> > ```
>> >
>> > Kind Regards,
>> >
>> > Scott Ritchie
>> >
>> > ______________________________________________
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>>
>
>
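For R versions without a change like the one in this patch, one way to avoid the duplicated column in the example from this thread is to rename the clashing non-key column of `children` before merging, so that only `sex` and `age` are treated as common names. This is a minimal base-R sketch; the new column name `child_name` is hypothetical and chosen only for illustration.

```r
# Example data from the thread
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex  = c("F", "M", "F", "M"),
                      age  = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name   = c("Oliver", "Sebastian", "Kai-lee"),
                       sex    = c("M", "M", "F"),
                       age    = c(5, 8, 7))

# Rename the non-key "name" column of children ("child_name" is a
# hypothetical choice) so it cannot collide with the key column,
# which merge() names after by.x ("name").
names(children)[names(children) == "name"] <- "child_name"

merged <- merge(parents, children, by.x = "name", by.y = "parent")
names(merged)
# returns: "name" "sex.x" "age.x" "child_name" "sex.y" "age.y"
```

Only `sex` and `age` now occur as non-key columns in both inputs, so they receive the default suffixes and no duplicated-name warning is raised.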
Index: src/library/base/R/merge.R
===================================================================
--- src/library/base/R/merge.R (revision 74264)
+++ src/library/base/R/merge.R (working copy)
@@ -157,6 +157,15 @@
}
if(has.common.nms) names(y) <- nm.y
+ ## If by.x %in% names(y) then duplicate column names still arise,
+ ## apply suffixes to these
+ dupe.keyx <- intersect(nm.by, names(y))
+ if(length(dupe.keyx)) {
+ if(nzchar(suffixes[1L]))
+ names(x)[match(dupe.keyx, names(x), 0L)] <- paste(dupe.keyx, suffixes[1L], sep="")
+ if(nzchar(suffixes[2L]))
+ names(y)[match(dupe.keyx, names(y), 0L)] <- paste(dupe.keyx, suffixes[2L], sep="")
+ }
nm <- c(names(x), names(y))
if(any(d <- duplicated(nm)))
if(sum(d) > 1L)
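To make the renaming step added by the diff easier to follow, the sketch below restates it as a standalone function over character vectors of column names. `apply_key_suffixes()` is a hypothetical helper, not part of base R. The sketch assumes, as the diff's context lines suggest, that at this point `names(x)` holds the key column(s) followed by x's already-suffixed non-key columns, while `names(y)` holds only y's non-key columns.

```r
# Hypothetical helper mirroring the block added by the patch: key names of x
# (nm_by) that also occur among y's non-key names get suffixes[1] on the x
# side and suffixes[2] on the y side; an empty suffix leaves that side as is.
apply_key_suffixes <- function(names_x, names_y, nm_by,
                               suffixes = c(".x", ".y")) {
  dupe_keyx <- intersect(nm_by, names_y)
  if (length(dupe_keyx)) {
    if (nzchar(suffixes[1L]))
      names_x[match(dupe_keyx, names_x, 0L)] <- paste0(dupe_keyx, suffixes[1L])
    if (nzchar(suffixes[2L]))
      names_y[match(dupe_keyx, names_y, 0L)] <- paste0(dupe_keyx, suffixes[2L])
  }
  # Result column order: x's columns, then y's (as in nm <- c(names(x), names(y)))
  c(names_x, names_y)
}

# Names assumed to reach this step for the thread's example
res <- apply_key_suffixes(names_x = c("name", "sex.x", "age.x"),
                          names_y = c("name", "sex.y", "age.y"),
                          nm_by   = "name")
res
# returns: "name.x" "sex.x" "age.x" "name.y" "sex.y" "age.y"
```

With the default suffixes this yields distinct `name.x` and `name.y` columns, the outcome Scott describes expecting; with `suffixes = c("", ".y")` only the y-side column would be renamed.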