The attached patch.diff will make merge.data.frame() append the suffixes to columns with common names between by.x and names(y).
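
As a rough illustration of the intended result (a sketch only, not the patch itself), the renaming rule can be emulated on the column names outside of merge.data.frame(). The helper name `merged_names()` is hypothetical and not part of base R, and it assumes by.x and by.y are given as column names rather than indices:

```r
# Hypothetical helper (not part of base R): predict the column names of the
# merged result if key columns that clash with names(y) were suffixed as well.
# Assumes by.x and by.y are character vectors of column names.
merged_names <- function(x, y, by.x, by.y, suffixes = c(".x", ".y")) {
  key   <- by.x                               # key columns are named after by.x
  new.x <- names(x)[!names(x) %in% by.x]      # non-key columns of x
  new.y <- names(y)[!names(y) %in% by.y]      # non-key columns of y

  # Current behaviour: suffix non-key names shared by x and y
  common <- intersect(new.x, new.y)
  in.x <- new.x %in% common
  in.y <- new.y %in% common
  new.x[in.x] <- paste0(new.x[in.x], suffixes[1L])
  new.y[in.y] <- paste0(new.y[in.y], suffixes[2L])

  # Proposed behaviour: also suffix key names that clash with a column of y
  clash <- intersect(key, new.y)
  if (length(clash)) {
    key[match(clash, key)]     <- paste0(clash, suffixes[1L])
    new.y[match(clash, new.y)] <- paste0(clash, suffixes[2L])
  }

  c(key, new.x, new.y)
}

# Example data.frames from the original report
parents <- data.frame(name = c("Sarah", "Max", "Qin", "Lex"),
                      sex  = c("F", "M", "F", "M"),
                      age  = c(41, 43, 36, 51))
children <- data.frame(parent = c("Sarah", "Max", "Qin"),
                       name   = c("Oliver", "Sebastian", "Kai-lee"),
                       sex    = c("M", "M", "F"),
                       age    = c(5, 8, 7))

merged_names(parents, children, by.x = "name", by.y = "parent")
# "name.x" "sex.x" "age.x" "name.y" "sex.y" "age.y"
```

When no key name clashes with a column of y, the helper returns the same names as the current behaviour.
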

Best,

Scott Ritchie

On 17 February 2018 at 11:15, Scott Ritchie <s.ritchi...@gmail.com> wrote:

> Hi Frederick,
>
> I would expect that any duplicate names in the resulting data.frame would
> have the suffixes appended to them, regardless of whether or not they are
> used as the join key. So in my example I would expect "name.x" and
> "name.y" to indicate their source data.frame.
>
> While careful reading of the documentation reveals this is not the case, I
> would argue the intent of the suffixes functionality should equally be
> applied to this type of case.
>
> If you agree this would be useful, I'm happy to write a patch for
> merge.data.frame that will add suffixes in this case - I intend to do the
> same for merge.data.table in the data.table package where I initially
> encountered the edge case.
>
> Best,
>
> Scott
>
> On 17 February 2018 at 03:53, <frede...@ofb.net> wrote:
>
>> Hi Scott,
>>
>> It seems like reasonable behavior to me. What result would you expect?
>> That the second "name" should be called "name.y"?
>>
>> The "merge" documentation says:
>>
>>     If the columns in the data frames not used in merging have any
>>     common names, these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default)
>>     appended to try to make the names of the result unique.
>>
>> Since the first "name" column was used in merging, leaving both
>> without a suffix seems consistent with the documentation...
>>
>> Frederick
>>
>> On Fri, Feb 16, 2018 at 09:08:29AM +1100, Scott Ritchie wrote:
>> > Hi,
>> >
>> > I was unable to find a bug report for this with a cursory search, but would
>> > like clarification if this is intended or unavoidable behaviour:
>> >
>> > ```{r}
>> > # Create example data.frames
>> > parents <- data.frame(name=c("Sarah", "Max", "Qin", "Lex"),
>> >                       sex=c("F", "M", "F", "M"),
>> >                       age=c(41, 43, 36, 51))
>> > children <- data.frame(parent=c("Sarah", "Max", "Qin"),
>> >                        name=c("Oliver", "Sebastian", "Kai-lee"),
>> >                        sex=c("M", "M", "F"),
>> >                        age=c(5,8,7))
>> >
>> > # Merge() creates a duplicated "name" column:
>> > merge(parents, children, by.x = "name", by.y = "parent")
>> > ```
>> >
>> > Output:
>> > ```
>> >    name sex.x age.x      name sex.y age.y
>> > 1   Max     M    43 Sebastian     M     8
>> > 2   Qin     F    36   Kai-lee     F     7
>> > 3 Sarah     F    41    Oliver     M     5
>> > Warning message:
>> > In merge.data.frame(parents, children, by.x = "name", by.y = "parent") :
>> >   column name ‘name’ is duplicated in the result
>> > ```
>> >
>> > Kind Regards,
>> >
>> > Scott Ritchie
>> >
>> > ______________________________________________
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel

Index: src/library/base/R/merge.R
===================================================================
--- src/library/base/R/merge.R  (revision 74264)
+++ src/library/base/R/merge.R  (working copy)
@@ -157,6 +157,15 @@
             }
 
             if(has.common.nms) names(y) <- nm.y
+            ## If by.x %in% names(y) then duplicate column names still arise,
+            ## apply suffixes to these
+            dupe.keyx <- intersect(nm.by, names(y))
+            if(length(dupe.keyx)) {
+              if(nzchar(suffixes[1L]))
+                names(x)[match(dupe.keyx, names(x), 0L)] <- paste(dupe.keyx, suffixes[1L], sep="")
+              if(nzchar(suffixes[2L]))
+                names(y)[match(dupe.keyx, names(y), 0L)] <- paste(dupe.keyx, suffixes[2L], sep="")
+            }
             nm <- c(names(x), names(y))
             if(any(d <- duplicated(nm)))
                 if(sum(d) > 1L)