> * Prof Brian Ripley <evc...@fgngf.bk.np.hx> [2012-10-08 06:37:07 +0100]:
>
> On 08/10/2012 02:57, Peter Ehlers wrote:
>> On 2012-10-07 14:44, Sam Steingold wrote:
>>>> * Peter Ehlers <ruy...@hpnytnel.pn> [2012-10-07 10:03:42 -0700]:
>>>>
>>>> On 2012-10-07 08:34, Sam Steingold wrote:
>>>>> I know it does not look very good - using the same column names to mean
>>>>> different things in different data frames, but here you go:
>>>>> --8<---------------cut here---------------start------------->8---
>>>>>> x <- data.frame(a=c(1,2,3),b=c(4,5,6))
>>>>>> y <- data.frame(b=c(1,2),a=c("a","b"))
>>>>>> merge(x,y,by.x="a",by.y="b",all.x=TRUE,suffixes=c("","y"))
>>>>>   a b    a
>>>>> 1 1 4    a
>>>>> 2 2 5    b
>>>>> 3 3 6 <NA>
>>>>> Warning message:
>>>>> In merge.data.frame(x, y, by.x = "a", by.y = "b", all.x = TRUE) :
>>>>>   column name 'a' is duplicated in the result
>>>>> --8<---------------cut here---------------end--------------->8---
>>>>> why is the suffixes argument ignored?
>>>>> I mean, I expected that the second "a" to be "a.y".
>>>>
>>>> The 'suffixes' argument refers to _non-by_ names only (as per ?merge).
>>>
>>> yes, but "a" in "y" is _not_ a by-name.
>>
>> Yes, it is.
>> The set of by-names is the union of names specified by by.x and by.y,
>> in your case: c("a", "b").
>> I suppose that a case could be made that ?merge does not spell that
>> out sufficiently explicitly.
>
> It does in 'Details' (and where else would there be such a detail?)
> E.g. in R 2.15.1:
>
>      If the remaining columns in the data frames have any common names,
>      these have ‘suffixes’ (‘".x"’ and ‘".y"’ by default) appended to
>      try to make the names of the result unique.  If this is not
>      possible, an error is thrown.
>
> Note *remaining*, and read what comes before that.
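
For anyone following along, here is a minimal sketch of how the name clash in the quoted example can be sidestepped under the behavior described above: rename the non-key column of y before merging, so that merge() never sees two columns called "a". The copy y2 and the new name "a.y" are my own choices for illustration; this is a workaround, not a claim about what merge() itself ought to do.

```r
# Same data frames as in the quoted example.
x <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
y <- data.frame(b = c(1, 2), a = c("a", "b"))

# Work on a copy of y and rename its non-key column up front.
# "a.y" is an illustrative name chosen here, not one produced by merge().
y2 <- y
names(y2)[names(y2) == "a"] <- "a.y"

# Merge x$a against y2$b, keeping all rows of x.
res <- merge(x, y2, by.x = "a", by.y = "b", all.x = TRUE)
print(res)
```

Since the remaining columns (x$b and y2$a.y) no longer share a name with each other or with the key column, the suffixes argument does not come into play at all.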
I read the docs and re-read them after seeing your message and, with all
due respect, I fail to interpret them the way you do: the doc speaks
about "columns to merge on", not "column names". I specify both by.x and
by.y, thus I do not specify the column y$a as a column to merge on.

Note, however, that I do not want the doc fixed, I want the behavior
modified. I see no advantage in the current behavior (a warning +
duplicate column names) as opposed to the behavior I expected (renaming
the column in the result to "a.y").

Thanks a lot for your kind replies and insight!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://iris.org.il
http://jihadwatch.org http://ffii.org http://truepeace.org
Never argue with the person who is preparing your parachute.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.