On Sun, Mar 18, 2012 at 12:48 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> wrote:

> Hi Uwe,
>
> 2012/3/17 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
> >
> > On 15.03.2012 22:48, Matthew Dowle wrote:
> >>
> >> Anyone?
> >>
> >>> Is it intended that the first suffix can no longer be blank? Seems to be
> >>> caused by a bug fix to merge in R 2.15.0.
> >
> > Right, the user is now protected against confusing himself by using names
> > that were not unique before the merge.
>
> ... now I'm confused :-)
>
> If the user explicitly asks for a NULL/0/empty/whatever suffix,
> they're not really going to be confusing themselves, right?
>

If the user asks for a blank suffix and you still give back ".x" or ".y" as
a suffix, then yes that is confusing.

> I actually feel like I do this often, where "this" is explicitly
> asking to not add a suffix to one group of columns ... I do confuse
> myself every and now and again, but not in this context, yet.
>
> I can see that *this* confusing case is now handled w/ this change
> (which wasn't before):
>
> ## I'm using R-devel compiled back in November, 2011 (r57571)
> R> d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
> R> d2 <- data.frame(a=letters[1:10], b=101:110)
> R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
>    a         b.x b.x b.y
> 1  a -1.52250626   q 101
> 2  b -0.99865341   r 102
> ... ## Let's call this "exhibit A"
>
> But if I do this:
> R> merge(d1, d2, by='a', suffixes=c("", ".y"))
>
> I totally expect:
>
>    a           b b.x b.y
> 1  a -1.52250626   q 101
> 2  b -0.99865341   r 102
> ## Let's call this "exhibit B" ...
>
> and not (using R-2.15.0 beta) (exhibit B):
>
> Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
>   there is already a column named 'b'
>

As a user I would expect that the rule for column names produced by "merge"
would be simple: the output column name is the concatenation of the input
column name and the corresponding suffix.
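
To make the simple rule concrete, here is a minimal sketch in R. The helper
simple_merge_names() and the data frames x, y and z are hypothetical, made up
for illustration. This is not how merge.data.frame is implemented, and it
makes no claim about what any particular R version returns. It only computes
the names the rule would give, and stops when two of them would collide.

```r
# Hypothetical helper, not part of base R: the column names that the
# "simple rule" would give for merge(x, y, by = by, suffixes = suffixes).
simple_merge_names <- function(x, y, by, suffixes = c(".x", ".y")) {
  nx <- names(x)[!names(x) %in% by]   # non-key columns of x
  ny <- names(y)[!names(y) %in% by]   # non-key columns of y
  common <- intersect(nx, ny)         # names present in both inputs

  # Only clashing names get a suffix, and the suffix is used exactly as given.
  in_x <- nx %in% common
  in_y <- ny %in% common
  nx[in_x] <- paste0(nx[in_x], suffixes[1])
  ny[in_y] <- paste0(ny[in_y], suffixes[2])

  out <- c(by, nx, ny)
  dup <- unique(out[duplicated(out)])
  if (length(dup) > 0) {
    stop("duplicate column name(s) in result: ", paste(dup, collapse = ", "))
  }
  out
}

# Made-up inputs for illustration.
x <- data.frame(a = 1:3, b = c(0.1, 0.2, 0.3))
y <- data.frame(a = 1:3, b = 101:103)
z <- data.frame(a = 1:3, b = c(0.1, 0.2, 0.3), b.y = c("q", "r", "s"))

simple_merge_names(x, y, by = "a", suffixes = c("", ".y"))
# returns c("a", "b", "b.y")

# Here the suffixed name collides with a column already in z, so the
# sketch stops immediately instead of renaming anything.
res <- try(simple_merge_names(z, y, by = "a", suffixes = c("", ".y")),
           silent = TRUE)
inherits(res, "try-error")
# returns TRUE
```

Under this sketch, code that calls merge on arbitrary data frames can work out
the result names up front, and an inconsistent input fails at the call site
rather than further down.
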
When I use "merge" I don't expect a more complicated behavior that somehow
still uses '.x' even though I asked it not to, as in your second example.
So I would say that the new behavior is more consistent.

When I write functions that use "merge" on general data frames, I can
anticipate and use the simpler rule, but it is difficult to anticipate the
results of the more complicated rule in a way that my subsequent lines of
code will work. If the inputs I give to merge are inconsistent with the
simple rule, I would much rather have an exception (highlighting exactly
where my code has gone wrong) than a surprising column name change (which
makes my code mysteriously fail ten or a hundred lines later).

Peter

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel