On Sun, Mar 18, 2012 at 12:48 PM, Steve Lianoglou <
mailinglist.honey...@gmail.com> wrote:

> Hi Uwe,
>
> 2012/3/17 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
> >
> >
> > On 15.03.2012 22:48, Matthew Dowle wrote:
> >>
> >>
> >> Anyone?
> >>
> >>> Is it intended that the first suffix can no longer be blank? Seems to be
> >>> caused by a bug fix to merge in R 2.15.0.
> >
> >
> >
> > Right, the user is now protected against confusing himself by using names
> > that were not unique before the merge.
>
> ... now I'm confused :-)
>
> If the user explicitly asks for a NULL/0/empty/whatever suffix,
> they're not really going to be confusing themselves, right?
>

If the user asks for a blank suffix and you still give back ".x" or ".y"
as a suffix, then yes, that is confusing.


> I actually feel like I do this often, where "this" is explicitly
> asking to not add a suffix to one group of columns ... I do confuse
> myself every now and again, but not in this context, yet.
>
> I can see that *this* confusing case is now handled w/ this change
> (which wasn't before):
>
> ## I'm using R-devel compiled back in November, 2011 (r57571)
> R> d1 <- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
> R> d2 <- data.frame(a=letters[1:10], b=101:110)
> R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
>   a         b.x b.x b.y
> 1  a -1.52250626   q 101
> 2  b -0.99865341   r 102
> ... ## Let's call this "exhibit A"
>
> But if I do this:
> R> merge(d1, d2, by='a', suffixes=c("", ".y"))
>
> I totally expect:
>
>   a           b b.x b.y
> 1  a -1.52250626   q 101
> 2  b -0.99865341   r 102
> ## Let's call this "exhibit B"

...
>
> and not (using R-2.15.0 beta) (exhibit B):
>
> Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
>  there is already a column named 'b'
>

As a user I would expect that the rule for column names produced by "merge"
would be simple: the output column name is the concatenation of the input
column name and the corresponding suffix. When I use "merge" I don't expect
a more complicated behavior that somehow still uses '.x' even though I
asked it not to, as in your second example. So I would say that the new
behavior is more consistent.
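
To make the simple rule concrete, here is a minimal sketch in base R. The
helper name simple_rule_names is hypothetical (it is not part of R), and I am
assuming the rule applies only to non-key columns that appear in both inputs.
It only predicts the names; it does not call merge, and it makes no claim
about what any particular R version actually returns.

```r
# Hypothetical helper: column names predicted by the "simple rule"
# (output name = input name + corresponding suffix) for
# merge(x, y, by = by, suffixes = suffixes).
simple_rule_names <- function(x, y, by, suffixes = c(".x", ".y")) {
  nx <- names(x)[!names(x) %in% by]   # non-key columns of x
  ny <- names(y)[!names(y) %in% by]   # non-key columns of y
  shared <- intersect(nx, ny)         # names present in both inputs
  # Assumption: only shared non-key columns receive a suffix.
  nx[nx %in% shared] <- paste0(nx[nx %in% shared], suffixes[1])
  ny[ny %in% shared] <- paste0(ny[ny %in% shared], suffixes[2])
  c(by, nx, ny)
}

# The data frames from Steve's example.
d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

# Blank first suffix: every predicted name is distinct.
blank_first <- simple_rule_names(d1, d2, by = "a", suffixes = c("", ".y"))

# Default suffixes: the rule predicts "b.x" twice, i.e. a collision.
default_sfx <- simple_rule_names(d1, d2, by = "a", suffixes = c(".x", ".y"))
```

With these inputs the rule alone tells me which call is safe: the blank
suffix yields distinct names, while the default suffixes collide with the
existing b.x column.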

When I write functions that use "merge" on general data frames, I can
anticipate and rely on the simpler rule, but it is difficult to anticipate
the results of the more complicated rule well enough for my subsequent lines
of code to work.

If the inputs I give to merge are inconsistent with the simple rule I would
much rather have an exception (highlighting exactly where my code has gone
wrong) than a surprising column name change (which makes my code
mysteriously fail ten or a hundred lines later).
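
As an illustration of the fail-early behavior I mean, here is a sketch of a
check one could run before calling merge. The function check_merge_names is
hypothetical, its error message is my own wording (not the one R emits), and
it again assumes suffixes are applied only to shared non-key columns.

```r
# Hypothetical pre-merge check: stop right away if the simple rule would
# produce duplicated column names, instead of renaming anything silently.
check_merge_names <- function(x, y, by, suffixes = c(".x", ".y")) {
  nx <- names(x)[!names(x) %in% by]
  ny <- names(y)[!names(y) %in% by]
  shared <- intersect(nx, ny)
  # Assumption: only shared non-key columns receive a suffix.
  nx[nx %in% shared] <- paste0(nx[nx %in% shared], suffixes[1])
  ny[ny %in% shared] <- paste0(ny[ny %in% shared], suffixes[2])
  out <- c(by, nx, ny)
  dup <- unique(out[duplicated(out)])
  if (length(dup) > 0) {
    stop("merge would produce duplicated column name(s): ",
         paste(dup, collapse = ", "), call. = FALSE)
  }
  invisible(out)
}

d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

# Default suffixes collide with the existing "b.x" column, so this stops.
res_default <- tryCatch(
  check_merge_names(d1, d2, by = "a"),
  error = function(e) conditionMessage(e)
)

# A blank first suffix gives distinct names, so the check passes.
res_blank <- check_merge_names(d1, d2, by = "a", suffixes = c("", ".y"))
```

The point of the design is that the failure happens at the merge call itself,
where the offending names are still visible, rather than somewhere downstream.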

Peter


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel