I would like to add a vote for keeping blank suffixes in merge(), as I
routinely use this functionality. An example use case:
# using R 2.14.1
# d1 is some data that I've been working on for a while
d1 <- data.frame(a=letters[1:10], b=1:10)
# d2 is some new data from a collaborator. I want to add one of these
# columns to d1, and also check that the existing columns are consistent
d2 <- data.frame(a=letters[1:10], b=1:10, c=101:110)
# use blank suffix to avoid changing the column names of my
# original data frame
d3 <- merge(d1, d2, by="a", suffixes=c("", ".new"))
all(d3$b == d3$b.new)
# if this is FALSE, time to email collaborator
d3$b.new <- NULL
In real usage d1 would have many more columns than d2, so adding
suffixes to d1 would be tedious to undo after the merge.
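For comparison, the same check can be sketched without a blank suffix by renaming the overlapping non-key columns of d2 before the merge. This is only an illustration of a possible workaround under the new behaviour; the names `by_col`, `common`, and `consistent` are assumptions of the sketch, not anything from merge() itself.

```r
d1 <- data.frame(a = letters[1:10], b = 1:10)
d2 <- data.frame(a = letters[1:10], b = 1:10, c = 101:110)

by_col <- "a"  # key column (hypothetical variable name)

# non-key columns present in both data frames
common <- setdiff(intersect(names(d1), names(d2)), by_col)

# rename them in d2 only, so merge() never needs to add a suffix to d1
names(d2)[match(common, names(d2))] <- paste(common, ".new", sep = "")

d3 <- merge(d1, d2, by = by_col)

# if this is FALSE, time to email collaborator
consistent <- all(d3$b == d3$b.new)
d3$b.new <- NULL
```

This keeps the column names of d1 untouched, at the cost of an extra renaming step that the blank suffix makes unnecessary.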
Stephanie Gogarten
Research Scientist, Biostatistics
University of Washington
On 3/19/12 4:00 AM, r-devel-requ...@r-project.org wrote:
Message: 12
Date: Sun, 18 Mar 2012 15:48:30 -0400
From: Steve Lianoglou<mailinglist.honey...@gmail.com>
To: Uwe Ligges<lig...@statistik.tu-dortmund.de>
Cc: Matthew Dowle<mdo...@mdowle.plus.com>,r-devel@r-project.org
Subject: Re: [Rd] merge bug fix in R 2.15.0
Hi Uwe,
2012/3/17 Uwe Ligges<lig...@statistik.tu-dortmund.de>:
>
>
> On 15.03.2012 22:48, Matthew Dowle wrote:
>>
>>
>> Anyone?
>>
>>> Is it intended that the first suffix can no longer be blank? Seems to be
>>> caused by a bug fix to merge in R 2.15.0.
>
>
>
> Right, the user is now protected against confusing himself by using names
> that were not unique before the merge.
... now I'm confused:-)
If the user explicitly asks for a NULL/0/empty/whatever suffix,
they're not really going to be confusing themselves, right?
I actually feel like I do this often, where "this" is explicitly
asking to not add a suffix to one group of columns ... I do confuse
myself every now and again, but not in this context, yet.
I can see that *this* confusing case is now handled w/ this change
(which wasn't before):
## I'm using R-devel compiled back in November, 2011 (r57571)
R> d1<- data.frame(a=letters[1:10], b=rnorm(10), b.x=tail(letters, 10))
R> d2<- data.frame(a=letters[1:10], b=101:110)
R> merge(d1, d2, by='a', suffixes=c('.x', '.y'))
   a         b.x b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
... ## Let's call this "exhibit A"
But if I do this:
R> merge(d1, d2, by='a', suffixes=c("", ".y"))
I totally expect:
   a           b b.x b.y
1  a -1.52250626   q 101
2  b -0.99865341   r 102
## Let's call this "exhibit B"
...
and not what R-2.15.0 beta gives for exhibit B:
Error in merge.data.frame(d1, d2, by = "a", suffixes = c("", ".y")) :
there is already a column named 'b'
I can take a crack at a patch that keeps the "rescue user from surprises"
behavior outlined in "exhibit A," while also letting the user accomplish
"exhibit B," if there is a consensus on this particular world view.
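To make the distinction concrete, here is a rough sketch of the kind of check I have in mind. It is not the actual merge.data.frame code, and `predict_merge_names` is a hypothetical helper written just for this example: it predicts the column names after suffixing and lets us test for duplicates, which is what separates exhibit A from exhibit B.

```r
# Hypothetical helper (not part of base R): predict the column names that
# a merge on `by` would produce after suffixes are applied to the
# non-key columns shared by x and y.
predict_merge_names <- function(x, y, by, suffixes = c(".x", ".y")) {
  nx <- setdiff(names(x), by)
  ny <- setdiff(names(y), by)
  common <- intersect(nx, ny)
  nx[nx %in% common] <- paste(nx[nx %in% common], suffixes[1], sep = "")
  ny[ny %in% common] <- paste(ny[ny %in% common], suffixes[2], sep = "")
  c(by, nx, ny)
}

d1 <- data.frame(a = letters[1:10], b = rnorm(10), b.x = tail(letters, 10))
d2 <- data.frame(a = letters[1:10], b = 101:110)

exhibit_a <- predict_merge_names(d1, d2, "a", c(".x", ".y"))
exhibit_b <- predict_merge_names(d1, d2, "a", c("", ".y"))

# exhibit A yields a duplicated name; exhibit B does not
a_has_dups <- anyDuplicated(exhibit_a) > 0
b_has_dups <- anyDuplicated(exhibit_b) > 0
```

Under this view, only a result with duplicated names (exhibit A) would need to be rejected, and a blank suffix on its own would not.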
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel