But I get:

# Omitted the initial line, which would have created an object only for it to be overwritten.

> Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"), Price = c("Low"))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
    HouseSize LandLocation Price
1           1         Here   Low
2           2         Here   Low
3           3         Here   Low
4           4         Here   Low
5           5         Here   Low
... (95 additional rows snipped)

Furthermore, I did not load any library (nor did you indicate which packages you had loaded), and there does not seem to be a setdiff.data.frame in my workspace:
> setdiff.data.frame
Error: object "setdiff.data.frame" not found

> sessionInfo()
R version 2.8.1 Patched (2009-01-19 r47650)
i386-apple-darwin9.6.0

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] MASS_7.2-46 reshape_0.8.2 plyr_0.1.5 modeltools_0.2-16 mvtnorm_0.9-4
[6] survival_2.35-4

loaded via a namespace (and not attached):
[1] coin_1.0-1


On May 29, 2009, at 5:58 PM, Jason Rupert wrote:


Jay,


Thanks very much for the reply. I think you are right about the prob package. Unfortunately, I was not able to find the old emails discussing the more powerful setdiff, which builds on base R's setdiff but extends it to work with data.frames instead of just simple vectors of values. I love this functionality.
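To make the "setdiff on data.frames" idea concrete, here is a minimal base-R sketch of a row-wise set difference. The helper name row_setdiff is hypothetical and this is not the prob package's code; it only assumes that "set difference of data frames" means "rows of x that occur nowhere in y", with each distinct row reported once. Under those set semantics, duplicated rows cannot show up in either direction.

```r
# Hypothetical helper, NOT the prob package's implementation:
# rows of x that do not occur anywhere in y (set semantics).
# Assumes x and y have the same columns in the same order.
row_setdiff <- function(x, y) {
  # Collapse each row into one string key so whole rows can be matched
  key <- function(df) do.call(paste, c(df, sep = "\r"))
  out <- x[!(key(x) %in% key(y)), , drop = FALSE]
  # Set semantics: report each distinct row only once
  out[!duplicated(key(out)), , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)
Test3_DF <- data.frame(HouseSize = 1:25, LandLocation = "Here", Price = "Low")

nrow(row_setdiff(Test1_DF, Test2_DF))  # 0
nrow(row_setdiff(Test2_DF, Test1_DF))  # 0: the extra copies are ignored
nrow(row_setdiff(Test1_DF, Test3_DF))  # 75: HouseSize 26 to 100
```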

However, for the following example,
Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"))
Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"), Price = c("Low"))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)
setdiff(Test2_DF, Test1_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)

I was hoping that, for this example, one of the setdiff calls would return essentially Test1_DF, since its rows are duplicated and that duplication is what differs between the two data frames.

So I guess I am trying to find a way to truly diff data frames, i.e., to determine when two data.frames differ from one another and then get the differences as output.
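For the first half of that request (detecting *whether* two data frames differ), base R's identical() and all.equal() are the usual starting points. They say that the objects differ, and all.equal() describes how, but neither lists the differing rows. A small sketch using the example data above:

```r
Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)

# identical(): a single TRUE/FALSE answer to "are they exactly the same?"
identical(Test1_DF, Test2_DF)           # FALSE

# all.equal(): TRUE, or character strings describing the differences
isTRUE(all.equal(Test1_DF, Test2_DF))   # FALSE
all.equal(Test1_DF, Test2_DF)           # prints the mismatch messages

# A quick structural check: same dimensions?
identical(dim(Test1_DF), dim(Test2_DF)) # FALSE (100 x 3 vs 200 x 3)
```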

Does this capability exist as a function in a current R package, or is there a commonly used pattern for building it?
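What the example seems to call for is a *multiset* (bag) difference of rows, where multiplicity counts, rather than a set difference. The sketch below is one way to build that in base R; row_bagdiff is a hypothetical name, not a function from any package, and the approach (numbering repeated rows before matching) is an illustrative choice, not something proposed in this thread.

```r
# Hypothetical helper (not from any package): multiset ("bag") difference
# of data frame rows. Each row of y cancels at most one matching row of x,
# so extra copies of a row in x are kept. Assumes x and y share the same
# columns and that the data contain no "\r" or "\n" characters.
row_bagdiff <- function(x, y) {
  key <- function(df) do.call(paste, c(df, sep = "\r"))
  # Number repeated keys 1, 2, 3, ... so the n-th copy of a row in x
  # can only be matched by an n-th copy of the same row in y
  tag <- function(k) paste(k, ave(seq_along(k), k, FUN = seq_along), sep = "\n")
  x[!(tag(key(x)) %in% tag(key(y))), , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)

extra <- row_bagdiff(Test2_DF, Test1_DF)
nrow(extra)                              # 100: the second copy of each row
nrow(row_bagdiff(Test1_DF, Test2_DF))    # 0
```

Unlike a set difference, this is not symmetric in how it treats duplicates: only surplus copies in x are reported.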

Thanks again for any feedback you can provide.


Also, I tried to get my session info and the list of packages I have loaded, but received the following:
sessionInfo()
Error in x$Priority : $ operator is invalid for atomic vectors
In addition: There were 12 warnings (use warnings() to see them)
warnings()
Warning messages:
1: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'prob' is missing or broken
2: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'ggplot2' is missing or broken
3: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'reshape' is missing or broken
4: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'RColorBrewer' is missing or broken
5: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'proto' is missing or broken
6: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'plyr' is missing or broken
7: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'nortest' is missing or broken
8: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'fBasics' is missing or broken
9: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'timeSeries' is missing or broken
10: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'timeDate' is missing or broken
11: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'vcd' is missing or broken
12: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
 DESCRIPTION file of package 'colorspace' is missing or broken


However, I typically load the following packages:
library(colorspace, lib.loc=RLibraryPathLocation)
library(vcd, lib.loc=RLibraryPathLocation)
library(timeDate, lib.loc=RLibraryPathLocation)
library(timeSeries, lib.loc=RLibraryPathLocation)
library(fBasics, lib.loc=RLibraryPathLocation)
library(nortest, lib.loc=RLibraryPathLocation)
library(plyr, lib.loc=RLibraryPathLocation)
library(proto, lib.loc=RLibraryPathLocation)
library(RColorBrewer, lib.loc=RLibraryPathLocation)
library(reshape, lib.loc=RLibraryPathLocation)
library(ggplot2, lib.loc=RLibraryPathLocation)
library(prob, lib.loc=RLibraryPathLocation)


--- On Fri, 5/29/09, G. Jay Kerns <gke...@ysu.edu> wrote:

From: G. Jay Kerns <gke...@ysu.edu>
Subject: Re: [R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
To: "Jason Rupert" <jasonkrup...@yahoo.com>
Cc: R-help@r-project.org
Date: Friday, May 29, 2009, 3:21 PM
Dear Jason,

On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrup...@yahoo.com> wrote:

I think I am using the improved version of setdiff(...) that handles data.frames, so I expected some odd behavior, but this one is escaping me.

It appears that the addition of duplicate entries is not caught by setdiff(...). Is this expected behavior?

[snip]

Thanks in advance for any feedback.

Test1_DF<-data.frame(HouseSize=c(1:100))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
integer(0)
setdiff(Test2_DF, Test1_DF)
integer(0)

However,
Test3_DF<-data.frame(HouseSize=c(1:25))
setdiff(Test1_DF, Test3_DF)
 [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
[17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
[33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
[49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
[65]  90  91  92  93  94  95  96  97  98  99 100

setdiff(Test3_DF, Test1_DF)
integer(0)


You didn't explicitly say which "improved version" of setdiff() you are using, so I can only presume that you are using the setdiff.data.frame in the prob package.

The behaviour you are observing is expected and matches the base:::setdiff behaviour in the case of vectors; cf.

x1 <- c(1:100)
x2 <- c(x1,x1)

setdiff(x1, x2)  # integer(0)
setdiff(x2, x1)  # integer(0)

x3 <- c(1:25)
setdiff(x1, x3)  # 26:100
setdiff(x3, x1)  # integer(0)
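The vector example shows that setdiff() ignores how many times a value occurs. If the extra copies are exactly what you want to see, one option (an illustrative sketch, not something suggested in this thread) is to compare counts with table() and keep the surplus; the names lev, n1, n2 and bag_diff are hypothetical.

```r
x1 <- 1:100
x2 <- c(x1, x1)

# setdiff() treats its arguments as sets, so repeated values are ignored
length(setdiff(x2, x1))   # 0

# Multiset ("bag") difference: compare how often each value occurs.
# Using common factor levels keeps the two tables aligned.
lev <- sort(unique(c(x1, x2)))
n1 <- table(factor(x1, levels = lev))
n2 <- table(factor(x2, levels = lev))

# Repeat each value by its surplus count in x2 (never negative)
bag_diff <- rep(lev, pmax(as.vector(n2 - n1), 0))
length(bag_diff)          # 100: one extra copy of each value
```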



If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
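For the narrower question of duplicate rows, base R's duplicated() (which has a data.frame method) and merge() cover the common cases. A minimal sketch reusing the thread's example data; the variable names dups and common are illustrative only.

```r
Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)
Test3_DF <- data.frame(HouseSize = 1:25, LandLocation = "Here", Price = "Low")

# Rows that repeat an earlier row within one data frame
dups <- Test2_DF[duplicated(Test2_DF), ]
nrow(dups)                       # 100
any(duplicated(Test1_DF))        # FALSE

# Rows present in both data frames: merge() joins on all shared columns
common <- merge(Test1_DF, Test3_DF)
nrow(common)                     # 25
```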


The R-help archives are chock full of every possible variant of questions (and answers) about this, and you haven't said _exactly_ what you are looking for. In the absence of an already posted solution, please specify exactly what you want and I'll wager an R Ninja could dispatch it in moments.

Regards,
Jay









***************************************************
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
