Dear R-devel,

Can anyone help me understand this? It seems that subscripting the rows of a data.frame, without actually changing their order, somehow changes an internal representation of row.names that is revealed by e.g. dput/dump/serialize.

I have read the docs and inspected the (R) code for data.frame, rownames, row.names and dput without enlightenment.

df=data.frame(a=1:10, b=1)
dput(df)
df2=df[1:nrow(df), ]
# R thinks they are equal (so do I!)
all.equal(df, df2)
dput(df2)

Looking at the output of the dputs

dput(df)
structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("a",
"b"), row.names = c(NA, -10L), class = "data.frame")
dput(df2)
structure(list(a = 1:10, b = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("a",
"b"), row.names = c(NA, 10L), class = "data.frame")

we have row.names = c(NA, -10L) in the first case and row.names = c(NA, 10L) in the second, so somehow these objects have different internal representations even though all.equal() reports them as equal.
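
The stored form can also be inspected without going through dput. This is only a sketch, assuming the base function .row_names_info() behaves as its help page describes (type = 0L for the raw attribute, type = 2L for the implied number of rows); the names raw1, raw2, n1, n2, rn1 and rn2 are just for illustration, and I have deliberately not written down expected output because the sign of the second element may depend on the R version.

```r
df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]

# type = 0L returns the "row.names" attribute exactly as it is stored
raw1 <- .row_names_info(df,  0L)
raw2 <- .row_names_info(df2, 0L)
str(raw1)
str(raw2)

# type = 2L returns the number of rows implied by the attribute,
# regardless of which compact form is stored
n1 <- .row_names_info(df,  2L)
n2 <- .row_names_info(df2, 2L)

# attr() expands the compact form, so at this level the two objects agree
rn1 <- attr(df,  "row.names")
rn2 <- attr(df2, "row.names")
identical(rn1, rn2)
```

So the difference seems to live only in the compact stored form, which is what dput, dump and serialize write out.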

Can anyone explain why? This has come up because

library(digest)
digest(df)==digest(df2)
[1] FALSE

digest uses serialize under the hood, but serialize, dput and dump all show the same effect (I've pasted an example below that writes each object to a file and compares the files with md5sum, using only packages that ship with R).

Many thanks for any enlightenment! More generally, is there any way to calculate a digest of a data.frame that gets round this issue, or is that not possible?
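
The only workaround I can think of is to reset the row names to one canonical form before hashing. The following is a sketch only: normalise_row_names() is a hypothetical helper (not part of base R or digest), and it assumes that .set_row_names() and .row_names_info() behave as documented. It only removes the row.names difference; I do not know whether other representational differences could still change a digest.

```r
# Hypothetical helper: store row names that are just 1..n in the
# "automatic" compact form, and leave any other row names untouched.
normalise_row_names <- function(x) {
  stopifnot(is.data.frame(x))
  n  <- .row_names_info(x, 2L)      # number of rows implied by the attribute
  rn <- attr(x, "row.names")        # attr() expands the compact form
  if (is.integer(rn) && identical(rn, seq_len(n))) {
    attr(x, "row.names") <- .set_row_names(n)
  }
  x
}

df  <- data.frame(a = 1:10, b = 1)
df2 <- df[1:nrow(df), ]
df2n <- normalise_row_names(df2)

# compare the stored row.names attributes after normalisation
identical(.row_names_info(df, 0L), .row_names_info(df2n, 0L))
```

One would then compute the digest of normalise_row_names(x) rather than of x itself.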

Best wishes,

Greg.


A digest using base R:

library(tools)
td=tempfile()
dir.create(td)
tempfiles=file.path(td,c("df", "df2"))
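# dput() rather than dump(): dump() also writes the object name into the
# file, so the two files would differ even for identical objects
dput(df, file=tempfiles[1])
dput(df2, file=tempfiles[2])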
md5sum(tempfiles)

# different md5sum

sessionInfo() # for my laptop but also observed on R 3.1.2
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] nat_1.5.14 nat.utils_0.4.2 digest_0.6.4 Rvcg_0.9 devtools_1.6.1 igraph_0.7.1
[7] testthat_0.9.1  rgl_0.93.1098

loaded via a namespace (and not attached):
[1] codetools_0.2-9 filehash_2.2-2 nabor_0.4.3 parallel_3.1.1 plyr_1.8.1
[6] Rcpp_0.11.3 rstudio_0.98.1062 rstudioapi_0.1 XML_3.98-1.1 yaml_2.1.13

--
Gregory Jefferis, PhD
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 0QH, UK

http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
http://flybrain.stanford.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
