Please do not cross-post. You have already raised this on Bugzilla. I
will follow up there later today.
luke
On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote:
Hi all,
Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP
wrapper which internally still contains the names/dimnames, and calling
base::serialize on the result writes them out. They are unserialized in the same
way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not
obvious except in wasted time, bandwidth, or disk space.
Example:
v1 <- setNames(rnorm(64), paste("element name", 1:64))
v2 <- unname(v1)
names(v2)
# NULL
length(serialize(v1, NULL))
# [1] 2039
length(serialize(v2, NULL))
# [1] 2132
length(serialize(v2[TRUE], NULL))
# [1] 543
con <- rawConnection(raw(), "w")
serialize(v2, con)
v3 <- unserialize(rawConnectionValue(con))
names(v3)
# NULL
length(serialize(v3, NULL))
# [1] 2132
# Similarly for matrices:
m1 <- matrix(rnorm(64), 8, 8,
             dimnames=list(paste("row name", 1:8), paste("col name", 1:8)))
m2 <- unname(m1)
dimnames(m2)
# NULL
length(serialize(m1, NULL))
# [1] 918
length(serialize(m2, NULL))
# [1] 1035
length(serialize(m2[TRUE, TRUE], NULL))
# [1] 582
Previously discussed here, too:
https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html
This happens with other attributes as well, but less predictably:
x1 <- structure(rnorm(100), data=rnorm(1000000))
x2 <- structure(x1, data=NULL)
length(serialize(x1, NULL))
# [1] 8000952
length(serialize(x2, NULL))
# [1] 924
x1b <- rnorm(100)
attr(x1b, "data") <- rnorm(1000000)
x2b <- x1b
attr(x2b, "data") <- NULL
length(serialize(x1b, NULL))
# [1] 8000863
length(serialize(x2b, NULL))
# [1] 8000956
This is pretty severe: it is hard to track down why serializing a small object
kills the network when the cause is some large attribute the object once had
during its lifetime in the codebase and that is still secretly tagging along.
Is there a plan to resolve this? Any suggestions for maybe a C++ workaround
until then? Or an alternative performant serialization solution?
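As a stopgap at the R level, one possibility is to generalize the `v2[TRUE]`
trick from the examples above: copy the data into a fresh vector and reapply
only the attributes that are currently visible. This is a rough sketch, not a
fix. The helper name `drop_hidden_attributes` is made up for illustration, it
only handles atomic vectors and matrices, and it assumes that subsetting by
index always returns a plain (non-wrapper) vector, as the `v2[TRUE]` sizes
above suggest.

```r
# Hypothetical helper (not part of base R): rebuild an atomic vector so that
# nothing from an earlier ALTREP wrapper can tag along when it is serialized.
drop_hidden_attributes <- function(x) {
  stopifnot(is.atomic(x))
  visible <- attributes(x)              # only the attributes R reports now
  fresh <- .subset(x, seq_along(x))     # copy the data without method dispatch
  attributes(fresh) <- visible          # drops anything else, restores dim etc.
  fresh
}

v1 <- setNames(rnorm(64), paste("element name", 1:64))
v2 <- unname(v1)
v2_clean <- drop_hidden_attributes(v2)

# Same object as far as R code can tell; compare the serialized sizes yourself,
# since they depend on the R version in use.
identical(v2_clean, v2)
length(serialize(v2, NULL))
length(serialize(v2_clean, NULL))
```

The cost is one full copy of the data per call, so this only makes sense
right before serializing, not as a general habit.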
Best,
--
Zafer
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: [email protected]
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu