Ok, a bit more: The relevant bit in serialize.c that I can see is:
    if (ALTREP(s) && stream->version >= 3) {
        SEXP info = ALTREP_SERIALIZED_CLASS(s);
        SEXP state = ALTREP_SERIALIZED_STATE(s);
        if (info != NULL && state != NULL) {
            int flags = PackFlags(ALTREP_SXP, LEVELS(s), OBJECT(s), 0, 0);
            PROTECT(state);
            PROTECT(info);
            OutInteger(stream, flags);
            WriteItem(info, ref_table, stream);
            WriteItem(state, ref_table, stream);    /* <-- this line */
            WriteItem(ATTRIB(s), ref_table, stream);
            UNPROTECT(2); /* state, info */
            return;
        }
        /* else fall through to standard processing */
    }

And in the wrapper altclass, we have:

    static SEXP wrapper_Serialized_state(SEXP x)
    {
        return CONS(WRAPPER_WRAPPED(x), WRAPPER_METADATA(x));
    }

So what's happening is that the data isn't being written out during the WriteItem(ATTRIB(s), ...) call, which actually has the correct attribute value. It's being written out in the marked line above that, the one that writes the state. The state contains the wrapped SEXP, which ITSELF has the attributes on it but is not an ALTREP, so it goes through standard processing, which writes out the attributes as normal.

So that, I believe, is what needs to change. One possibility is that wrapper_Serialized_state can be made smarter, so that the inner attributes are duplicated and then wiped clean for any that are overridden by the attributes on the wrapper. Another option is that the ALTREP WriteItem section could be made smarter, but that seems less robust. Finally, the wrapper could perhaps be modified so that setting an attribute on the wrapper clears that attribute on the wrapped value, if present.

I think making wrapper_Serialized_state smarter is the right way to attack this, and that's the first thing I'll try when I get to it, but if someone tackles it before me, hopefully this digging helped some.
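To make the first option a bit more concrete, here is a toy, self-contained C model of the filtering step it describes. This is only a sketch and not R's code: toy_attr, toy_attr_list and the toy_* helpers are made-up stand-ins for R's attribute pairlists, and the byte counts are illustrative numbers rather than real serialized sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for an attribute pairlist; NOT R's real SEXP types. */
#define TOY_MAX_ATTRS 8

typedef struct {
    const char *name;   /* attribute tag, e.g. "data" */
    size_t      bytes;  /* illustrative size of the value when written out */
} toy_attr;

typedef struct {
    toy_attr attrs[TOY_MAX_ATTRS];
    size_t   n;         /* number of attributes in use */
} toy_attr_list;

/* Append one attribute to the list. */
static void toy_add_attr(toy_attr_list *list, const char *name, size_t bytes)
{
    assert(list->n < TOY_MAX_ATTRS);
    list->attrs[list->n].name = name;
    list->attrs[list->n].bytes = bytes;
    list->n++;
}

/* Return 1 if the list has an attribute with this name, else 0. */
static int toy_has_attr(const toy_attr_list *list, const char *name)
{
    for (size_t i = 0; i < list->n; i++) {
        if (strcmp(list->attrs[i].name, name) == 0)
            return 1;
    }
    return 0;
}

/* Build the attribute list that would go into the serialized state:
 * a copy of the wrapped value's attributes, minus any that the wrapper
 * overrides. The wrapped value's own list is left unmodified. */
static toy_attr_list toy_state_attrs(const toy_attr_list *wrapped,
                                     const toy_attr_list *wrapper)
{
    toy_attr_list out;
    memset(&out, 0, sizeof out);
    for (size_t i = 0; i < wrapped->n; i++) {
        if (!toy_has_attr(wrapper, wrapped->attrs[i].name))
            out.attrs[out.n++] = wrapped->attrs[i];
    }
    return out;
}

/* Sum of the illustrative sizes, i.e. what the list would cost to write. */
static size_t toy_total_bytes(const toy_attr_list *list)
{
    size_t total = 0;
    for (size_t i = 0; i < list->n; i++)
        total += list->attrs[i].bytes;
    return total;
}
```

If the wrapped list holds a large "data" entry and the wrapper holds its own "data", toy_state_attrs returns a copy without the stale entry and leaves the wrapped list alone. Note that in this model an attribute that was removed on the wrapper (assigned NULL) leaves no entry on the wrapper, so a check for overridden names alone would not drop it.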
Best,
~G

On Fri, Jul 2, 2021 at 10:18 PM Gabriel Becker <gabembec...@gmail.com> wrote:

> Hi all,
>
> I don't have a solution yet, but a bit more here:
>
> > .Internal(inspect(x2b))
> @7f913826d590 14 REALSXP g0c0 [REF(1)] wrapper [srt=-2147483648,no_na=0]
>   @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0) 0.45384,0.926371,0.838637,-1.71485,-0.719073,...
>   ATTRIB:
>     @7f913826dc20 02 LISTSXP g0c0 [REF(1)]
>       TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(460)] "data"
>       @7f9118310000 14 REALSXP g0c7 [REF(2)] (len=1000000, tl=0) 0.66682,0.480576,-1.13229,0.453313,-0.819498,...
>
> > attr(x2b, "data") <- "small"
> > .Internal(inspect(x2b))
> @7f913826d590 14 REALSXP g0c0 [REF(1),ATT] wrapper [srt=-2147483648,no_na=0]
>   @7f9137500320 14 REALSXP g0c7 [REF(2),ATT] (len=100, tl=0) 0.45384,0.926371,0.838637,-1.71485,-0.719073,...
>   ATTRIB:
>     @7f913826dc20 02 LISTSXP g0c0 [REF(1)]
>       TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"
>       @7f9118310000 14 REALSXP g0c7 [REF(2)] (len=1000000, tl=0) 0.66682,0.480576,-1.13229,0.453313,-0.819498,...
> ATTRIB:
>   @7f913826c870 02 LISTSXP g0c0 [REF(1)]
>     TAG: @7f91378538d0 01 SYMSXP g0c0 [MARK,REF(461)] "data"
>     @7f9120580850 16 STRSXP g0c1 [REF(3)] (len=1, tl=0)
>       @7f91205808c0 09 CHARSXP g0c1 [REF(3),gp=0x60] [ASCII] [cached] "small"
>
> So we can see that the assignment of attr(x2b, "data") IS doing something,
> but it isn't doing the right thing. The fact that the above code assigned
> null instead of a value was hiding this.
>
> I will dig into this more if someone doesn't get it fixed before me, but
> it won't be until after useR, because I'm preparing multiple talks for that
> and it is this coming week.
>
> Best,
> ~G
>
> On Fri, Jul 2, 2021 at 9:15 PM Zafer Barutcuoglu <zafer.barutcuo...@gmail.com> wrote:
>
>> Hi all,
>>
>> Setting names/dimnames on vectors/matrices of length>=64 returns an
>> ALTREP wrapper which internally still contains the names/dimnames, and
>> calling base::serialize on the result writes them out. They are
>> unserialized in the same way, with the names/dimnames hidden in the ALTREP
>> wrapper, so the problem is not obvious except in wasted time, bandwidth, or
>> disk space.
>>
>> Example:
>> v1 <- setNames(rnorm(64), paste("element name", 1:64))
>> v2 <- unname(v1)
>> names(v2)
>> # NULL
>> length(serialize(v1, NULL))
>> # [1] 2039
>> length(serialize(v2, NULL))
>> # [1] 2132
>> length(serialize(v2[TRUE], NULL))
>> # [1] 543
>>
>> con <- rawConnection(raw(), "w")
>> serialize(v2, con)
>> v3 <- unserialize(rawConnectionValue(con))
>> names(v3)
>> # NULL
>> length(serialize(v3, NULL))
>> # 2132
>>
>> # Similarly for matrices:
>> m1 <- matrix(rnorm(64), 8, 8, dimnames=list(paste("row name", 1:8),
>>              paste("col name", 1:8)))
>> m2 <- unname(m1)
>> dimnames(m2)
>> # NULL
>> length(serialize(m1, NULL))
>> # [1] 918
>> length(serialize(m2, NULL))
>> # [1] 1035
>> length(serialize(m2[TRUE, TRUE], NULL))
>> # 582
>>
>> Previously discussed here, too:
>> https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html
>>
>> This happens with other attributes as well, but less predictably:
>> x1 <- structure(rnorm(100), data=rnorm(1000000))
>> x2 <- structure(x1, data=NULL)
>> length(serialize(x1, NULL))
>> # [1] 8000952
>> length(serialize(x2, NULL))
>> # [1] 924
>>
>> x1b <- rnorm(100)
>> attr(x1b, "data") <- rnorm(1000000)
>> x2b <- x1b
>> attr(x2b, "data") <- NULL
>> length(serialize(x1b, NULL))
>> # [1] 8000863
>> length(serialize(x2b, NULL))
>> # [1] 8000956
>>
>> This is pretty severe, trying to track down why serializing a small
>> object kills the network, because of which large attributes it may have
>> once had during its lifetime around the codebase that are still secretly
>> tagging along.
>>
>> Is there a plan to resolve this? Any suggestions for maybe a C++
>> workaround until then? Or an alternative performant serialization solution?
>>
>> Best,
>> --
>> Zafer
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel