Re: [Rd] [External] Clearing attributes returns ALTREP, serialize still saves them

2021-07-03 Thread luke-tierney

Please do not cross post. You have already raised this on bugzilla. I
will follow up there later today.

luke

On Sat, 3 Jul 2021, Zafer Barutcuoglu wrote:


Hi all,

Setting names/dimnames on vectors/matrices of length>=64 returns an ALTREP 
wrapper which internally still contains the names/dimnames, and calling 
base::serialize on the result writes them out. They are unserialized in the same 
way, with the names/dimnames hidden in the ALTREP wrapper, so the problem is not 
obvious except in wasted time, bandwidth, or disk space.

Example:
  v1 <- setNames(rnorm(64), paste("element name", 1:64))
  v2 <- unname(v1)
  names(v2)
  # NULL
  length(serialize(v1, NULL))
  # [1] 2039
  length(serialize(v2, NULL))
  # [1] 2132
  length(serialize(v2[TRUE], NULL))
  # [1] 543

  con <- rawConnection(raw(), "w")
  serialize(v2, con)
  v3 <- unserialize(rawConnectionValue(con))
  names(v3)
  # NULL
  length(serialize(v3, NULL))
  # [1] 2132

  # Similarly for matrices:
  m1 <- matrix(rnorm(64), 8, 8,
               dimnames=list(paste("row name", 1:8), paste("col name", 1:8)))
  m2 <- unname(m1)
  dimnames(m2)
  # NULL
  length(serialize(m1, NULL))
  # [1] 918
  length(serialize(m2, NULL))
  # [1] 1035
  length(serialize(m2[TRUE, TRUE], NULL))
  # [1] 582
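
As a rough sanity check on the vector numbers above: the attribute-free copy v2[TRUE] takes 543 bytes for 64 doubles, that is 64 * 8 = 512 bytes of data plus 31 bytes of fixed overhead. The sketch below turns that into a small estimate of how many serialized bytes are not explained by the visible data. The 31-byte figure is an assumption inferred from this example only, not taken from R's documentation, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumption: fixed per-object overhead inferred from the example
   (543 bytes for 64 doubles, so 543 - 64 * 8 = 31). */
enum { ASSUMED_HEADER_BYTES = 31, BYTES_PER_DOUBLE = 8 };

/* Estimated serialized size of an attribute-free double vector of length n. */
size_t plain_double_size(size_t n)
{
    return (size_t) ASSUMED_HEADER_BYTES + n * (size_t) BYTES_PER_DOUBLE;
}

/* Bytes of an observed serialization that the visible data does not
   explain; returns 0 if the observed size is not larger than expected. */
size_t hidden_bytes(size_t observed, size_t n)
{
    size_t expected = plain_double_size(n);
    return observed > expected ? observed - expected : 0;
}
```

For the example above, hidden_bytes(2132, 64) gives 1589, the same order as the 1496 bytes (2039 - 543) that the names add to v1.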

Previously discussed here, too:
https://r.789695.n4.nabble.com/Invisible-names-problem-td4764688.html

This happens with other attributes as well, but less predictably:
  x1 <- structure(rnorm(100), data=rnorm(1e6))
  x2 <- structure(x1, data=NULL)
  length(serialize(x1, NULL))
  # [1] 8000952
  length(serialize(x2, NULL))
  # [1] 924

  x1b <- rnorm(100)
  attr(x1b, "data") <- rnorm(1e6)
  x2b <- x1b
  attr(x2b, "data") <- NULL
  length(serialize(x1b, NULL))
  # [1] 8000863
  length(serialize(x2b, NULL))
  # [1] 8000956

This is pretty severe: it is hard to track down why serializing a small object 
kills the network when the cause is a large attribute the object once had 
somewhere in the codebase and is still secretly carrying along.

Is there a plan to resolve this? Any suggestions for maybe a C++ workaround 
until then? Or an alternative performant serialization solution?

Best,
--
Zafer


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: [Rd] [External] SET_COMPLEX_ELT and SET_RAW_ELT missing from Rinternals.h

2021-07-03 Thread Konrad Siek
I think all of my questions are answered. Thank you for your attention and
assistance.

> The first question is whether you need to do this. Or, more to the
> point, whether it is safe to do this. In R, objects should behave as if
> they are not mutable. Mutation in C code may be OK if the objects are
> not reachable from any R variables, but that almost always means they
> are private to your code so you can use what you know about internal
> structure.
>
>
Thank you for the warning. I believe it's a legitimate use. In extremely
rough summary: A C function is called via .Call. Inside it, I create the
vector and use SET_*_ELT to populate it. I then return the vector. There is
some amount of complexity between creation and population, and the vector
can potentially be ALTREP.

I don't use SET_*_ELT on any vector I have not created.
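
The pattern described above can be sketched as follows. This is a minimal sketch, assuming the vector is freshly allocated with allocVector (so it is an ordinary, non-ALTREP vector) and is not reachable from R until it is returned. The function name make_complex_ramp and its argument are hypothetical, and the real code has more going on between creation and population.

```c
#include <R.h>
#include <Rinternals.h>

/* Hypothetical .Call entry point: allocates a complex vector of length n
   and fills it before any R code can see it. */
SEXP make_complex_ramp(SEXP n_sexp)
{
    int n = asInteger(n_sexp);

    /* NA_INTEGER is negative, so this also rejects NA input. */
    if (n < 0)
        error("n must be a non-negative integer");

    SEXP out = PROTECT(allocVector(CPLXSXP, n));

    /* The vector was created here and is private to this function, so
       writing through the data pointer does not mutate anything that
       R code can observe. */
    Rcomplex *p = COMPLEX(out);
    for (int i = 0; i < n; i++) {
        p[i].r = (double) i;  /* real part */
        p[i].i = 0.0;         /* imaginary part */
    }

    UNPROTECT(1);
    return out;
}
```

From R this would be called as .Call("make_complex_ramp", 5L) once the shared library is loaded. Where the header declares it, SET_COMPLEX_ELT(out, i, v) could replace the pointer write inside the loop.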

> COMPLEX0 is not in the API; it will probably be removed from the
> installed header files as we clean these up.
>

My mistake, thank you.

> ALTCOMPLEX_SET_ELT is an internal implementation feature and not in the API.
> Again, it will probably be removed from the installed headers.
>
>
Thanks for the warning. I'll make sure to avoid it.

k

On Sat, Jul 3, 2021 at 2:37 AM  wrote:

> On Thu, 1 Jul 2021, Konrad Siek wrote:
>
> > Thanks!
> >
> > So what would be the prescribed way of assigning elements to a CPLXSXP
> > if I needed to?
>
> The first question is whether you need to do this. Or, more to the
> point, whether it is safe to do this. In R, objects should behave as if
> they are not mutable. Mutation in C code may be OK if the objects are
> not reachable from any R variables, but that almost always means they
> are private to your code so you can use what you know about internal
> structure.
>

> If it is legitimate to mutate you can use SET_COMPLEX_ELT. I've added
> the declaration to Rinternals in R-devel and R-patched.
>
> For now, SET_COMPLEX_ELT(sexp, index, value) is equivalent to
> COMPLEX(sexp)[index] = value, but that could change in the future if Set
> methods are supported.
>
> This does materialize a potentially compact object, but again the most
> important question is whether mutation is legitimate at all.
>
> > One way I see is to do what most of the code inside the interpreter does
> > and grab the vector's data pointer:
> >
> > COMPLEX(sexp)[index] = value;
> > COMPLEX0(sexp)[index] = value;
> >
>
> COMPLEX0 is not in the API; it will probably be removed from the
> installed header files as we clean these up.
>

> > This will materialize an ALTREP CPLXSXP though, so maybe the best way
> > would be to mirror what SET_COMPLEX_ELT does in Rinlinedfuns.h?
> >
> > if (ALTREP(sexp)) ALTCOMPLEX_SET_ELT(sexp, index, value); else
> > COMPLEX0(sexp)[index] = value;
>
> ALTCOMPLEX_SET_ELT is an internal implementation feature and not in the
> API. Again, it will probably be removed from the installed headers.
>
> Best,
>
> luke
>
> > This seems better, but it's not used in the interpreter anywhere as far
> > as I can tell, presumably because of the setter interface not being
> > complete, as you point out. But should I be avoiding this second approach
> > for some reason?
> >
> > k
> >
> > On Tue, Jun 29, 2021 at 4:06 AM  wrote:
> >   The setter interface for atomic types is not yet implemented. It
> >   may be some day.
> >
> >   Best,
> >
> >   luke
> >
> >   On Fri, 25 Jun 2021, Konrad Siek wrote:
> >
> >   > Hello,
> >   >
> >   > I am working on a package that works with various types of R
> >   > vectors, implemented in C. My code has a lot of SET_*_ELT operations
> >   > in it for various types of vectors, including for CPLXSXPs and
> >   > RAWSXPs.
> >   >
> >   > I noticed SET_COMPLEX_ELT and SET_RAW_ELT are defined in
> >   > Rinlinedfuns.h but not declared in Rinternals.h, so they cannot be
> >   > used in packages. I was going to re-implement them or extern them in
> >   > my package, however, interestingly, ALTCOMPLEX_SET_ELT and
> >   > ALTRAW_SET_ELT are both declared in Rinternals.h, making me think
> >   > SET_COMPLEX_ELT and SET_RAW_ELT could be purposefully obscured.
> >   > Otherwise it may just be an oversight and I should bring it to
> >   > someone's attention anyway.
> >   >
> >   > I have three questions that I hope R-devel could help me with.
> >   >
> >   > 1. Is this an oversight, or are SET_COMPLEX_ELT and SET_RAW_ELT not
> >   >    exposed on purpose?
> >   > 2. If they are not exposed on purpose, I was wondering why.
> >   > 3. More importantly, what would be good ways to set elements of
> >   >    these vectors while playing nice with ALTREP and avoiding whatever
> >   >    pitfalls caused these functions to be obscured in the first place?
> >   >
> >   > Best regards,
> >   > Konrad,
> >   >
> >   >   [[alternative