On 01/09/2014 07:53 PM, Norm Matloff wrote:

Thanks, Hadley and Simon.

The reason I asked today was that when reference classes first came out,
it had appeared to me that there is no performance advantage to using
reference classes, that it was mainly a style issue (encapsulation,
etc.).  Unless I'm missing something, both of you have confirmed my
original impression, correct?

We've used reference classes for performance benefit. For example, updating a single (even small) slot in an S4 object triggers a copy of the entire object, whereas the fields of a reference class can be updated independently. This is especially true inside function calls (e.g., methods that assign to a slot), where the object is marked to be duplicated.


> a = setClass("A", representation(x="numeric"))(x=1:5)
> .Internal(inspect(a))
@5237508 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
  @5237460 02 LISTSXP g0c0 []
    TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
    @5225db8 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
    TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
    @52355c8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
      @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
    ATTRIB:
      @52373f0 02 LISTSXP g0c0 []
        TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
        @5235598 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
          @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"
> a@x[1] = 2L
> .Internal(inspect(a))  ## almost everything duplicated!
@5243cd0 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
  @5243c60 02 LISTSXP g0c0 []
    TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
    @5225b30 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
    TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
    @52405f8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
      @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
    ATTRIB:
      @5243bf0 02 LISTSXP g0c0 []
        TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
        @52405c8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
          @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"

This also influences the performance of other R objects, of course, e.g., a list modified inside a function:

> f = function(x) { x$a[1] = 2L; x }
> l = list(a=1:5); .Internal(inspect(l))
@53f8448 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @53cef48 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
  @53f9190 02 LISTSXP g0c0 []
    TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
    @53f8418 16 STRSXP g0c1 [] (len=1, tl=0)
      @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"
> .Internal(inspect(f(l)))
@53f83e8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @53cef00 13 INTSXP g0c3 [] (len=5, tl=0) 2,2,3,4,5
ATTRIB:
  @53f9988 02 LISTSXP g0c0 []
    TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
    @53f83b8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
      @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"

With reference classes, copies are localized to the updated field. (I can't show this with .Internal(inspect()), though, because x = new.env(); x$x = x; .Internal(inspect(x)) [mimicking .self in reference classes] recurses infinitely, or at least for longer than I was willing to wait.)
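
A rough way to look at this without .Internal(inspect()) is tracemem(). The sketch below is not from the original thread: the class name Pair and the names big and small are hypothetical, and a plain environment stands in for a reference class object. tracemem() only reports anything if R was built with memory profiling, and what it reports varies across R versions, so no output is shown.

```r
library(methods)

# Hypothetical S4 class with one large and one small slot.
Pair <- setClass("Pair", representation(big = "numeric", small = "numeric"))
s4 <- Pair(big = numeric(1e6), small = 1)

# A plain environment standing in for a reference class object.
e <- new.env()
e$big <- numeric(1e6)
e$small <- 1

if (capabilities("profmem")) {
  invisible(tracemem(s4))      # report duplications of the whole S4 object
  invisible(tracemem(e$big))   # report duplications of the large vector
}

s4@small <- 2   # tracemem prints a line here if the S4 object is duplicated
e$small <- 2    # rebinds 'small' only; 'big' is not part of this assignment

if (capabilities("profmem")) {
  untracemem(s4)
  untracemem(e$big)
}
```

If tracemem() prints a line for the slot assignment but nothing for the environment update, that matches the behaviour described above.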

I think reference classes actually have a surprising performance _hit_ compared to other R approaches to minimizing copying; this has come up on this or another R mailing list before, but I've lost track of the original. Here's a StackOverflow version:

http://stackoverflow.com/questions/18677696/stack-class-in-r-something-more-concise/18678440#18678440

Martin


Norm

On Thu, Jan 09, 2014 at 09:44:10PM -0500, Simon Urbanek wrote:
On Jan 9, 2014, at 6:20 PM, Norm Matloff <matl...@cs.ucdavis.edu> wrote:

Bottom line:  Really no different from the case of ordinary vectors that are 
not in reference classes, right?  In other words, not true pass-by-reference.


The pass-by-reference applies to the object itself, not necessarily to anything 
you obtain by calling a function on the object (like extracting a part from 
it). Vectors are not reference-semantics objects so regular rules apply.

If you pass a reference semantics object to a function, the function can modify 
the object. If you pass any other object, the contents are guaranteed to not be 
touched. Reference-semantics objects in R are literally passed by reference 
(same C pointer), so yes, it is true pass-by-reference.
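
The difference can be seen from inside a function. This is a minimal sketch, not code from the thread; the class names ValueBox and RefBox and the two helper functions are hypothetical.

```r
library(methods)

# Hypothetical classes: an S4 class (value semantics) and a reference class.
ValueBox <- setClass("ValueBox", representation(x = "numeric"))
RefBox <- setRefClass("RefBox", fields = list(x = "numeric"))

# Each function modifies its argument and returns nothing useful.
set_first_s4 <- function(obj) { obj@x[1] <- 99; invisible(NULL) }
set_first_ref <- function(obj) { obj$x[1] <- 99; invisible(NULL) }

v <- ValueBox(x = c(1, 2, 3))
r <- RefBox$new(x = c(1, 2, 3))

set_first_s4(v)    # modifies a local copy only
set_first_ref(r)   # modifies the object the caller holds

print(v@x[1])      # 1: the caller's S4 object is untouched
print(r$x[1])      # 99: the change is visible to the caller
```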

Cheers,
Simon


(*) - technically, there is a thin non-reference wrapper around the instances of 
reference classes, because there are things you don't want to happen to your 
ref-semantics instance - e.g. you don't want unclass(x) to destroy x and all 
instances of it (which it would do if there was no wrapper). But the actual 
payload of the object is a true ref-semantics object - an environment - that is 
always passed by reference.



Norm

On Thu, Jan 09, 2014 at 04:43:44PM -0600, Hadley Wickham wrote:
This is a bit of a simplification, but reference classes are wrappers around
environments.  So if modifying a value in an environment would create
a copy, then modifying the same value in a reference class will also
create a copy.

The situation with modifying a vector is a bit complicated as it will
sometimes be modified in place and sometimes be duplicated and
modified (depending on whether its NAMED count is 1 or 2, and
exactly how you're modifying it).
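
A small sketch of both points, using a bare environment in place of a reference class (the variable names are made up for illustration). It shows only the visible semantics; whether a given update copies the vector internally depends on the R version and cannot be read off from this output.

```r
# Ordinary vectors have value semantics: modifying 'w' leaves 'v' alone,
# because R duplicates the shared vector before writing to it.
v <- c(1, 2, 3)
w <- v
w[1] <- 99
print(v[1])        # 1

# Environments have reference semantics: 'alias' and 'e' are the same object.
e <- new.env()
e$v <- c(1, 2, 3)
alias <- e
alias$v[1] <- 99
print(e$v[1])      # 99

# A vector pulled out of the environment is an ordinary vector again.
out <- e$v
out[2] <- -1
print(e$v[2])      # 2
```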

Hadley

On Thu, Jan 9, 2014 at 4:33 PM, Norm Matloff <matl...@cs.ucdavis.edu> wrote:
I have a question about reference classes, which someone here
undoubtedly can answer immediately, saving me hours of wading through
indecipherable internal code. :-)  Thanks in advance.

Reference class data is mutable, fine, but in what sense?  Is it really
physical, or is it just a view given to the programmer?

If for instance I have a vector as a field in a reference class, and I
change one element of the vector, is it really true that the change is
guaranteed to be made in-place, no copying, no memory reallocation etc?

Norm

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
http://had.co.nz/


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
