Re: [Rd] R's copying of arguments (Re: Julia)

Simon Urbanek Wed, 21 Mar 2012 18:45:50 -0700

On Mar 21, 2012, at 9:31 PM, Hervé Pagès wrote:

> On 03/21/2012 06:23 PM, Simon Urbanek wrote:
>> 
>> On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:
>> 
>>> Hi Oliver,
>>> 
>>> On 03/17/2012 08:35 AM, oliver wrote:
>>>> Hello,
>>>> 
>>>> regarding the copying issue, I would like to point to the
>>>> "Writing R Extensions" documentation.
>>>> 
>>>> It mentions that functions in extensions that use the .C
>>>> interface normally get their arguments pre-copied...
>>>> 
>>>> 
>>>> In section 5.2:
>>>> 
>>>>   "There can be up to 65 further arguments giving R objects to be
>>>>   passed to compiled code. Normally these are copied before being
>>>>   passed in, and copied again to an R list object when the compiled
>>>>   code returns."
>>>> 
>>>> But for the .Call and .External interfaces this is NOT the case.
>>>> 
>>>> 
>>>> 
>>>> In section 5.9:
>>>>   "The .Call and .External interfaces allow much more control, but
>>>>   they also impose much greater responsibilities so need to be used
>>>>   with care. Neither .Call nor .External copy their arguments. You
>>>>   should treat arguments you receive through these interfaces as
>>>>   read-only."
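To make the contrast between the two quotes concrete, here is a toy model in plain C. It is not R's C API, and every name in it is hypothetical: a ".C-style" wrapper duplicates the argument before the routine runs and duplicates the result again afterwards (as the section 5.2 quote describes), while a ".Call-style" wrapper hands the caller's buffer straight to the routine, which is why such a routine should treat its arguments as read-only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not R's C API. */
typedef void (*routine_fn)(double *x, size_t n);

static size_t bytes_copied = 0; /* tally of bytes duplicated, for illustration */

static double *dup_buf(const double *src, size_t n) {
    double *dst = malloc((n ? n : 1) * sizeof *dst);
    assert(dst != NULL);
    memcpy(dst, src, n * sizeof *dst);
    bytes_copied += n * sizeof *dst;
    return dst;
}

/* ".C-style": copy in, run the routine on the copy, copy out again.
   The caller's buffer is never modified; the caller frees the result. */
double *call_c_style(routine_fn f, const double *x, size_t n) {
    double *in = dup_buf(x, n);
    f(in, n);
    double *out = dup_buf(in, n);
    free(in);
    return out;
}

/* ".Call-style": no copy at all, so any write lands in the caller's data. */
void call_dotcall_style(routine_fn f, double *x, size_t n) {
    f(x, n);
}

/* An example routine that writes to its argument. */
void double_all(double *x, size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] *= 2.0;
}
```

With a three-element input, `call_c_style(double_all, a, 3)` leaves `a` untouched and copies 2 x 3 x sizeof(double) bytes, whereas `call_dotcall_style(double_all, a, 3)` doubles the caller's values in place.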
>>>> 
>>>> 
>>>> Why is read-only preferred?
>>>> 
>>>> Please see the discussion in section 5.9.10.
>>>> 
>>>> It's mentioned there that a copy of an object in the R language
>>>> does not necessarily make a real copy of that object; instead,
>>>> just a "reference" to the real data is created (two names
>>>> referring to one bulk of data). That's typical functional
>>>> programming: not a variable, but a name (and possibly more than one
>>>> name) bound to an object.
>>>> 
>>>> 
>>>> Of course, if you changed the original named value without
>>>> copying it first, both names would refer to the changed data,
>>>> and that is not what is wanted.
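A minimal sketch of that hazard, written as a toy model in plain C (hypothetical names, not R internals): a lazy "copy" only binds a second name to the same object, so an in-place write through one name is visible through the other.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a vector with its length; names are hypothetical. */
typedef struct {
    double *data;
    size_t len;
} toy_vec;

toy_vec *toy_new(size_t len) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = calloc(len ? len : 1, sizeof *v->data); /* zero-filled */
    assert(v->data != NULL);
    v->len = len;
    return v;
}

/* A lazy "y <- x": no data is copied; both names share one object. */
toy_vec *toy_lazy_copy(toy_vec *x) {
    return x;
}

/* Overwrite element i without checking who else refers to v. */
void toy_set_in_place(toy_vec *v, size_t i, double val) {
    assert(i < v->len);
    v->data[i] = val;
}
```

After `y = toy_lazy_copy(x)` and `toy_set_in_place(y, 0, 42.0)`, reading `x->data[0]` also gives 42.0, which is exactly the surprise the copy-before-write rule exists to prevent.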
>>>> 
>>>> But what you can also see in section 5.9.10 is that there
>>>> already is a mechanism (a form of reference counting) that allows
>>>> one to distinguish between unnamed and named objects.
>>>> 
>>>> So this directly addresses the points you mentioned in your
>>>> examples.
>>>> 
>>>> So, at least in principle, R allows in-place modification
>>>> of objects through the .Call interface.
>>>> 
>>>> You seem to refer to the .C interface, whereas I had explored the
>>>> .Call interface. That's why you may insist that "it's always
>>>> copied", and why I wondered what you were talking about: the
>>>> .Call interface allowed me a rather C-like, raw style of programming
>>>> (and lets its user decide whether copying is done or not).
>>>> 
>>>> The mechanism for deciding whether copying should be done is
>>>> also mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
>>>> With NAMED you can query a coarse count of the references
>>>> (the values 0, 1 and 2).
>>>> 
>>>> But later in that section it is mentioned that, at least for now,
>>>> NAMED always returns the value 2 for arguments of a .Call call.
>>>> 
>>>> 
>>>>   "Currently all arguments to a .Call call will have NAMED set to 2,
>>>>   and so users must assume that they need to be duplicated before
>>>>   alteration."
>>>>                (section 5.9.10, last sentence)
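The rule quoted above amounts to "duplicate unless you know the object is unshared". Here is a toy C model of that copy-on-write check. The `named` field only loosely mirrors NAMED, `toy_duplicate` stands in for R's duplicate(), and none of these names are R's real API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, loosely mirroring NAMED. */
typedef struct {
    double *data;
    size_t len;
    int named; /* 0 = no binding, 1 = one binding, 2 = possibly shared */
} toy_vec;

toy_vec *toy_new(size_t len) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = calloc(len ? len : 1, sizeof *v->data);
    assert(v->data != NULL);
    v->len = len;
    v->named = 0;
    return v;
}

toy_vec *toy_duplicate(const toy_vec *src) {
    toy_vec *v = toy_new(src->len); /* the fresh copy starts unbound */
    memcpy(v->data, src->data, src->len * sizeof *src->data);
    return v;
}

/* Set x[i] = val, duplicating first when x may be shared (named == 2).
   Since .Call arguments arrive with NAMED 2, they are always copied here.
   Values below 2 are modified in place, a simplification of R's rules.
   Returns the object that was actually modified. */
toy_vec *toy_set(toy_vec *x, size_t i, double val) {
    assert(i < x->len);
    if (x->named >= 2)
        x = toy_duplicate(x);
    x->data[i] = val;
    return x;
}
```

The caller must continue with the returned pointer, just as C code in a package would return the (possibly duplicated) object rather than assume its argument was changed.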
>>>> 
>>>> 
>>>> So in-place modification can already be done, for example with
>>>> the .Call interface. But deciding whether it is safe is not
>>>> supported at the moment.
>>>> 
>>>> So the situation is somewhere between "it is possible" and
>>>> "R offers no safe way to decide whether what is possible is also
>>>> advisable".
>>>> At the moment R discourages in-place modification by default
>>>> (the safe way, and I agree with this default).
>>>> 
>>>> But it's not true that R copies arguments in general.
>>>> 
>>>> It does seem to be true for the .C interface, though.
>>>> 
>>>> Maybe a lot of performance and memory problems could be solved
>>>> by rewriting existing packages to use .Call instead of .C.
>>> 
>>> My understanding is that most packages use the .C interface
>>> because it's simpler to deal with and because they don't need
>>> to pass complicated objects at the C level, just atomic vectors.
>>> My guess is that it's probably rarely the case that the cost
>>> of copying the arguments passed to .C is significant, but,
>>> if that was the case, then they could always call .C() with
>>> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
>>> section in the man page).
>>> 
>>> No need to switch to .Call.
>>> 
>> 
>> I strongly disagree. I'm appalled to see that sentence here.
> 
> Come on!
> 
>> The overhead is significant for any large vector and it is in particular 
>> unnecessary since in .C you have to allocate *and copy* space even for 
>> results (twice!). Also it is very error-prone, because you have no 
>> information about the length of vectors so it's easy to run out of bounds 
>> and there is no way to check. IMHO .C should not be used for any code 
>> written in this century (the only exception may be if you are passing no 
>> data, e.g. if all you do is to pass a flag and expect no result, you can get 
>> away with it even if it is more dangerous). It is a legacy interface that 
>> dates way back and is essentially just a renamed .Fortran interface. Again, I
>> would strongly recommend the use of .Call in any recent code because it is 
>> safer and more efficient (if you don't care about either attribute, well, 
>> feel free ;)).
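A sketch of the bounds problem described above, as a toy model in plain C with hypothetical names. A .C-style routine receives only bare pointers (even the length arrives as a pointer, and the result needs a slot the caller allocated beforehand), so it cannot tell whether `x` really holds `*n` elements. An object that carries its own length, as a .Call argument does, lets the routine check.

```c
#include <assert.h>
#include <stddef.h>

/* .C-style (toy): only raw pointers arrive. The routine can check that
   *n is non-negative, but not that x really has *n elements, and the
   result must go into a slot the caller allocated in advance. */
void sum_c_style(const double *x, const int *n, double *result) {
    assert(*n >= 0);
    double s = 0.0;
    for (int i = 0; i < *n; i++)
        s += x[i];
    *result = s;
}

/* .Call-style (toy): the object carries its length with it. */
typedef struct {
    const double *data;
    size_t len;
} toy_vec;

double sum_call_style(toy_vec x) {
    double s = 0.0;
    for (size_t i = 0; i < x.len; i++)
        s += x.data[i];
    return s;
}

/* Bounds-checked access: returns 0 on success, -1 if i is out of range. */
int toy_get(toy_vec x, size_t i, double *out) {
    if (i >= x.len)
        return -1;
    *out = x.data[i];
    return 0;
}
```

Passing `n = 4` for a three-element array to `sum_c_style` would silently read past the end; `toy_get` instead reports index 3 of a length-3 vector as an error.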
> 
> So aleph will not support the .C interface? ;-)
>


It will look at the timestamp of the source file and delete the package if it 
is not before 1980 ;). Otherwise it will send a request for punch cards with 
".C is deprecated, please upgrade to .Call" stamped out :P At that point I'll 
be flaming about using the native Aleph interface and not the R compatibility 
layer ;)

Cheers,
S



> H.
> 
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>> 
>> 
>> 
>>> Cheers,
>>> H.
>>> 
>>>> 
>>>> 
>>>> Ciao,
>>>>    Oliver
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote:
>>>>> S (and its derivatives and successors) promises that functions
>>>>> will not change their arguments, so in an expression like
>>>>>    val <- func(arg)
>>>>> you know that arg will not be changed.  You can
>>>>> do that by having func copy arg before doing anything,
>>>>> but that uses space and time that you want to conserve.
>>>>> If arg is not a named item in any environment then it
>>>>> should be fine to write over the original because there
>>>>> is no way the caller can detect that shortcut.  E.g., in
>>>>>     cx <- cos(runif(n))
>>>>> the cos function does not need to allocate new space for
>>>>> its output, it can just write over its input because, without
>>>>> a name attached to it, the caller has no way of looking
>>>>> at what runif(n) returned.  If you did
>>>>>     x <- runif(n)
>>>>>     cx <- cos(x)
>>>>> then cos would have to allocate new space for its output
>>>>> because overwriting its input would affect a subsequent
>>>>>     sum(x)
>>>>> I suppose that end-users and function-writers could learn
>>>>> to live with having to decide when to copy, but not having
>>>>> to make that decision makes S more pleasant (and safer) to use.
>>>>> I think that is a major reason that people are able to
>>>>> share S code so easily.
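The rule described above can be sketched as a toy C model (hypothetical names, not how R implements cos): an elementwise function reuses its input's storage for the output only when no name is bound to the input (`named == 0`, as with `cos(runif(n))`), and allocates fresh storage otherwise (as with `cos(x)`). The element function is passed as a pointer, with squaring standing in for cos.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names; `named` counts bindings. */
typedef struct {
    double *data;
    size_t len;
    int named;
} toy_vec;

toy_vec *toy_new(size_t len) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = calloc(len ? len : 1, sizeof *v->data);
    assert(v->data != NULL);
    v->len = len;
    v->named = 0;
    return v;
}

double square(double v) { return v * v; }

/* Apply f elementwise. An unnamed input cannot be observed by the caller
   afterwards, so its storage is reused; a named input is left intact. */
toy_vec *toy_map(toy_vec *x, double (*f)(double)) {
    toy_vec *out = (x->named == 0) ? x : toy_new(x->len);
    for (size_t i = 0; i < x->len; i++)
        out->data[i] = f(x->data[i]);
    return out;
}
```

The saving is only taken when it is undetectable: for an unnamed temporary no caller can ever look at the old values, so overwriting them is invisible, which is the "copy-on-write-if-not-copying-would-be-discoverable" contract.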
>>>>> 
>>>>> Bill Dunlap
>>>>> Spotfire, TIBCO Software
>>>>> wdunlap tibco.com
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: oliver [mailto:oli...@first.in-berlin.de]
>>>>>> Sent: Tuesday, March 06, 2012 1:12 AM
>>>>>> To: William Dunlap
>>>>>> Cc: Hervé Pagès; R-devel
>>>>>> Subject: Re: [Rd] Julia
>>>>>> 
>>>>>> On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
>>>>>> [...]
>>>>>>> I find R's (& S+'s & S's)
>>>>>>> copy-on-write-if-not-copying-would-be-discoverable-
>>>>>>> by-the-user mechanism for giving the illusion of pass-by-value a good way
>>>>>>> to structure the contract between the function writer and the function 
>>>>>>> user.
>>>>>> [...]
>>>>>> 
>>>>>> 
>>>>>> Can you elaborate more on this,
>>>>>> especially on the 
>>>>>> ...-...-...-if-not-copying-would-be-discoverable-by-the-user
>>>>>> stuff?
>>>>>> 
>>>>>> What do you mean by discoverability of not-copying?
>>>>>> 
>>>>>> Ciao,
>>>>>>    Oliver
>>>> 
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>> 
>>> --
>>> Hervé Pagès
>>> 
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>> 
>>> E-mail: hpa...@fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>> 
>> 
> 
> 
