On Mar 21, 2012, at 9:31 PM, Hervé Pagès wrote:

> On 03/21/2012 06:23 PM, Simon Urbanek wrote:
>>
>> On Mar 20, 2012, at 3:08 PM, Hervé Pagès wrote:
>>
>>> Hi Oliver,
>>>
>>> On 03/17/2012 08:35 AM, oliver wrote:
>>>> Hello,
>>>>
>>>> regarding the copying issue,
>>>> I would like to point to the
>>>>
>>>> "Writing R Extensions" documentation.
>>>>
>>>> It is mentioned there that functions of extensions
>>>> that use the .C interface normally get their arguments
>>>> pre-copied...
>>>>
>>>> In section 5.2:
>>>>
>>>> "There can be up to 65 further arguments giving R objects to be
>>>> passed to compiled code. Normally these are copied before being
>>>> passed in, and copied again to an R list object when the compiled
>>>> code returns."
>>>>
>>>> But for the .Call and .External interfaces this is NOT the case.
>>>>
>>>> In section 5.9:
>>>> "The .Call and .External interfaces allow much more control, but
>>>> they also impose much greater responsibilities so need to be used
>>>> with care. Neither .Call nor .External copy their arguments. You
>>>> should treat arguments you receive through these interfaces as
>>>> read-only."
>>>>
>>>> Why is read-only preferred?
>>>>
>>>> Please see the discussion in section 5.9.10.
>>>>
>>>> It is mentioned there that copying an object in the R language
>>>> does not necessarily make a real copy of that object; instead,
>>>> just a "reference" to the real data is created (two names
>>>> referring to one bulk of data). That's typical functional
>>>> programming: not a variable, but a name (and possibly more than one
>>>> name) bound to an object.
>>>>
>>>> Of course, if you change the original named value without
>>>> copying it first, then both names would refer to the
>>>> changed data.
>>>> Of course that is not what is wanted.
>>>>
>>>> But what you can also see in section 5.9.10 is that
>>>> there already is a mechanism (reference counting) that allows
>>>> distinguishing between unnamed and named objects.
>>>>
>>>> So this directly addresses the points you have mentioned in your
>>>> examples.
>>>>
>>>> So, at least in principle, R allows in-place modification
>>>> of objects with the .Call interface.
>>>>
>>>> You seem to refer to the .C interface, and I had explored the .Call
>>>> interface. That's the reason why you may insist on "it's always
>>>> copied" and I wondered what you were talking about, because the
>>>> .Call interface allowed me a rather C-like raw style of programming
>>>> (and lets its user decide whether copying will be done or not).
>>>>
>>>> The mechanism to decide whether copying should be done or not
>>>> is also mentioned in section 5.9.10: the NAMED and SET_NAMED macros.
>>>> With NAMED you can get the number of references.
>>>>
>>>> But later in that section it is mentioned that, at least for now,
>>>> NAMED always returns the value 2 for .Call arguments.
>>>>
>>>> "Currently all arguments to a .Call call will have NAMED set to 2,
>>>> and so users must assume that they need to be duplicated before
>>>> alteration."
>>>> (section 5.9.10, last sentence)
>>>>
>>>> So in-place modification can already be done with the .Call
>>>> interface, for example. But deciding whether it is safe or not
>>>> is not supported at the moment.
>>>>
>>>> So the situation is somewhere between "it is possible" and
>>>> "R does not support a safe decision on whether what is possible
>>>> can also be recommended".
>>>> At the moment R rather discourages in-place modification by default
>>>> (the safe way, and I agree with this default).
>>>>
>>>> But it's not true that R in general copies arguments.
>>>>
>>>> But this seems to be true for the .C interface.
>>>>
>>>> Maybe a lot of performance and memory problems can be solved
>>>> by rewriting existing packages to provide their code
>>>> via .Call instead of .C.
>>>
>>> My understanding is that most packages use the .C interface
>>> because it's simpler to deal with and because they don't need
>>> to pass complicated objects at the C level, just atomic vectors.
>>> My guess is that it's probably rarely the case that the cost
>>> of copying the arguments passed to .C is significant, but,
>>> if that were the case, then they could always call .C() with
>>> DUP=FALSE. However, using DUP=FALSE is dangerous (see the Warning
>>> section in the man page).
>>>
>>> No need to switch to .Call
>>>
>>
>> I strongly disagree. I'm appalled to see that sentence here.
>
> Come on!
>
>> The overhead is significant for any large vector and it is in particular
>> unnecessary since in .C you have to allocate *and copy* space even for
>> results (twice!). Also it is very error-prone, because you have no
>> information about the length of vectors so it's easy to run out of bounds
>> and there is no way to check. IMHO .C should not be used for any code
>> written in this century (the only exception may be if you are passing no
>> data, e.g. if all you do is to pass a flag and expect no result, you can get
>> away with it even if it is more dangerous). It is a legacy interface that
>> dates way back and is essentially just a renamed .Fortran interface. Again, I
>> would strongly recommend the use of .Call in any recent code because it is
>> safer and more efficient (if you don't care about either attribute, well,
>> feel free ;)).
>
> So aleph will not support the .C interface? ;-)
>
It will look at the timestamp of the source file and delete the package if it is not before 1980 ;). Otherwise it will send a request for punch cards with ".C is deprecated, please upgrade to .Call" stamped out :P

At that point I'll be flaming about using the native Aleph interface and not the R compatibility layer ;)

Cheers,
S

> H.
>
>>
>> Cheers,
>> Simon
>>
>>> Cheers,
>>> H.
>>>
>>>>
>>>> Ciao,
>>>> Oliver
>>>>
>>>> On Tue, Mar 06, 2012 at 04:44:49PM +0000, William Dunlap wrote:
>>>>> S (and its derivatives and successors) promises that functions
>>>>> will not change their arguments, so in an expression like
>>>>>    val <- func(arg)
>>>>> you know that arg will not be changed. You can
>>>>> do that by having func copy arg before doing anything,
>>>>> but that uses space and time that you want to conserve.
>>>>> If arg is not a named item in any environment then it
>>>>> should be fine to write over the original because there
>>>>> is no way the caller can detect that shortcut. E.g., in
>>>>>    cx <- cos(runif(n))
>>>>> the cos function does not need to allocate new space for
>>>>> its output, it can just write over its input because, without
>>>>> a name attached to it, the caller has no way of looking
>>>>> at what runif(n) returned. If you did
>>>>>    x <- runif(n)
>>>>>    cx <- cos(x)
>>>>> then cos would have to allocate new space for its output
>>>>> because overwriting its input would affect a subsequent
>>>>>    sum(x)
>>>>> I suppose that end-users and function-writers could learn
>>>>> to live with having to decide when to copy, but not having
>>>>> to make that decision makes S more pleasant (and safer) to use.
>>>>> I think that is a major reason that people are able to
>>>>> share S code so easily.
>>>>>
>>>>> Bill Dunlap
>>>>> Spotfire, TIBCO Software
>>>>> wdunlap tibco.com
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: oliver [mailto:oli...@first.in-berlin.de]
>>>>>> Sent: Tuesday, March 06, 2012 1:12 AM
>>>>>> To: William Dunlap
>>>>>> Cc: Hervé Pagès; R-devel
>>>>>> Subject: Re: [Rd] Julia
>>>>>>
>>>>>> On Tue, Mar 06, 2012 at 12:35:32AM +0000, William Dunlap wrote:
>>>>>> [...]
>>>>>>> I find R's (& S+'s & S's)
>>>>>>> copy-on-write-if-not-copying-would-be-discoverable-
>>>>>>> by-the-user mechanism for giving the illusion of pass-by-value a good way
>>>>>>> to structure the contract between the function writer and the function
>>>>>>> user.
>>>>>> [...]
>>>>>>
>>>>>> Can you elaborate more on this,
>>>>>> especially on the
>>>>>> ...-...-...-if-not-copying-would-be-discoverable-by-the-user
>>>>>> stuff?
>>>>>>
>>>>>> What do you mean by discoverability of not-copying?
>>>>>>
>>>>>> Ciao,
>>>>>> Oliver
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpa...@fhcrc.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
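
For readers who want the copy-only-if-named rule from Bill Dunlap's message spelled out, here is a rough sketch in plain C. It is a toy model only: ToyVec, toy_alloc, toy_duplicate, toy_square and toy_free are made-up names for illustration, not R's SEXP type or its NAMED macro, and squaring stands in for cos so the example needs nothing beyond the standard C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an R vector. `named` loosely mimics the idea
   behind NAMED: 0 means a temporary that no name refers to, anything
   greater means at least one name is bound to the data. */
typedef struct {
    int named;
    size_t length;
    double *data;
} ToyVec;

/* Allocate an unnamed vector of n elements (n > 0 assumed). */
static ToyVec *toy_alloc(size_t n) {
    ToyVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = malloc(n * sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->named = 0;
    v->length = n;
    return v;
}

static void toy_free(ToyVec *v) {
    if (v != NULL) { free(v->data); free(v); }
}

/* Make a fresh, unnamed copy of src. */
static ToyVec *toy_duplicate(const ToyVec *src) {
    ToyVec *copy = toy_alloc(src->length);
    if (copy == NULL) return NULL;
    memcpy(copy->data, src->data, src->length * sizeof *src->data);
    return copy;
}

/* Square every element. The input is overwritten only when no name
   refers to it, so a caller holding a name never sees its data change. */
static ToyVec *toy_square(ToyVec *x) {
    assert(x != NULL);
    ToyVec *out = (x->named == 0) ? x : toy_duplicate(x);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < out->length; i++)
        out->data[i] = out->data[i] * out->data[i];
    return out;
}
```

Calling toy_square on a vector with named == 0 returns the same pointer with the data overwritten, which corresponds to the cx <- cos(runif(n)) case. With named set to 1 it returns a new vector and leaves the input alone, which corresponds to x <- runif(n); cx <- cos(x).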