> From: Simon Urbanek
>
> On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:
>
> > From: Liaw, Andy
> >>
> >> From: Prof Brian Ripley
> >>>
> >>> On Fri, 27 Aug 2010, peter dalgaard wrote:
> >>>
> >>>>
> >>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
> >>>>
> >>>>> I'd very much appreciate guidance on this. A user reported that the
> >>>>> as.double() coercion used inside the .C() call for a function in my
> >>>>> package (specifically, randomForest:::predict.randomForest()) is
> >>>>> taking up a significant amount of time when called repeatedly, and
> >>>>> removing some of these reduced run time by 30-40% in some cases.
> >>>>> These arguments are components of the fitted model (thus do not
> >>>>> change), and are matrices. Some basic tests show no difference in
> >>>>> the result when the coercions are removed (other than faster run
> >>>>> time). What I'd like to know is whether this is safe to do, or is
> >>>>> it likely to lead to trouble in the future?
> >>>>
> >>>> In a word: yes. It is safe as long as you are absolutely sure that
> >>>> the argument has the right mode. The unsafeness comes in when people
> >>>> might unwittingly use, say, an integer vector where a double was
> >>>> expected, causing memory overruns and general mayhem.
> >>>>
> >>>> Notice, BTW, that if you switch to .Call or .External, then you have
> >>>> much more scope for handling such details on the C-side. E.g. you
> >>>> could coerce only if the object has the wrong mode, avoid
> >>>> duplicating things you won't be modifying anyway, etc.
> >>>
> >>> But as as.double is effectively .Call it has the same freedom, and it
> >>> does nothing if no coercion is required. The crunch here is likely to
> >>> be
> >>>
> >>>     'as.double' attempts to coerce its argument to be of double type:
> >>>     like 'as.vector' it strips attributes including names. (To ensure
> >>>     that an object is of double type without stripping attributes, use
> >>>     'storage.mode'.)
> >>>
> >>> I suspect the issue is the copying to remove attributes, in which case
> >>
> >> I can certainly believe this. I've tried replacing as.double() with
> >> c(), thinking attributes need to be stripped. That actually increased
> >> run time very slightly instead of reducing it.
> >>
> >>>     storage.mode(x) <- "double"
> >>>
> >>> should be a null op and so both fast and safe.
> >>
> >> Will follow this advice. Thanks to both of you for the help!
> >
> > My apologies for coming back to this so late. I did some tests, and
> > found that
> >
> >     storage.mode(x) <- "double"
> >
> > isn't as low on resources as I thought it might be. Changing the code
> > to this from
> >
> >     x <- as.double(x)
> >
> > did not give the expected speed improvement. Here's a little test:
> >
> > f1 <- function(x) { as.double(x); NULL }
> > f2 <- function(x) { storage.mode(x) <- "double"; NULL }
> > f3 <- function(x) { x <- x; NULL }
> > set.seed(917)
> > reps <- 500
> > x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
> > system.time(junk <- replicate(reps, f1(x)))
> > system.time(junk <- replicate(reps, f2(x)))
> > system.time(junk <- replicate(reps, f3(x)))
> >
> > On my laptop running R 2.11.1 Patched (2010-06-26 r52410), I get:
> >
> > R> system.time(junk <- replicate(reps, f1(x)))
> >    user  system elapsed
> >    3.54    2.14    5.74
> > R> system.time(junk <- replicate(reps, f2(x)))
> >    user  system elapsed
> >    3.32    2.11    5.92
> > R> system.time(junk <- replicate(reps, f3(x)))
> >    user  system elapsed
> >       0       0       0
> >
> > Perhaps I need to first check and see if the storage mode is as
> > expected before trying coercion?
> >
>
> Well, the devil is in the details. Although storage.mode<- is a noop
> itself, the issue is that it does trigger duplication because it is an
> assignment, not because storage mode would change anything.
> Technically, x <- x is a special case which is truly a noop whereas any
> call `foo<-` has to assume modification. So, yes, in your case
>
> f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }
>
> will have the same speed as f3. If you are going in to .Call then you
> could as well do that on the C side (with the benefit of being able to
> strip attributes since you can get them from the original object if you
> care...).
>
> Cheers,
> Simon
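
The effect described above can be checked with tracemem(), which reports
when an object is duplicated. What follows is only a minimal sketch: the
helper names coerce_always and coerce_if_needed are hypothetical (they are
not part of randomForest or base R), the guarded version follows the f4
pattern quoted above, and tracemem() is only available when R was built
with memory profiling, so the calls are wrapped in a capabilities() check.
Whether the unconditional version actually reports a copy may depend on
the R version.

```r
# Unconditional replacement call: `storage.mode<-` is invoked even when
# x is already of type double, so R has to assume x may be modified.
coerce_always <- function(x) {
  storage.mode(x) <- "double"
  x
}

# Guarded version (the f4 pattern): assign only if the storage mode is
# wrong, so an object that is already double is passed through untouched.
coerce_if_needed <- function(x) {
  if (storage.mode(x) != "double") storage.mode(x) <- "double"
  x
}

set.seed(917)
x <- matrix(rnorm(1e4), 100, 100)

# tracemem() needs a build with memory profiling support.
can_trace <- capabilities("profmem")
if (can_trace) tracemem(x)   # a message is printed whenever x is copied

y1 <- coerce_always(x)       # may report a duplication of x
y2 <- coerce_if_needed(x)    # no assignment happens, so no copy is expected

if (can_trace) untracemem(x)
```

Unlike as.double(), both helpers keep attributes such as dim, so a matrix
stays a matrix; an integer matrix passed to coerce_if_needed() is still
coerced (and therefore copied) once.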

Thanks a lot, Simon, for the clarification. Unfortunately I'm not using
.Call(), but .C() with DUP=FALSE, and it's exactly the duplication that
I'm trying to avoid. For now I just inserted tests (is.double() and
is.integer()) and only do the coercion if needed, prior to the .C()
call. That gives the speed up that I was expecting. To do this more
cleanly, I really need to learn .Call()...

Best,
Andy

> >>> --
> >>> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> >>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >>> University of Oxford,             Tel:  +44 1865 272861 (self)
> >>> 1 South Parks Road,                     +44 1865 272866 (PA)
> >>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >>>
> >> Notice: This e-mail message, together with any attachments, contains
> >> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> >> New Jersey, USA 08889), and/or its affiliates Direct contact
> >> information for affiliates is available at
> >> http://www.merck.com/contact/contacts.html) that may be confidential,
> >> proprietary copyrighted and/or legally privileged. It is intended
> >> solely for the use of the individual or entity named on this message.
> >> If you are not the intended recipient, and have received this message
> >> in error, please notify us immediately by reply e-mail and then delete
> >> it from your system.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel