On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:

> From: Liaw, Andy
>>
>> From: Prof Brian Ripley
>>>
>>> On Fri, 27 Aug 2010, peter dalgaard wrote:
>>>
>>>>
>>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
>>>>
>>>>> I'd very much appreciate guidance on this. A user reported that the
>>>>> as.double() coercion used inside the .C() call for a function in my
>>>>> package (specifically, randomForest:::predict.randomForest()) is
>>>>> taking up significant amount of time when called repeatedly, and
>>>>> removing some of these reduced run time by 30-40% in some cases.
>>>>> These arguments are components of the fitted model (thus do not
>>>>> change), and are matrices. Some basic tests show no difference in
>>>>> the result when the coercions are removed (other than faster run time).
>>>>> What I like to know is whether this is safe to do, or is it likely to
>>>>> lead to trouble in the future?
>>>>
>>>> In a word: yes. It is safe as long as you are absolutely sure that
>>>> the argument has the right mode. The unsafeness comes in when people
>>>> might unwittingly use, say, an integer vector where a double was
>>>> expected, causing memory overruns and general mayhem.
>>>>
>>>> Notice, BTW, that if you switch to .Call or .External, then you have
>>>> much more scope for handling such details on the C-side. E.g. you
>>>> could coerce only if the object has the wrong mode, avoid
>>>> duplicating things you won't be modifying anyway, etc.
>>>
>>> But as as.double is effectively .Call it has the same freedom, and it
>>> does nothing if no coercion is required. The crunch here is likely to
>>> be
>>>
>>>      ‘as.double’ attempts to coerce its argument to be of double type:
>>>      like ‘as.vector’ it strips attributes including names. (To ensure
>>>      that an object is of double type without stripping attributes, use
>>>      ‘storage.mode’.)
>>>
>>> I suspect the issue is the copying to remove attributes, in which case
>>
>> I can certainly believe this. I've tried replacing as.double() with
>> c(), thinking attributes need to be stripped. That actually increased
>> run time very slightly instead of reducing it.
>>
>>>      storage.mode(x) <- "double"
>>>
>>> should be a null op and so both fast and safe.
>>
>> Will follow this advice. Thanks to both of you for the help!
>
> My apologies for coming back to this so late. I did some tests, and found that
>
>     storage.mode(x) <- "double"
>
> isn't as low on resources as I thought it might be. Changing the code to this
> from
>
>     x <- as.double(x)
>
> did not give the expected speed improvement. Here's a little test:
>
>     f1 <- function(x) { as.double(x); NULL }
>     f2 <- function(x) { storage.mode(x) <- "double"; NULL }
>     f3 <- function(x) { x <- x; NULL }
>     set.seed(917)
>     reps <- 500
>     x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
>     system.time(junk <- replicate(reps, f1(x)))
>     system.time(junk <- replicate(reps, f2(x)))
>     system.time(junk <- replicate(reps, f3(x)))
>
> On my laptop running R 2.11.1 Patched (2010-06-26 r52410), I get:
>
>     R> system.time(junk <- replicate(reps, f1(x)))
>        user  system elapsed
>        3.54    2.14    5.74
>     R> system.time(junk <- replicate(reps, f2(x)))
>        user  system elapsed
>        3.32    2.11    5.92
>     R> system.time(junk <- replicate(reps, f3(x)))
>        user  system elapsed
>           0       0       0
>
> Perhaps I need to first check and see if the storage mode is as expected
> before trying coercion?
>
Well, the devil is in the details. Although storage.mode<- is itself a no-op here, it still triggers duplication because it is an assignment, not because the storage mode would change anything. Technically, x <- x is a special case which is truly a no-op, whereas any call to `foo<-` has to assume modification. So, yes, in your case

    f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }

will have the same speed as f3. If you are moving to .Call, then you could just as well do that on the C side (with the benefit of being able to strip attributes, since you can get them from the original object if you care...).

Cheers,
Simon

> Best,
> Andy
>
>> Best,
>> Andy
>>
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk
>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
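P.S. For anyone wanting to try the check-before-coerce idea from this thread, here is a minimal sketch in R. The helper name ensure_double() is hypothetical (it is not part of randomForest or base R), and the small reps value is only there to keep the run short. Timings depend heavily on the R version and machine, so none are quoted here.

```r
# Hypothetical helper: coerce to double storage only when actually needed,
# so an already-double object is returned without a replacement call.
ensure_double <- function(x) {
  if (storage.mode(x) != "double") storage.mode(x) <- "double"
  x
}

# Unconditional replacement call versus the guarded version.
f2 <- function(x) { storage.mode(x) <- "double"; NULL }
f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }

set.seed(917)
reps <- 20                                 # assumption: kept small for a quick run
x  <- matrix(rnorm(1e6L), 1e3L, 1e3L)      # already double
xi <- matrix(1:6, 2L, 3L)                  # integer storage, needs coercion

print(system.time(junk <- replicate(reps, f2(x))))
print(system.time(junk <- replicate(reps, f4(x))))

# The helper leaves doubles alone and keeps attributes (dim) when it coerces.
y <- ensure_double(xi)
print(storage.mode(y))
print(dim(y))
```

Unlike as.double(), the storage.mode<- route keeps dim and other attributes, which matters if the C code or later R code relies on the matrix shape.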