Patrick Burns wrote:
> Wacek Kusnierczyk wrote:
>> Patrick Burns wrote:
>>
>>> If the goal is to "look" professional, then
>>> 'replicate' probably suits.  If the goal is to
>>> compute as fast as possible, then that isn't
>>> the case because 'replicate' is really a 'for'
>>> loop in disguise and there are other ways.
>>>
>>> Here's one other way:
>>>
>>> function (size, replicates, distfun, ...)
>>> {
>>>
>>>     colMeans(array(distfun(size * replicates, ...), c(size,
>>>         replicates)))
>>> }
>>>
>>
>> a naive benchmark:
>>
>>     f.rep = function(n, m) replicate(n, rnorm(m))
>>     f.pat = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))
>>
>>     system.time(f.pat(1000, 1000))
>>     system.time(f.rep(1000, 1000))
>>
>> makes me believe that there is no significant difference in efficiency
>> between the 'professionally-looking' replicate-based solution and the
>> 'as fast as possible' pat's solution.
>>
>
> I think Wacek is largely correct.  First off, a correction:
> the dimensions on the array if 'f.pat' should be c(m, n)
> rather than c(n, m).

... and the benchmark was unfair, in that wacek forgot to take means in
f.rep, which, as far as i can see, does not change the results
substantially.

>
> What I'm seeing on my machine is that the array trick seems
> always to be a bit faster, but only substantially faster if 'm'
> (that is, the number being summed) is smallish.

the results will in general depend on the n:m ratio.  i specifically
picked the values to show that it's not *necessarily* the case that f.pat
is much faster than f.rep, even if in some cases it might be.  with n
sufficiently larger than m, say n:m = 10^5, pat's solution is indeed much
faster, while the opposite (n:m = 10^-5) leads to equal performance.  so
you'd be justified in saying that f.pat is no worse than f.rep in the
general case.

>
> That makes sense: loops are "slow" because of the overhead
> of doing the calling.  When each call takes a lot of time,
> the overhead becomes insignificant.

i haven't examined the sources and so am unsure, but my guess is that
part of the overhead lies in the call chain: replicate calls sapply,
sapply calls lapply (with additional testing), and then lapply calls the
internal lapply.  (it might be that the for-looping is actually done in c
and is not guilty -- correct me if i'm severely off.)

the point of the exercise was to give a humble hint: if you give advice
such as that discussed here, it will, in general, be helpful to indicate
in which cases it is worth using instead of an arguably more elegant
alternative.

vQ
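
A sketch of a fairer version of the benchmark discussed above, with both corrections applied: the means are taken inside the replicate-based function, and the array dimensions are c(m, n). The name f.rep.means and the particular sizes are illustrative assumptions and do not come from the thread. No timings are quoted, since they depend on the machine and on the n:m ratio.

```r
# sketch only: f.rep.means and the sizes used below are assumptions for illustration

# n replications, each the mean of m standard normal draws
f.rep.means <- function(n, m) replicate(n, mean(rnorm(m)))

# same quantity via one big draw: m rows (draws) by n columns (replications)
f.pat <- function(n, m) colMeans(array(rnorm(n * m), c(m, n)))

# sanity check: with the same seed and the default normal generator,
# both should consume the random stream in the same order and agree
set.seed(1); a <- f.rep.means(50, 20)
set.seed(1); b <- f.pat(50, 20)
stopifnot(length(a) == 50, length(b) == 50, isTRUE(all.equal(a, b)))

# the comparison at a balanced ratio ...
system.time(f.rep.means(1000, 1000))
system.time(f.pat(1000, 1000))

# ... and with many small replications, where per-call overhead matters most
system.time(f.rep.means(1e5, 10))
system.time(f.pat(1e5, 10))

# one way to check the guessed call chain is to print the function bodies
body(replicate)
body(sapply)
body(lapply)
```

Trying a few more n:m ratios in the system.time calls is the simplest way to see on a given machine where the array approach actually pays off.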