Barry Rowlingson wrote:
> On Wed, May 13, 2009 at 5:36 PM, Wacek Kusnierczyk
> <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>> Barry Rowlingson wrote:
>
>>> Soln - "for" loop:
>>>
>>>  > z=list()
>>>  > for(i in 1:1000){z[[i]]=rnorm(100,0,1)}
>>>
>>> now inspect the individual bits:
>>>
>>>  > hist(z[[1]])
>>>  > hist(z[[545]])
>>>
>>> If that's the problem, then I suggest she reads an introduction to R...
>> i'd suggest reading the r inferno by pat burns [1], where he deals with
>> this sort of for-looping over lists the way it deserves ;)
>
>  I don't think extending a list this way is too expensive. Not like

indeed, for some -- but only for some -- values of m and n it can actually
be half a hair faster than the matrix and the replicate approaches proposed
earlier by others:

    n = 100; m = 100
    library(rbenchmark)
    benchmark(replications=100, columns=c('test', 'elapsed'), order=NULL,
        list={ l = list(); for (i in 1:n) l[[i]] = rnorm(m) },
        liist={ l = vector('list', n); for (i in 1:n) l[[i]] = rnorm(m) },
        matrix=matrix(rnorm(n*m), n, m),
        replicate=replicate(m, rnorm(n)))
    #        test elapsed
    # 1      list   0.247
    # 2     liist   0.235
    # 3    matrix   0.172
    # 4 replicate   0.247

    n = 10; m = 1000
    ...
    #        test elapsed
    # 1      list   0.169
    # 2     liist   0.169
    # 3    matrix   0.173
    # 4 replicate   0.771

    n = 1000; m = 10
    ...
    #        test elapsed
    # 1      list   1.428
    # 2     liist   0.902
    # 3    matrix   0.172
    # 4 replicate   0.185

but note that when the number of replicate m-samples is sufficiently large
(relative to the sample size), the list approach is orders of magnitude
(here, one order of magnitude in the last case above) less efficient than
the other approaches.  (preallocation seems only partially helpful.)  which
approach to choose -- if efficiency is an issue -- would be a matter of the
actual parameter values.

> doing 1000 foo=rbind(foo,bar)s to a matrix. The overhead for extending
> a list should really only be adding a single new pointer to the list
> pointer structure. The existing list data isn't copied.

i don't think that's accurate, as far as i understand lists in r.  while a
list internally holds just pointers, and extending a list does not cause the
pointed-to structures to be reallocated, the list itself is backed by a
fixed-length array (sensu c arrays), not by a pairlist, so the array needs
to be reallocated.  you don't rewrite the values, but you do rewrite the
pointers.  on the other hand, an appropriate implementation of pairlists
would indeed allow you to extend a (pair)list in O(1) time.

    benchmark(columns=c('test', 'elapsed'),
        list={ l = list(); for (i in 1:1000) l[[i]] = 0 },
        pairlist={ p = pairlist(); for (i in 1:1000) p[[i]] = 0 })
    #       test elapsed
    # 1     list   0.743
    # 2 pairlist   0.523

the result is, of course, dependent on the actual implementation details
(dynamic arrays, pairlists with a cached end pointer and size, etc.).

you may want to use the profiler (i haven't), but my guess is that extending
an r list does require the pointer data (though not the pointed-to data) to
be rewritten -- because it is a fixed-length c array -- while extending a
pairlist does require list traversal (because, presumably, there is no
cached end pointer; as far as i could see in a quick look at the sources,
pairlists are represented as chains of SEXPs with next-element pointers but
no end-of-list pointer, hence the traversal).  for example:

    n = 100000
    benchmark(columns=c('test', 'elapsed'), order=NULL,
        short.control={ l = list() },
        short={ l = list(); l[[1]] = 0 },
        long.control={ l = vector('list', n) },
        long={ l = vector('list', n); l[[n+1]] = 0 })
    #            test elapsed
    # 1 short.control   0.001
    # 2         short   0.001
    # 3  long.control   0.027
    # 4          long   0.138

apparently, extending a short list by one element is much more efficient
than extending a long one.
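as a side note, one way to check the claim that the element data itself is
not copied when a list grows is to trace a single element with tracemem().
this is only a rough sketch: tracemem() requires an r build with memory
profiling enabled (see ?tracemem), and whether the initial assignment into
the list reports a copy may depend on the r version.

    x = rnorm(1000)
    tracemem(x)                      # ask r to report any duplication of x
    l = list()
    l[[1]] = x                       # depending on the version, this may or
                                     # may not report one copy of x
    for (i in 2:1000) l[[i]] = 0     # growing l rewrites l's pointer array,
                                     # but should not report copies of x
    untracemem(x)

if no tracemem output appears while the loop runs, the vector held by the
first element was left in place; only the pointers were rewritten.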
likewise, for pairlists:

    n = 100000
    benchmark(columns=c('test', 'elapsed'), order=NULL,
        short.control={ p = pairlist() },
        short={ p = pairlist(); p[[1]] = 0 },
        long.control={ p = as.pairlist(1:n) },
        long={ p = as.pairlist(1:n); p[[n+1]] = 0 })
    #            test elapsed
    # 1 short.control   0.001
    # 2         short   0.001
    # 3  long.control   2.665
    # 4          long   3.002

apparently, extending a short pairlist by one element is much more efficient
than extending a long one -- possibly due to the cost of traversal.

reading the elements of a list takes O(1) time (because it is array-backed),
while reading the elements of a pairlist is much more costly, even for the
first ones.

a look at the sources, or a comment from someone on r-core, would provide
further details and/or corrections to my naive guesses.

>  Plus lists are more flexible. You can do:
>
>  z=list()
>  for(i in 1:1000){
>    z[[i]]=rnorm(i,0,1)   # generate 'i' samples
>  }
>
>  and then you can see how the properties of samples of rnorm differ
>  with increasing numbers of samples.

yes, but here you solve a different problem.  and here again there is an
alternative solution -- equally 'flexible', slightly faster, and much
cleaner:

    n = 1000
    benchmark(columns=c('test', 'elapsed'), order=NULL,
        'for'={ l = list(); for (i in 1:n) l[[i]] = rnorm(i, 0, 1) },
        lapply=lapply(1:n, rnorm, 0, 1))
    #     test elapsed
    # 1    for   9.855
    # 2 lapply   8.923

> Yes, you can probably vectorize this with lapply or something, but I
> prefer clarity over concision when dealing with beginners...

but where's the preferred clarity in the for loop solution?

vQ
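ps. a crude way to see the difference in access cost between a list and a
pairlist (just a sketch -- the absolute timings will of course vary with
the machine and the r version):

    n = 100000
    l = as.list(1:n)
    p = as.pairlist(1:n)
    system.time(for (k in 1:1000) l[[n]])   # indexing into the backing array
    system.time(for (k in 1:1000) p[[n]])   # reaching the last cons cell

on the reasoning above, the pairlist access should be noticeably slower,
because each p[[n]] has to walk down the chain from the head.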