On 3/25/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > [Moved to R-devel for a technical comment.] > > On Fri, 9 Mar 2007, hadley wickham wrote: > > >> I would like to know if there is a way to create a list or an array (or > >> anything) which grows automatically as more elements are put into it. What > >> I > >> want to find is something equivalent to an ArrayList object of Java > >> language. In Java, I can do the following thing: > >> > >> // Java code > >> ArrayList myArray = new ArrayList(); > >> myArray.add("object1"); > >> myArray.add("object2"); > >> .... > >> // End of java code > > > > As others have mentioned, you can do this with lists in R. > > > > However, there is an important difference between ArrayLists in Java > > and Lists in R. In Java, when an ArrayList grows past its bound, it > > doesn't allocate just enough space, it allocates a lot more, so the > > next time you allocate past the end of the array, there's space > > already reserved. This gives (IIRC) amortised O(n) behaviour. R > > doesn't do this however, so has to copy the entire array every time > > giving O(n^2) behaviour. > > In fact this is an implementation detail. R has both 'length' and > 'truelength' fields in its headers for vectors (including lists) and could > grow the allocation in the same way as you report Java does. When I asked > Ross what the intention had been (the 'truelength' field is almost unused) > he mentioned this potential usage. Given that these structures are opaque > to all but R internal code it should not be hard to change R's scheme to > over-allocate: to decide how much to do would be harder (but say rounding > vectors in the large allocation class up to a VM page would get a > noticeable benefit in some usages with a negligible impact on memory > footprint). Backwards compatibility of save() format would be an issue.
Interesting - it would be fascinating (but difficult!) to find out how much such a change would affect the average running time of all R code. > It seems the really inefficient uses are of the type > > x <- NULL > for(i in 1:10000) x <- c(x, fn(i)) > > and those would be unaltered. Does R do any manipulations of the AST after the code has been parsed? If it did, wouldn't it be fairly easy to recognise this idiom and optimise it to x[length(x) + 1] <- fn(i) ? It has something of the flavour of tail recursion optimisation. Maybe this is something that Luke is working on in his byte-code compiler. Hadley ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel