Re: [Rd] [R] How to create a list that grows automatically
[Moved to R-devel for a technical comment.] On Fri, 9 Mar 2007, hadley wickham wrote: >> I would like to know if there is a way to create a list or an array (or >> anything) which grows automatically as more elements are put into it. What I >> want to find is something equivalent to an ArrayList object of Java >> language. In Java, I can do the following thing: >> >> // Java code >> ArrayList myArray = new ArrayList(); >> myArray.add("object1"); >> myArray.add("object2"); >> >> // End of java code > > As others have mentioned, you can do this with lists in R. > > However, there is an important difference between ArrayLists in Java > and Lists in R. In Java, when an ArrayList grows past its bound, it > doesn't allocate just enough space, it allocates a lot more, so the > next time you allocate past the end of the array, there's space > already reserved. This gives (IIRC) amortised O(n) behaviour. R > doesn't do this however, so has to copy the entire array every time > giving O(n^2) behaviour. In fact this is an implementation detail. R has both 'length' and 'truelength' fields in its headers for vectors (including lists) and could grow the allocation in the same way as you report Java does. When I asked Ross what the intention had been (the 'truelength' field is almost unused) he mentioned this potential usage. Given that these structures are opaque to all but R internal code it should not be hard to change R's scheme to over-allocate: to decide how much to do would be harder (but say rounding vectors in the large allocation class up to a VM page would get a noticeable benefit in some usages with a negligible impact on memory footprint). Backwards compatibility of save() format would be an issue. It seems the really inefficient uses are of the type x <- NULL for(i in 1:1) x <- c(x, fn(i)) and those would be unaltered. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [R] How to create a list that grows automatically
On 3/25/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote: > [Moved to R-devel for a technical comment.] > > On Fri, 9 Mar 2007, hadley wickham wrote: > > >> I would like to know if there is a way to create a list or an array (or > >> anything) which grows automatically as more elements are put into it. What > >> I > >> want to find is something equivalent to an ArrayList object of Java > >> language. In Java, I can do the following thing: > >> > >> // Java code > >> ArrayList myArray = new ArrayList(); > >> myArray.add("object1"); > >> myArray.add("object2"); > >> > >> // End of java code > > > > As others have mentioned, you can do this with lists in R. > > > > However, there is an important difference between ArrayLists in Java > > and Lists in R. In Java, when an ArrayList grows past its bound, it > > doesn't allocate just enough space, it allocates a lot more, so the > > next time you allocate past the end of the array, there's space > > already reserved. This gives (IIRC) amortised O(n) behaviour. R > > doesn't do this however, so has to copy the entire array every time > > giving O(n^2) behaviour. > > In fact this is an implementation detail. R has both 'length' and > 'truelength' fields in its headers for vectors (including lists) and could > grow the allocation in the same way as you report Java does. When I asked > Ross what the intention had been (the 'truelength' field is almost unused) > he mentioned this potential usage. Given that these structures are opaque > to all but R internal code it should not be hard to change R's scheme to > over-allocate: to decide how much to do would be harder (but say rounding > vectors in the large allocation class up to a VM page would get a > noticeable benefit in some usages with a negligible impact on memory > footprint). Backwards compatibility of save() format would be an issue. Interesting - it would be fascinating (but difficult!) to find out how much such a change would affect the average running time of all R code. > It seems the really inefficient uses are of the type > > x <- NULL > for(i in 1:1) x <- c(x, fn(i)) > > and those would be unaltered. Does R do any manipulations of the AST after the code has been parsed? If it did, wouldn't it be fairly easy to recognise this idiom and optimise it to x[length(x) + 1] <- fn(i) ? It has something of the flavour of tail recursion optimisation. Maybe this is something that Luke is working on in his byte-code compiler. Hadley __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel