Re: [Rd] [R] How to create a list that grows automatically

2007-03-25 Thread Prof Brian Ripley
[Moved to R-devel for a technical comment.]

On Fri, 9 Mar 2007, hadley wickham wrote:

>> I would like to know if there is a way to create a list or an array (or
>> anything) which grows automatically as more elements are put into it. What I
>> want to find is something equivalent to an ArrayList object of Java
>> language. In Java, I can do the following thing:
>>
>> // Java code
>> ArrayList myArray = new ArrayList();
>> myArray.add("object1");
>> myArray.add("object2");
>> 
>> // End of java code
>
> As others have mentioned, you can do this with lists in R.
>
> However, there is an important difference between ArrayLists in Java
> and Lists in R.  In Java, when an ArrayList grows past its bound, it
> doesn't allocate just enough space, it allocates a lot more, so the
> next time you allocate past the end of the array, there's space
> already reserved.  This gives (IIRC) amortised O(n) behaviour.  R
> doesn't do this however, so has to copy the entire array every time
> giving O(n^2) behaviour.

In fact this is an implementation detail.  R has both 'length' and 
'truelength' fields in its headers for vectors (including lists) and could 
grow the allocation in the same way as you report Java does.  When I asked 
Ross what the intention had been (the 'truelength' field is almost unused) 
he mentioned this potential usage.  Given that these structures are opaque 
to all but R internal code it should not be hard to change R's scheme to 
over-allocate: to decide how much to do would be harder (but say rounding 
vectors in the large allocation class up to a VM page would get a 
noticeable benefit in some usages with a negligible impact on memory 
footprint).  Backwards compatibility of save() format would be an issue.

It seems the really inefficient uses are of the type

x <- NULL
for(i in 1:1) x <- c(x, fn(i))

and those would be unaltered.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] How to create a list that grows automatically

2007-03-25 Thread hadley wickham
On 3/25/07, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> [Moved to R-devel for a technical comment.]
>
> On Fri, 9 Mar 2007, hadley wickham wrote:
>
> >> I would like to know if there is a way to create a list or an array (or
> >> anything) which grows automatically as more elements are put into it. What 
> >> I
> >> want to find is something equivalent to an ArrayList object of Java
> >> language. In Java, I can do the following thing:
> >>
> >> // Java code
> >> ArrayList myArray = new ArrayList();
> >> myArray.add("object1");
> >> myArray.add("object2");
> >> 
> >> // End of java code
> >
> > As others have mentioned, you can do this with lists in R.
> >
> > However, there is an important difference between ArrayLists in Java
> > and Lists in R.  In Java, when an ArrayList grows past its bound, it
> > doesn't allocate just enough space, it allocates a lot more, so the
> > next time you allocate past the end of the array, there's space
> > already reserved.  This gives (IIRC) amortised O(n) behaviour.  R
> > doesn't do this however, so has to copy the entire array every time
> > giving O(n^2) behaviour.
>
> In fact this is an implementation detail.  R has both 'length' and
> 'truelength' fields in its headers for vectors (including lists) and could
> grow the allocation in the same way as you report Java does.  When I asked
> Ross what the intention had been (the 'truelength' field is almost unused)
> he mentioned this potential usage.  Given that these structures are opaque
> to all but R internal code it should not be hard to change R's scheme to
> over-allocate: to decide how much to do would be harder (but say rounding
> vectors in the large allocation class up to a VM page would get a
> noticeable benefit in some usages with a negligible impact on memory
> footprint).  Backwards compatibility of save() format would be an issue.

Interesting - it would be fascinating (but difficult!) to find out how
much such a change would affect the average running time of all R
code.

> It seems the really inefficient uses are of the type
>
> x <- NULL
> for(i in 1:1) x <- c(x, fn(i))
>
> and those would be unaltered.

Does R do any manipulations of the AST after the code has been parsed?
 If it did, wouldn't it be fairly easy to recognise this idiom and
optimise it to x[length(x) + 1] <- fn(i) ?  It has something of the
flavour of tail recursion optimisation.  Maybe this is something that
Luke is working on in his byte-code compiler.

Hadley

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel