On Dec 14, 2007 1:01 PM, Barry Rowlingson <[EMAIL PROTECTED]>
wrote:

>  I have some code that can potentially produce a huge number of
> large-ish R data frames, each of a different number of rows. All the
> data frames together will be way too big to keep in R's memory, but
> we'll assume a single one is manageable. It's just when there are a
> million of them that the machine might start to burn up.
>
>  However, I might, for example, want to compute some averages over the
> elements in the data frames. Or I might want to sample ten of them at
> random and do some plots. What I need is rapid random access to data
> stored in external files.
>
>  Here's some ideas I've had:
>
>  * Store all the data in an HDF5 file - the problem here is that the
> current HDF5 package for R reads the whole file in at once.
>
>  * Store the data in some other custom binary format with an index for
> rapid access to the N-th element. Problems: feels like reinventing HDF,
> cross-platform issues, etc.
>
>  * Store the data in a number of .RData files in a directory. Hence to
> get the N-th element just attach(paste("foo/A-", n, ".RData", sep = "")),
> give or take a parameter or two.
>
>  * Use a database. Seems a bit heavyweight, but RSQLite might work to
> keep it local.
>
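
As an aside on the .RData-file-per-frame idea above: a minimal sketch
along those lines (the "foo" directory, the "A-" prefix and the column
name 'x' are only placeholders); loading into a private environment
avoids having to attach()/detach() things on the search path:

## write the n-th data frame to its own file
save_frame <- function(df, n, dir = "foo") {
  save(df, file = file.path(dir, paste("A-", n, ".RData", sep = "")))
}

## read the n-th data frame back without touching the search path
load_frame <- function(n, dir = "foo") {
  e <- new.env()
  load(file.path(dir, paste("A-", n, ".RData", sep = "")), envir = e)
  e$df
}

## e.g. sample ten frame indices at random and average the placeholder column 'x'
idx <- sample(1000000, 10)
means <- sapply(idx, function(n) mean(load_frame(n)$x))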

Unless you really need this to be a general solution, I would suggest using
a database.  And if you use one that allows you to create functions within
it, you can even keep some of the calculations on the server side (which may
be a performance advantage).  If you are doing a lot of this, you might
consider Postgres and pl/R, which embeds R in the database.
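
For the RSQLite route, a rough sketch of one way to set it up (the table
and column names, including 'frames', 'frame_id' and 'x', are made up;
the idea is to tag every data frame with an integer id so they can all
live in one table and be pulled out individually):

library(RSQLite)

con <- dbConnect(SQLite(), dbname = "frames.db")

## append the n-th data frame to a single table, tagged with its index
store_frame <- function(con, df, n) {
  df$frame_id <- n
  dbWriteTable(con, "frames", df, append = TRUE)
}

## pull back just the n-th frame -- random access without loading the rest
fetch_frame <- function(con, n) {
  dbGetQuery(con, paste("SELECT * FROM frames WHERE frame_id =", n))
}

## a calculation pushed to the database: per-frame means of column 'x'
avgs <- dbGetQuery(con,
    "SELECT frame_id, AVG(x) AS mean_x FROM frames GROUP BY frame_id")

dbDisconnect(con)

The GROUP BY query is an example of keeping a calculation on the server
side; with Postgres and pl/R the aggregate itself could be written in R.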

Sean
