Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka
> Date: Wed, 25 May 2011 12:32:37 -0400
> Subject: Re: [R] Processing large datasets
> From: mailinglist.honey...@gmail.com
> To: marchy...@hotmail.com
> CC: ro...@bestroman.com; r-help@r-project.org
>
> Hi,
>
> On We

Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka wrote:
[snip]
>> > If your datasets are *really* huge, check out some packages listed
>> > under the "Large memory and out-of-memory data" section of the
>> > "HighPerformanceComputing" task view at CRAN:
>> >
>> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
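A minimal sketch of the out-of-memory approach those packages offer, assuming bigmemory (with its companion package biganalytics) from that task view; the file name and column layout are hypothetical:

    ## File-backed matrix: data live on disk, only touched pages enter RAM.
    library(bigmemory)
    library(biganalytics)   # colmean() and friends for big.matrix objects

    x <- read.big.matrix("quotes.csv", header = TRUE, type = "double",
                         backingfile = "quotes.bin",
                         descriptorfile = "quotes.desc")
    colmean(x)   # column means computed without loading the whole file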

Re: [R] Processing large datasets

2011-05-25 Thread Hugo Mildenberger
With PostgreSQL at least, R can also be used as the implementation language for stored procedures, hence data transfers between processes can be avoided altogether.

http://www.joeconway.com/plr/

Implementation of such a procedure in R appears to be straightforward:

CREATE OR REPLACE FUNCTION
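A hedged sketch of how such a PL/R function might continue (the function, table, and column names are hypothetical; PL/R exposes unnamed arguments to the R body as arg1, arg2, ...):

    -- Hypothetical PL/R function: median of a float8 array, computed by
    -- the R interpreter embedded in the PostgreSQL backend.
    CREATE OR REPLACE FUNCTION r_median(float8[])
    RETURNS float8 AS $$
        median(arg1)   -- plain R; the array arrives as a numeric vector
    $$ LANGUAGE plr;

    -- Usage, aggregating on the server so the rows never leave it:
    -- SELECT r_median(array_agg(bid)) FROM quotes;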

Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka
> Date: Wed, 25 May 2011 10:18:48 -0400
> From: ro...@bestroman.com
> To: mailinglist.honey...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> > Hi,
> >
> > If your datasets are *really*

Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.

2011-05-25 Thread Mike Marchywka
> Date: Wed, 25 May 2011 09:49:00 -0400
> From: ro...@bestroman.com
> To: biomathjda...@gmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] Processing large datasets
>
> Thanks Jonathan.
>
> I'm already using RMySQL

Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 10:18 AM, Roman Naumenko wrote:
[snip]
> I don't think data.table is fundamentally different from the data.frame
> type, but thanks for the suggestion.
>
> http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
> "Just like data.frames, data.table
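For what it's worth, a minimal sketch of where the two types do differ in practice: keyed binary search and grouped aggregation. The column names and values are hypothetical, and the syntax is data.table as of 2011:

    library(data.table)

    ## A hypothetical quotes table.
    dt <- data.table(symbol = c("IBM", "MSFT", "IBM"),
                     bid    = c(165.1, 24.9, 165.2),
                     ask    = c(165.3, 25.0, 165.4))

    setkey(dt, symbol)                   # sort by symbol and mark it as the key
    dt["IBM"]                            # binary search on the key, not a vector scan
    dt[, mean(ask - bid), by = symbol]   # spread aggregated per symbol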

Re: [R] Processing large datasets

2011-05-25 Thread Marc Schwartz
Take a look at the High-Performance and Parallel Computing with R CRAN Task View: http://cran.us.r-project.org/web/views/HighPerformanceComputing.html specifically at the section labeled "Large memory and out-of-memory data". There are some specific R features that have been implemented in a
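As one concrete illustration from that section, the ff package keeps vectors memory-mapped on disk and pulls in only the chunks being operated on. A sketch, with a hypothetical file name:

    library(ff)

    ## Builds a data-frame-like object backed by on-disk ff files, reading
    ## the source in chunks instead of all at once.
    quotes <- read.csv.ffdf(file = "quotes.csv", header = TRUE,
                            next.rows = 500000)   # rows per chunk

    dim(quotes)          # dimensions available without loading the data
    mean(quotes$bid[])   # [] materializes one column in RAM for the computation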

Re: [R] Processing large datasets

2011-05-25 Thread Roman Naumenko
Thanks Jonathan. I'm already using RMySQL to load data for a couple of days. I wanted to know what the relevant R capabilities are if I want to process much bigger tables. R always reads the whole set into memory and this might be a limitation in case of big tables, correct? Doesn't it use te
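On the in-memory question: base R's read.* functions do pull everything into RAM, but the DBI interface that RMySQL implements can stream a result set in chunks instead. A hedged sketch (connection details, table, and chunk size are placeholders):

    library(RMySQL)   # loads DBI as well

    con <- dbConnect(MySQL(), user = "user", password = "pass",
                     dbname = "ticks", host = "localhost")

    res <- dbSendQuery(con, "SELECT symbol, bid, ask FROM quotes")
    while (!dbHasCompleted(res)) {
        chunk <- fetch(res, n = 50000)   # next 50,000 rows only
        ## ... compute statistics on this chunk and accumulate ...
    }
    dbClearResult(res)
    dbDisconnect(con)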

Re: [R] Processing large datasets

2011-05-25 Thread Roman Naumenko
> Hi,
>
> On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko wrote:
> > Hi R list,
> >
> > I'm new to R software, so I'd like to ask about its capabilities.
> > What I'm looking to do is to run some statistical tests on quite big
> > tables which are aggregated quotes from a market feed.
> >

Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko wrote:
> Hi R list,
>
> I'm new to R software, so I'd like to ask about its capabilities.
> What I'm looking to do is to run some statistical tests on quite big
> tables which are aggregated quotes from a market feed.
>
> This is a typical set

Re: [R] Processing large datasets

2011-05-25 Thread Jonathan Daily
In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that
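The package references [1] and [2] are truncated above. As one possibility in that spirit, the sqldf package can run an SQL filter over a file (through a temporary SQLite database) so that only the selected rows ever become an R data.frame; the file and column names here are hypothetical:

    library(sqldf)

    ## read.csv.sql imports the file into a temporary SQLite table named
    ## "file" and returns just the rows the query selects.
    ibm <- read.csv.sql("quotes.csv",
                        sql = "select * from file where symbol = 'IBM'")
    summary(ibm)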

[R] Processing large datasets

2011-05-24 Thread Roman Naumenko
Hi R list, I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non-filtered).
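To make the memory concern behind this question concrete, a rough back-of-envelope under assumed numbers (ten million rows, eight numeric columns, 8 bytes per double):

    ## Hypothetical sizing for one day of quotes.
    rows <- 10e6
    cols <- 8
    rows * cols * 8 / 2^20   # ~610 MB resident, before any copies R makes

That is already a large fraction of a 32-bit R session's address space, which is why the out-of-memory packages discussed in the replies above come up.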