Hello,
Do you need /all/ the data in memory at one time? Is your goal to
divide the data (e.g. according to some factor /or/ some function of
the columns of the data set) and then analyze the divisions? And then,
possibly, combine the results?
If so, you might consider using Rhipe. We have analyzed subsets of data
(e.g. computed regression parameters, applied algorithms) where the
subsets are created according to some condition.
Using this approach (and a cluster of 8 machines, 72 cores) we
successfully analyzed data sets ranging from 14 GB to ~140 GB.
This all assumes that your divisions are suitably small. I notice
you mention that each region is 10-20 GB and that you want to compute on
/all/ of it, i.e. you need all of it in memory. If so, Rhipe cannot help you.
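A minimal, in-memory sketch of the divide-and-recombine idea described above, written in base R rather than with Rhipe (it does not use Rhipe's API): split the data by a factor, fit a regression to each subset, then stack the per-subset coefficients. The data, the `region` factor and the slopes are made up for illustration.

```r
# Divide and recombine in plain R (illustration only, not Rhipe).
set.seed(1)
n <- 300
dat <- data.frame(
  region = factor(rep(c("north", "south", "west"), each = n / 3)),  # hypothetical grouping factor
  x = runif(n)
)
# Hypothetical response: a different slope in each region, plus noise
slopes <- c(north = 1, south = 2, west = 3)
dat$y <- slopes[as.character(dat$region)] * dat$x + rnorm(n, sd = 0.1)

# Divide: one data frame per level of the factor
pieces <- split(dat, dat$region)

# Analyze: fit a regression to each piece independently
fits <- lapply(pieces, function(d) coef(lm(y ~ x, data = d)))

# Recombine: stack the per-subset coefficients into one matrix (one row per region)
coefs <- do.call(rbind, fits)
print(coefs)
```

With Rhipe, roughly speaking, the divide and analyze steps run as Hadoop MapReduce jobs across the cluster, so each subset only has to fit in one worker's memory; the per-subset logic is the same.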


Regards
Saptarshi



On Thu, Feb 4, 2010 at 8:27 PM, Vadlamani, Satish {FLNA}
<satish.vadlam...@fritolay.com> wrote:
> Folks:
> I am trying to read in a large file. Definition of large is:
> Number of lines: 333,250
> Size: 850 MB
>
> The machine is a dual-core Intel with 4 GB of RAM and nothing else running
> on it. I read the previous threads on read.fwf and did not see any
> conclusive statements on how to read such files quickly. An example record
> and the R code are given below. I was hoping to purchase a better machine
> and do analysis with larger datasets, but these preliminary results do not
> look good.
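A rough estimate of how much memory the parsed table needs, using the dimensions from the quoted code (333,250 rows, 18 factor columns, 209 numeric columns). It assumes numeric columns are stored as 8-byte doubles and factors as 4-byte integer codes, and it ignores factor levels, row names and other overhead, so treat it as a back-of-the-envelope sketch rather than a measurement.

```r
# Approximate size of the final data frame (assumptions: 8-byte doubles,
# 4-byte integer factor codes, no per-column or per-level overhead).
n_rows    <- 333250
n_numeric <- 209
n_factor  <- 18

bytes_numeric <- n_rows * n_numeric * 8
bytes_factor  <- n_rows * n_factor * 4
total_mb <- (bytes_numeric + bytes_factor) / 2^20
round(total_mb)  # 554
```

So the finished data frame alone is on the order of 550 MB; the parsing step needs working memory on top of that, which is significant on a 4 GB machine.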
>
> Does anyone have experience using large files (> 1 GB) with
> Revolution-R?
>
>
> Thanks.
>
> Satish
>
> Example Code
> # Widths of the 18 key fields (they sum to 62 characters)
> key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
> key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc",
>                "ccust2","account","area2","ccust3","customer","allprod",
>                "cat","bu","class","size","bdc")
> key_info <- data.frame(key_vec, key_names)
> # sas_time is not defined in this snippet; sas_time$week is assumed to
> # hold the 209 week labels used as the numeric column names
> col_names <- c(key_names, sas_time$week)
> num_buckets <- rep(12, 209)   # 209 numeric fields, 12 characters each
> width_vec <- c(key_vec, num_buckets)
> col_classes <- c(rep("factor", 18), rep("numeric", 209))
> # Quick test on the first 100 records only:
> # threewkoutstat <- read.fwf(file = "3wkoutstatfcst_file02.dat", widths = width_vec,
> #                            header = FALSE, colClasses = col_classes, n = 100)
> threewkoutstat <- read.fwf(file = "3wkoutstatfcst_file02.dat", widths = width_vec,
>                            header = FALSE, colClasses = col_classes)
> names(threewkoutstat) <- col_names
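One commonly suggested alternative to read.fwf, sketched here under stated assumptions: read raw lines in chunks with readLines() and cut them into fields with substring(). read.fwf itself rewrites the input into a temporary delimited file and then calls read.table, so this avoids one full pass over the data; whether it is actually faster on this file has not been measured. The field layout mirrors the quoted key_vec plus 12-character numeric fields; the function name read_fixed_chunked, the V1, V2, ... column names and the small synthetic file are hypothetical.

```r
# Parse a fixed-width file in chunks with readLines() + substring().
key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
n_num   <- 5                          # 209 for the real file
widths  <- c(key_vec, rep(12, n_num))
ends    <- cumsum(widths)
starts  <- ends - widths + 1

# Synthetic test file: 7 identical records in the same layout
rec <- paste0("A004001003799000049250000492599990049999A001002002015002015009",
              paste(formatC(c(0, 0.6, 0.6, 0.7, 0), width = 12, format = "f", digits = 2),
                    collapse = ""))
path <- tempfile(fileext = ".dat")
writeLines(rep(rec, 7), path)

read_fixed_chunked <- function(path, starts, ends, n_key, chunk_size = 50000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    # Cut every line in the chunk at the same positions: one vector per field
    cols <- lapply(seq_along(starts), function(j) substring(lines, starts[j], ends[j]))
    # Key fields stay character for now; the remaining fields become numeric
    fields <- c(cols[seq_len(n_key)], lapply(cols[-seq_len(n_key)], as.numeric))
    names(fields) <- paste0("V", seq_along(fields))
    chunks[[length(chunks) + 1]] <- as.data.frame(fields, stringsAsFactors = FALSE)
  }
  res <- do.call(rbind, chunks)
  # Convert keys to factors once, after all chunks, so the levels are consistent
  res[seq_len(n_key)] <- lapply(res[seq_len(n_key)], factor)
  res
}

# A tiny chunk size forces several chunks on the 7-line test file
res <- read_fixed_chunked(path, starts, ends, n_key = length(key_vec), chunk_size = 3)
str(res[, c(1:3, 19:23)])
```

For the real file, n_num would be 209, chunk_size much larger, and names(res) <- col_names could be applied afterwards. Keeping the key fields as character until every chunk is read keeps leading zeros (e.g. "004") and gives all rows one shared set of factor levels.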
>
> Example record (only one record; each record is a single line of
> 62 + 209 x 12 = 2,570 characters):
> A004001003799000049250000492599990049999A001002002015002015009
> (the 18 key fields) followed by 209 numeric fields, each 12 characters
> wide and right-aligned (e.g. "        0.00"). All of them are 0.00 except
> fields 122-124 (0.60) and field 125 (0.70).
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>