Hello,

Do you need /all/ the data in memory at one time? Is your goal to divide the data (e.g. according to some factor /or/ some function of the columns of the data set), analyze the divisions, and then possibly combine the results? If so, you might consider using Rhipe. We have run analyses (e.g. estimating regression parameters, applying algorithms) across subsets of data, where the subsets are created according to some condition. Using this approach (and a cluster of 8 machines, 72 cores) we successfully analyzed data sets ranging from 14 GB to ~140 GB.

This all assumes that your divisions are suitably small. I notice you mention that each region is 10-20 GB and you want to compute on /all/ of it, i.e. you need all of it in memory. If so, Rhipe cannot help you.
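To make the divide / analyze / combine idea concrete, here is a minimal sketch in plain base R on a single machine. It is not the Rhipe API; it only illustrates the pattern that a tool like Rhipe distributes across a cluster. The data frame `dat`, its columns `region`, `x` and `y`, and the model `y ~ x` are all made up for illustration.

```r
# Illustrative only: base R on one machine, NOT the Rhipe API.
set.seed(1)

# Hypothetical data: a grouping factor `region` and two numeric columns.
dat <- data.frame(
  region = rep(c("east", "west", "north"), each = 50),
  x      = rnorm(150)
)
dat$y <- 2 * dat$x + rnorm(150, sd = 0.1)

# Divide: one data frame per level of the factor.
pieces <- split(dat, dat$region)

# Analyze: fit the same regression on each piece and keep only the coefficients.
fits <- lapply(pieces, function(d) coef(lm(y ~ x, data = d)))

# Combine: one row of regression parameters per division.
result <- do.call(rbind, fits)
print(result)
```

Each piece is analyzed independently of the others, so only one division has to fit in memory at a time. That is why the approach depends on the divisions being suitably small.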
Regards Saptarshi On Thu, Feb 4, 2010 at 8:27 PM, Vadlamani, Satish {FLNA} <satish.vadlam...@fritolay.com> wrote: > Folks: > I am trying to read in a large file. Definition of large is: > Number of lines: 333, 250 > Size: 850 MB > > The maching is a dual core intel, with 4 GB RAM and nothing else running on > it. I read the previous threads on read.fwf and did not see any conclusive > statements on how to read fast. Example record and R code given below. I was > hoping to purchase a better machine and do analysis with larger datasets - > but these preliminary results do not look good. > > Does anyone have any experience with large files (> 1GB) and using them with > Revolution-R? > > > Thanks. > > Satish > > Example Code > key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9) > key_names <- > c("allgeo","area1","zone","dist","ccust1","whse","bindc","ccust2","account","area2","ccust3","customer","allprod","cat","bu","class","size","bdc") > key_info <- data.frame(key_vec,key_names) > col_names <- c(key_names,sas_time$week) > num_buckets <- rep(12,209) > width_vec = c(key_vec,num_buckets) > col_classes<-c(rep("factor",18),rep("numeric",209)) > #threewkoutstat <- > read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes,n=100) > threewkoutstat <- > read.fwf(file="3wkoutstatfcst_file02.dat",widths=width_vec,header=FALSE,colClasses=col_classes) > names(threewkoutstat) <- col_names > > Example record (only one record pasted below) > A004001003799000049250000492599990049999A001002002015002015009 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 ! 
> 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.60 > 0.60 0.60 0.70 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 ! > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.