You can do something like this using connections: read in a set of lines at a time and save each result in bigmemory or, in this case, a 'save' image:

zz <- file("ex.data", "w")  # open an output file
# write 10,000 identical tab-separated lines, one complete line per call
for (i in 1:10000)
    cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t\n", file = zz)
close(zz)

# read in the data 876 lines at a time and write out an image
zz <- file("ex.data", "r")
fileNo <- 1
repeat {
    # read.table throws an error when there is no more data;
    # catch it and return NULL instead
    input <- tryCatch(read.table(zz, nrows = 876, sep = "\t"),
                      error = function(e) NULL)
    if (is.null(input)) break
    # save the intermediate data
    save(input, file = sprintf("file%03d.RData", fileNo))
    fileNo <- fileNo + 1
}
close(zz)

On Wed, Mar 18, 2009 at 7:17 PM, Tal Galili <tal.gal...@gmail.com> wrote:
> Hello all.
>
> I wish to read a large data set into R. My current issue is getting the
> data into a form that R can access. Using read.table won't work
> since the data is over 1GB in size (and I am using Windows XP), so my plan
> was to read the file chunk by chunk and each time move it into bigmemory
> (I'll play with that when the time comes; maybe ff is better?).
>
> I encountered a problem with separating lines vs. separating columns, to
> which I found a solution, but it doesn't feel like a smart one. Any
> ideas or help on how to improve this would be welcome.
>
> # sample code:
>
> # creating a simple file
> zz <- file("ex.data", "w")  # open an output file connection
> cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t\n", file = zz)
> cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t\n", file = zz)
> cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t\n", file = zz)
> close(zz)  # close the connection so the lines are flushed to disk
>
> # here we can limit the number of rows we want to use and start from a
> # specific row using skip
> (temp.file = scan("ex.data", what = "", sep = "\n"))
> # or:
> # (temp.file = readLines("ex.data"))
> str(temp.file)  # we get a vector of character
>
> new.df <- NULL
> # we go through the vector to split the columns
> for (i in seq_along(temp.file)) {
>     new.df <- rbind(new.df, unlist(strsplit(temp.file[i], "\t")))
> }
> new.df
>
> # or maybe
> apply(as.data.frame(temp.file), 1, function(b) unlist(strsplit(b, "\t")))
> # but this transposes the matrix
>
>
> Thanks,
> Tal
>
>
> --
> ----------------------------------------------
>
> My contact information:
> Tal Galili
> Phone number: 972-50-3373767
> FaceBook: Tal Galili
> My Blogs:
> http://www.r-statistics.com/
> http://www.talgalili.com
> http://www.biostatistics.co.il
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
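
To work with the saved pieces afterwards, something along these lines might do. It is only a minimal sketch, assuming each image file holds a single data frame called input (as written by save(input, ...) in the chunked loop). The helper name summarise_chunks, the toy data, and the file name pattern are made up for illustration; adjust the pattern to whatever names you actually saved under.

```r
# Hypothetical helper (not part of any package): apply `fun` to the data
# frame stored in each chunk file, one file at a time, so that only one
# chunk needs to be loaded at any moment.
# Assumes each file was written with save(input, file = ...).
summarise_chunks <- function(files, fun) {
  lapply(files, function(f) {
    env <- new.env()
    load(f, envir = env)  # restores `input` into a private environment
    fun(env$input)
  })
}

# Self-contained demonstration: write three toy chunks to a temporary directory.
tmp <- tempfile("chunks")
dir.create(tmp)
for (k in 1:3) {
  input <- data.frame(V1 = rep(k, 2), V2 = c("a", "b"))
  save(input, file = file.path(tmp, sprintf("file%03d.RData", k)))
}

# Collect the chunk files and count the rows across all of them.
files <- list.files(tmp, pattern = "^file[0-9]+\\.RData$", full.names = TRUE)
total_rows <- sum(unlist(summarise_chunks(files, nrow)))
```

Loading into a fresh environment keeps each chunk from overwriting objects in your workspace, and passing a function such as nrow or colSums lets you accumulate a summary without ever stacking all the chunks into one object.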