Hello all. I wish to read a large data set into R. My current issue is getting the data into a form that R can access. read.table won't work, since the data is over 1 GB in size (and I am using Windows XP), so my plan is to read the file chunk by chunk, each time moving the chunk into bigmemory (I'll experiment with that when the time comes; maybe ff is better?).
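A minimal sketch of that chunk-by-chunk plan (the file name and the chunk size of 1000 lines are placeholders, and the bigmemory/ff step is left as a comment):

```r
# Sketch: read a large tab-delimited file in fixed-size chunks.
# "ex.data" and chunk.size are stand-ins for the real file and tuning.
con <- file("ex.data", open = "r")
chunk.size <- 1000
repeat {
  lines <- readLines(con, n = chunk.size)  # reads the next chunk.size lines
  if (length(lines) == 0) break            # end of file reached
  chunk <- do.call(rbind, strsplit(lines, "\t"))
  # ... here each chunk would be pushed into a bigmemory/ff object ...
}
close(con)
```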
I encountered a problem with separating lines vs. separating columns, to which I found a solution, but it doesn't feel like a smart one; any ideas or help on how to improve it would be welcome.

# sample code: creating a simple file
zz <- file("ex.data", "w")  # open an output file connection
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
close(zz)

(temp.file <- scan("ex.data", what = "", sep = "\n"))
# here we can limit the number of rows we read (nlines) and start from a
# specific row (skip)
# or:
# (temp.file <- readLines("ex.data"))
str(temp.file)  # we get a character vector

# we go through the vector to split the columns
new.df <- NULL
for (i in 1:length(temp.file)) {
  new.df <- rbind(new.df, unlist(strsplit(temp.file[i], "\t")))
}
new.df

# or maybe:
apply(as.data.frame(temp.file), 1, function(b) unlist(strsplit(b, "\t")))
# but this transposes the matrix

Thanks,
Tal

--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
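P.S. one possible improvement to the rbind-in-a-loop above: strsplit is already vectorized over its input, so the whole split can be done in a single call (a sketch, assuming every line splits into the same number of fields; with ragged rows, rbind would recycle values):

```r
# Sketch: vectorized column split, replacing the per-row rbind loop.
# temp.file here is a small stand-in for the scanned lines.
temp.file <- c("1\t2\t3", "4\t5\t6")
new.df <- do.call(rbind, strsplit(temp.file, "\t"))
new.df  # a character matrix, one row per input line
```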