Hello all. I wish to read a large data set into R. My current issue is getting the data into a form that R can access. read.table won't work, since the data is over 1 GB in size (and I am using Windows XP), so my plan is to read the file chunk by chunk, each time moving the chunk into bigmemory (I'll experiment with that when the time comes; maybe ff is better?).
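A minimal sketch of that chunk-by-chunk plan (the file name and the chunk size of 1000 lines are placeholders, and the bigmemory/ff step is left as a comment):

```r
# Sketch: read a large tab-delimited file in fixed-size chunks.
# "ex.data" and chunk.size are stand-ins for the real file and tuning.
con <- file("ex.data", open = "r")
chunk.size <- 1000
repeat {
  lines <- readLines(con, n = chunk.size)  # reads the next chunk.size lines
  if (length(lines) == 0) break            # end of file reached
  chunk <- do.call(rbind, strsplit(lines, "\t"))
  # ... here each chunk would be pushed into a bigmemory/ff object ...
}
close(con)
```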
I encountered a problem with separating lines vs. separating columns, to which I found a solution, but it doesn't feel like a smart one; any ideas or help on how to improve it would be welcome.

# sample code: creating a simple file
zz <- file("ex.data", "w")  # open an output file connection
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
cat("1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t\t555\t\t", file = zz, sep = "\n")
close(zz)

(temp.file <- scan("ex.data", what = "", sep = "\n"))
# here we can limit the number of rows we read (nlines) and start from a
# specific row (skip)
# or:
# (temp.file <- readLines("ex.data"))
str(temp.file)  # we get a character vector

# we go through the vector to split the columns
new.df <- NULL
for (i in 1:length(temp.file)) {
  new.df <- rbind(new.df, unlist(strsplit(temp.file[i], "\t")))
}
new.df

# or maybe:
apply(as.data.frame(temp.file), 1, function(b) unlist(strsplit(b, "\t")))
# but this transposes the matrix

Thanks,
Tal

--
----------------------------------------------
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
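P.S. one possible improvement to the rbind-in-a-loop above: strsplit is already vectorized over its input, so the whole split can be done in a single call (a sketch, assuming every line splits into the same number of fields; with ragged rows, rbind would recycle values):

```r
# Sketch: vectorized column split, replacing the per-row rbind loop.
# temp.file here is a small stand-in for the scanned lines.
temp.file <- c("1\t2\t3", "4\t5\t6")
new.df <- do.call(rbind, strsplit(temp.file, "\t"))
new.df  # a character matrix, one row per input line
```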