Rob Steele wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> The unix command wc by contrast processes the same file in three
> minutes. Is there a faster way to read files in R?
I use statist to convert the fixed-width data file into a CSV file,
because read.table() is considerably faster than read.fwf(). For example:
system("statist --na-string NA --xcols collist big.txt big.csv")
bigdf <- read.table(file = "big.csv", header=T, as.is=T)
The file collist is a text file whose lines contain the following
information:
variable begin end
where "variable" is the column name, and "begin" and "end" are integer
numbers indicating where in big.txt the columns begin and end.
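For instance, for a big.txt whose records consist of an 8-character id, a
20-character name, and a 6-character score (made-up fields, purely to
illustrate the format), collist would read:

id     1  8
name   9 28
score 29 34

If the layout is already available inside R, you can also write collist
from there before calling statist (a minimal sketch; the layout data
frame and its values are hypothetical):

# Hypothetical column layout: variable name, begin and end positions
layout <- data.frame(variable = c("id", "name", "score"),
                     begin    = c(1, 9, 29),
                     end      = c(8, 28, 34))
# Write it as plain "variable begin end" lines, the format described above
write.table(layout, file = "collist", quote = FALSE,
            row.names = FALSE, col.names = FALSE)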
Statist can be downloaded from: http://statist.wald.intevation.org/
--
Jakson Aquino
Social Sciences Department
Federal University of Ceará, Brazil