First, 'wc' and 'readLines' are doing vastly different things. 'wc' just
reads through the file without having to keep its contents in memory;
'readLines' actually stores the data in memory.
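To make the difference concrete, here is a minimal sketch (not part of the
timings in this message) of counting lines in R roughly the way 'wc -l'
does: read through a connection in blocks and throw each block away, so
only one block is ever held in memory. The function name count_lines and
the block size are my own illustrative choices.

```r
# Count the lines of a file without keeping the whole file in memory.
# 'count_lines' and 'chunk_lines' are hypothetical names for illustration.
count_lines <- function(path, chunk_lines = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))          # always release the connection
  total <- 0
  repeat {
    # Reads at most 'chunk_lines' lines, continuing where the last call stopped
    chunk <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(chunk) == 0L) break   # end of file reached
    total <- total + length(chunk)
  }
  total
}
```

This will still be slower than 'wc', since R builds a character vector for
every block, but memory use stays bounded by the block size.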
I tried it on a 150MB file, and here is what 'wc' did on my
Windows system:
/cygdrive/c: time wc tempxx.txt
1055808 13718468 151012320 tempxx.txt
real 0m2.343s
user 0m1.702s
sys 0m0.436s
/cygdrive/c:
If I multiply that by 25 to extrapolate to a 3.5GB file, 'wc' should take
a little less than one minute on my relatively slow laptop.
'readLines' on the same file takes:
> system.time(x <- readLines('/tempxx.txt'))
user system elapsed
37.82 0.47 39.23
If I extrapolate that to 3.5GB, it would take about 16 minutes. And
considering that I only have 2GB of memory on my system, I would not be
able to read the whole file in at once.
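The extrapolation can be checked with a few lines of arithmetic. This is
only a sketch: it assumes run time scales linearly with file size, and it
takes 3.5GB to mean 3.5e9 bytes, which is my assumption rather than
something stated in the thread.

```r
small_bytes <- 151012320    # byte count reported by 'wc' for tempxx.txt
big_bytes   <- 3.5e9        # assumption: 3.5GB taken as 3.5e9 bytes
scale       <- big_bytes / small_bytes

wc_elapsed        <- 2.343  # 'real' time from 'wc', in seconds
readLines_elapsed <- 39.23  # 'elapsed' from system.time(), in seconds

# Linear extrapolation to the larger file
wc_estimate_sec        <- wc_elapsed * scale
readLines_estimate_min <- readLines_elapsed * scale / 60

cat(sprintf("scale factor:       %.1f\n", scale))
cat(sprintf("wc estimate:        %.0f seconds\n", wc_estimate_sec))
cat(sprintf("readLines estimate: %.1f minutes\n", readLines_estimate_min))
```

The exact ratio comes out a bit under 25, so the rounded figures are
slightly on the high side, but the conclusion is the same: under a minute
for 'wc' versus a quarter of an hour or so for 'readLines'.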
You never did specify what type of system you were running on and how much
memory you had. Were you 'paging' due to lack of memory?
For reference, the character vector read in above occupies:
> object.size(x)
84814016 bytes
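If it helps to compare that figure against the RAM on the machine, the
size can be shown in larger units. A small sketch, using a made-up vector
standing in for 'x' (the data here is purely illustrative):

```r
# Hypothetical stand-in for the vector returned by readLines()
x <- rep("a line of text standing in for real data", 1000)

size <- object.size(x)          # size of the object in bytes
print(size, units = "Kb")       # same value shown in kilobytes

# Plain numeric value in megabytes, handy for comparing with available RAM
size_mb <- as.numeric(size) / 1024^2
```

Keep in mind that R may need noticeably more than this while the object is
being built, so the final size is a lower bound on the memory required.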
On Sat, May 9, 2009 at 12:25 PM, Rob Steele <[email protected]> wrote:
> I'm finding that readLines() and read.fwf() take nearly two hours to
> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
> The unix command wc by contrast processes the same file in three
> minutes. Is there a faster way to read files in R?
>
> Thanks!
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?