Thanks, Ben. The matrix is purely numeric (6 x 700000, 31 MB). I tried colClasses='numeric' as well as nrows=7 (one of these lines is the header line) on the matrix. I also tested it without setting either option in read.delim().
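A minimal sketch of that call on a small synthetic stand-in file. The temporary file and its 6 x 1000 size are assumptions for illustration, not the real test_data.txt. In read.delim(), nrows counts data rows only and the header line is read separately, so nrows = 6 already covers a 6-row matrix.

```r
# Hypothetical stand-in for test_data.txt: 6 numeric rows x 1000 columns,
# written tab-separated with a single header line.
n_rows <- 6
n_cols <- 1000
m <- matrix(runif(n_rows * n_cols), nrow = n_rows)
colnames(m) <- paste0("V", seq_len(n_cols))
tf <- tempfile(fileext = ".txt")
write.table(m, tf, sep = "\t", quote = FALSE, row.names = FALSE)

# colClasses = "numeric" is recycled to every column, so no per-column type
# guessing is needed; nrows counts data rows only (header not included).
timing <- system.time(
  tmp <- read.delim(tf, colClasses = "numeric", nrows = n_rows,
                    comment.char = "")
)
dim(tmp)  # 6 rows, 1000 columns
```

The result is still a data frame with one vector per column, which is likely why a file with 700000 columns stays slow even when these options are set.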
Here is the time spent reading the matrix in each test:

> system.time(tmp <- read.delim("test_data.txt"))
     user    system   elapsed
50985.421    27.665 51013.384

> system.time(tmp <- read.delim("test_data.txt", colClasses="numeric", nrows=7, comment.char=""))
     user    system   elapsed
51301.563    60.491 51362.208

Both runs took roughly 14 hours of wall-clock time, so setting the options does not seem to speed up reading at all. Is it because of the header line? I will test it. Did I misunderstand something?

One additional, interesting observation: the run with the options uses far less memory, about 150 MB versus about 4 GB for the run without them.

I will try scan() and see if it helps.

Thanks!
Mike

-----Original Message-----
From: Benilton Carvalho [mailto:bcarv...@jhsph.edu]
Sent: Wednesday, September 23, 2009 4:56 PM
To: Ping-Hsun Hsieh
Cc: r-help@r-project.org
Subject: Re: [R] read.delim very slow in reading files with lots of columns

use the 'colClasses' argument and you can also set 'nrows'.

b

On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:

> Hi,
>
> I am trying to read a tab-delimited file into R (Ver. 2.8). The
> machine I am using is 64-bit Linux with 16 GB of memory.
>
> The file is basically a matrix (~600 x 700000) and as large as 3 GB.
>
> read.delim() ran extremely slowly (hours) even with a subset of
> the file (31 MB, 6 x 700000).
>
> I monitored the memory usage and found it consistently used less
> than 1% of the 16 GB.
>
> Does read.delim() have difficulty reading files with lots of columns?
>
> Any suggestions?
>
> Thanks,
>
> Mike
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
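Since the file is a pure numeric matrix, scan() can read it straight into one numeric vector that is then reshaped, avoiding the data frame with hundreds of thousands of columns that read.delim() builds. A minimal sketch, assuming exactly one tab-separated header line followed by numeric rows; the helper name read_numeric_matrix and the small test file are hypothetical.

```r
# Hypothetical helper: read a purely numeric, tab-separated file with one
# header line into a numeric matrix via scan(), bypassing read.delim().
read_numeric_matrix <- function(path, sep = "\t") {
  # First line: column names, which also give the number of columns.
  hdr <- scan(path, what = "", sep = sep, nlines = 1, quiet = TRUE)
  # Everything after the header, read as one long double vector.
  vals <- scan(path, what = double(), sep = sep, skip = 1, quiet = TRUE)
  # scan() returns values in file (row-by-row) order, hence byrow = TRUE.
  matrix(vals, ncol = length(hdr), byrow = TRUE,
         dimnames = list(NULL, hdr))
}

# Usage on a small hypothetical file (6 x 1000, written with a header).
m <- matrix(round(runif(6 * 1000), 6), nrow = 6)
colnames(m) <- paste0("V", seq_len(ncol(m)))
tf <- tempfile(fileext = ".txt")
write.table(m, tf, sep = "\t", quote = FALSE, row.names = FALSE)
x <- read_numeric_matrix(tf)
```

For scale, the full ~600 x 700000 matrix holds 4.2e8 doubles, about 3.4 GB once loaded, which fits within 16 GB.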