Read in as big a chunk as you can; take a look at your memory usage and make sure your environment does not have any unnecessarily large objects sitting around.
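The advice above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the helper names `process_in_chunks` and `object_sizes`, the default chunk size of 10000 lines, and the use of `toupper` as a stand-in for the real string manipulation are all assumptions made for the example.

```r
# Hypothetical helper: copy a text file to another file in chunks of
# `chunk_size` lines, applying `transform` to each chunk of lines.
# `toupper` is only a placeholder for the real string manipulation.
process_in_chunks <- function(infile, outfile, chunk_size = 10000L,
                              transform = toupper) {
  con <- file(infile, "r")
  on.exit(close(con), add = TRUE)
  out <- file(outfile, "w")
  on.exit(close(out), add = TRUE)

  total <- 0L
  # readLines() returns character(0) once the connection is exhausted
  while (length(chunk <- readLines(con, n = chunk_size)) > 0) {
    writeLines(transform(chunk), con = out)
    total <- total + length(chunk)
  }
  total  # number of lines processed
}

# Hypothetical helper: sizes in bytes of the objects in an environment,
# largest first, to spot large objects that could be removed with rm().
object_sizes <- function(env = globalenv()) {
  sizes <- vapply(
    ls(envir = env),
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}
```

Calling `object_sizes()` before the run shows which objects are taking up memory; after removing unneeded ones with `rm()`, `gc()` reports current memory use. The chunk size can then be raised as far as the remaining memory comfortably allows.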

On Sun, Aug 15, 2010 at 10:12 AM, Data Analytics Corp.
<w...@dataanalyticscorp.com> wrote:
> Hi,
>
> This seems like a good solution. I was concerned about the time taken up
> reading one line at a time. If a chunk can be read in each time, then that
> should be the way for me to handle the problem.
>
> Thanks,
>
> Walt
>
> ________________________
>
> Walter R. Paczkowski, Ph.D.
> Data Analytics Corp.
> 44 Hamilton Lane
> Plainsboro, NJ 08536
> ________________________
> (V) 609-936-8999
> (F) 609-936-3733
> w...@dataanalyticscorp.com
> www.dataanalyticscorp.com
>
> On 8/15/2010 1:06 PM, jim holtman wrote:
>>
>> For efficiency of processing, look at reading in several
>> hundred/thousand lines at a time. Reading and writing one line at a
>> time will probably spend most of the time in the system calls that do
>> the I/O and will take a long time. So do something like this:
>>
>> con <- file('yourInputFile', 'r')
>> outfile <- file('yourOutputFile', 'w')
>> while (length(input <- readLines(con, n=1000)) > 0) {
>>     output <- character(length(input))
>>     for (i in seq_along(input)) {
>>         # ......your one line at a time processing
>>         output[i] <- input[i]  # placeholder: replace with your processing
>>     }
>>     writeLines(output, con=outfile)
>> }
>> close(con)
>> close(outfile)
>>
>> On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp.
>> <w...@dataanalyticscorp.com> wrote:
>>>
>>> Hi,
>>>
>>> I have an upcoming project that will involve a large text file. I want to
>>>
>>> 1. read the file into R one line at a time
>>> 2. do some string manipulations on the line
>>> 3. write the line to another text file.
>>>
>>> I can handle the last two parts. scan and read.table seem to read the
>>> whole file in at once. Since this is a very large file (several hundred
>>> thousand lines), this is not practical. Hence the idea of reading one
>>> line at a time. The question is, can R read one line at a time? If so,
>>> how? Any suggestions are appreciated.
>>>
>>> Thanks,
>>>
>>> Walt

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.