If the header lines all start with the same letter, say "A", and the data
lines contain only numbers, then just use

  read.table(..., comment = "A")

On Mon, Nov 2, 2009 at 2:03 PM, Gene Leynes <gleyne...@gmail.com> wrote:
> I've been trying to figure out how to read in a large file for a few days
> now, and after extensive research I'm still not sure what to do.
>
> I have a large comma delimited text file that contains 59 fields in each
> record.
> There is also a header every 121 records
>
> This function works well for smallish records
> getcsv=function(fname){
>     ff=file(description = fname)
>     x <- readLines(ff)
>     closeAllConnections()
>     x <- x[x != ""]   # REMOVE BLANKS
>     x=x[grep("^[-0-9]", x)]   # REMOVE ALL TEXT
>
>     spl=strsplit(x,',')   # THIS PART IS SLOW, BUT MANAGEABLE
>
>     xx=t(sapply(1:length(spl),function(temp)as.vector(na.omit(as.numeric(spl[[temp]])))))
>     return(xx)
> }
> It's not elegant, but it works.
> For 121,000 records it completes in 2.3 seconds
> For 121,000*5 records it completes in 63 seconds
> For 121,000*10 records it doesn't complete
>
> When I try other methods to read the file in chunks (using scan), the
> process breaks down because I have to start at the beginning of the file on
> every iteration.
> For example:
> fnn=function(n,col){
>     a=122*(n-1)+2
>     xx=scan(fname,skip=a-1,nlines=121,sep=',',quiet=TRUE,what=character(0))
>     xx=xx[xx!='']
>     xx=matrix(xx,ncol=49,byrow=TRUE)
>     xx[,col]
> }
> system.time(sapply(1:10,fnn,c=26))      # 0.31 Seconds
> system.time(sapply(91:90,fnn,c=26))     # 1.09 Seconds
> system.time(sapply(901:910,fnn,c=26))   # 5.78 Seconds
>
> Even though I'm only getting the 26th column for 10 sets of records, it
> takes a lot longer the further into the file I go.
>
> How can I tell scan to pick up where it left off, without it starting at the
> beginning?? There must be a good example somewhere.
>
> I have done a lot of research (in fact, thank you to Michael J.
> Crawley and others for your help thus far)
>
> Thanks,
>
> Gene
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
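
To make the read.table suggestion at the top of this message concrete, here is a minimal sketch. It assumes a toy file with 3 fields and a header before every 3 records, standing in for the real 59-field, 121-record layout, and it assumes every header line begins with "A". The names tmp, block and dat are made up for illustration. Note that comment = "A" works through partial matching of read.table's comment.char argument, which is spelled out in full here.

```r
# Toy stand-in for the real file: a header line starting with "A"
# precedes every block of 3 numeric, comma-delimited records.
tmp <- tempfile(fileext = ".csv")
block <- c("A_id,A_x,A_y", "1,2.5,-3", "4,5.5,6", "7,8.5,9")
writeLines(rep(block, times = 2), tmp)

# comment.char = "A" tells read.table to discard everything from an "A"
# to the end of the line. Each header line therefore becomes empty and
# is dropped, because blank.lines.skip is TRUE by default.
dat <- read.table(tmp, sep = ",", comment.char = "A")

# Only the numeric records remain: 2 blocks of 3 records, 3 fields each.
stopifnot(nrow(dat) == 6, ncol(dat) == 3)
stopifnot(all(sapply(dat, is.numeric)))

unlink(tmp)
```

One thing to watch: the comment character applies anywhere on a line, not just at the start, so this only works if the chosen letter never occurs in the data lines. With "A", for instance, a data line containing an NA entry would be cut off at that point.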