Just to quote explicitly the passage I mentioned in the R Data document:
<QUOTE>
Function `read.fwf' provides a simple way to read such files, specifying
a vector of field widths. The function reads the file into memory as
whole lines, splits the resulting character strings, writes out a
temporary tab-separated file and then calls `read.table'. This is
adequate for small files, but for anything more complicated we recommend
using the facilities of a language like `perl' to pre-process the file.
</QUOTE>

Note particularly the final sentence.

Ross

On Sun, 2009-08-16 at 19:37 -0400, Wensui Liu wrote:
> Gabor made a good point.
> Here is an example I copied from my blog.
>
> ##############################################
> # READ FIXED-WIDTH DATA FILE WITH read.fwf() #
> # ------------------------------------------ #
> # EQUIVALENT SAS CODE:                       #
> # filename data 'E:\sas\fixed.txt';          #
> # data test;                                 #
> # infile data truncover;                     #
> # input @1 city $ 1 - 22 @23 population;     #
> # run;                                       #
> ##############################################
>
> # OPEN A CONNECTION TO THE DATA FILE
> data <- file(description = "e:\\sas\\fixed.txt", open = "r")
>
> # widths = c(...) ==> SPECIFIES COLUMN WIDTHS
> # col.names = c(...) ==> GIVES COLUMN NAMES
> # colClasses = c(...) ==> DEFINES COLUMN CLASSES
> test <- read.fwf(data, header = FALSE, widths = c(22, 10),
>                  col.names = c("city", "population"),
>                  colClasses = c("character", "numeric"))
>
> close(data)
>
> On Sun, Aug 16, 2009 at 6:36 PM, Gabor Grothendieck<ggrothendi...@gmail.com> wrote:
> > Check out ?read.fwf
> >
> > On Sun, Aug 16, 2009 at 4:49 PM, Ross Boylan<r...@biostat.ucsf.edu> wrote:
> >> Recorded here so others may avoid my mistakes.
> >>
> >> I have a bunch of files containing fixed width data. The R Data guide
> >> suggests that one pre-process them with a script if they are large.
> >> They were 50 MB and up, and I needed to process another file that gave
> >> the layout of the lines anyway.
> >>
> >> I tried rpy to not only preprocess but create the R data object in
> >> one go.
> >> It seemed like a good idea; it wasn't. The core operation was to
> >> build up a string for each line that looked like
> >> "data.frame(var1=val1, var2=val2, [etc])" and then rbind this to the
> >> data.frame so far. I did this with r(mycommand string). Almost all
> >> the values were numeric.
> >>
> >> This was incredibly slow: it had not completed after running
> >> overnight.
> >>
> >> So, the lesson is, don't do that!
> >>
> >> I switched to preprocessing that created a csv file, and then read.csv
> >> from R. This worked in under a minute. The result had dimension
> >> 150913 x 129.
> >>
> >> The good news in rpy was that I found objects persisted across calls
> >> to the r object.
> >>
> >> Exactly why this was so slow I don't know. The two obvious suspects
> >> are the speed of rbind, which I think is pretty inefficient, and the
> >> overhead of crossing the python/R boundary.
> >>
> >> This was on Debian Lenny:
> >> python-rpy 1.0.3-2
> >> Python 2.5.2
> >> R 2.7.1
> >>
> >> rpy2 is not available in Lenny, though it is in development versions
> >> of Debian.
> >>
> >> Ross Boylan
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> ==============================
> WenSui Liu
> Blog : statcompute.spaces.live.com
> Tough Times Never Last. But Tough People Do.
> - Robert Schuller
> ==============================
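
For anyone finding this thread later: the preprocessing route described above (fixed-width file in, csv file out, then read.csv from R) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the script used in the thread. The function name fwf_to_csv, the (name, width) layout list, and the sample rows are all made up for illustration; the widths 22 and 10 are borrowed from the read.fwf example. It is written for Python 3 rather than the Python 2.5.2 mentioned in the thread.

```python
import csv
import io


def fwf_to_csv(src, dst, layout):
    """Convert fixed-width lines from src into CSV rows written to dst.

    layout is a list of (name, width) pairs in column order. It is a
    hypothetical stand-in for whatever file describes the record layout.
    """
    # Turn the widths into (start, end) character offsets once, up front.
    spans = []
    start = 0
    for _name, width in layout:
        spans.append((start, start + width))
        start += width

    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow([name for name, _width in layout])
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Slicing past the end of a short line yields "", so missing
        # trailing fields come out empty. Padding is stripped from
        # every field.
        writer.writerow([line[a:b].strip() for a, b in spans])


# Made-up sample data in the 22 + 10 layout from the read.fwf example.
layout = [("city", 22), ("population", 10)]
sample = io.StringIO(
    "Springfield".ljust(22) + "12345".rjust(10) + "\n"
    + "Shelbyville".ljust(22) + "678".rjust(10) + "\n"
)
out = io.StringIO()
fwf_to_csv(sample, out, layout)
print(out.getvalue())
```

With real files one would pass open file objects instead of the StringIO buffers and then load the result in R with read.csv. The file is streamed one line at a time, so memory use stays flat however large the input is.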