Gabor made a good point. Here is an example I copied from my blog.

##############################################
# READ FIXED-WIDTH DATA FILE WITH read.fwf() #
# ------------------------------------------ #
# EQUIVALENT SAS CODE:                       #
# filename data 'E:\sas\fixed.txt';          #
# data test;                                 #
#   infile data truncover;                   #
#   input @1 city $ 1 - 22 @23 population;   #
# run;                                       #
##############################################
# OPEN A CONNECTION TO THE DATA FILE
data <- file(description = "e:\\sas\\fixed.txt", open = "r")

# widths = c(...)     ==> SPECIFIES COLUMN WIDTHS
# col.names = c(...)  ==> GIVES COLUMN NAMES
# colClasses = c(...) ==> DEFINES COLUMN CLASSES
test <- read.fwf(data, header = FALSE, widths = c(22, 10),
                 col.names = c("city", "population"),
                 colClasses = c("character", "numeric"))

# CLOSE THE CONNECTION ONCE THE DATA HAVE BEEN READ
close(data)

On Sun, Aug 16, 2009 at 6:36 PM, Gabor Grothendieck<ggrothendi...@gmail.com> wrote:
> Check out ?read.fwf
>
> On Sun, Aug 16, 2009 at 4:49 PM, Ross Boylan<r...@biostat.ucsf.edu> wrote:
>> Recorded here so others may avoid my mistakes.
>>
>> I have a bunch of files containing fixed width data. The R Data guide
>> suggests that one pre-process them with a script if they are large.
>> They were 50MB and up, and I needed to process another file that gave
>> the layout of the lines anyway.
>>
>> I tried rpy to not only preprocess but create the R data object in one
>> go. It seemed like a good idea; it wasn't. The core operation was to
>> build up a string for each line that looked like "data.frame(var1=val1,
>> var2=val2, [etc])" and then rbind this to the data.frame so far. I did
>> this with r(mycommand string). Almost all the values were numeric.
>>
>> This was incredibly slow, being unable to complete after running
>> overnight.
>>
>> So, the lesson is, don't do that!
>>
>> I switched to preprocessing that created a csv file, and then read.csv
>> from R. This worked in under a minute. The result had dimension 150913
>> x 129.
>>
>> The good news in rpy was that I found objects persisted across calls to
>> the r object.
>>
>> Exactly why this was so slow I don't know. The two obvious suspects are
>> the speed of rbind, which I think is pretty inefficient, and the
>> overhead of crossing the python/R boundary.
>>
>> This was on Debian Lenny:
>> python-rpy 1.0.3-2
>> Python 2.5.2
>> R 2.7.1
>>
>> rpy2 is not available in Lenny, though it is in development versions of
>> Debian.
>>
>> Ross Boylan
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

--
==============================
WenSui Liu
Blog   : statcompute.spaces.live.com
Tough Times Never Last. But Tough People Do.  - Robert Schuller
==============================
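
P.S. For anyone who wants to try read.fwf() without the original data file, here is a minimal self-contained sketch. Everything in it is illustrative: the two city names and population figures are made up, the temporary file merely stands in for the real fixed-width file, and the name fwf_demo is hypothetical. The strip.white = TRUE argument is an addition of mine (it is passed through to read.table) so that the padding after each city name is dropped.

```r
# Made-up sample records: city left-justified in 22 characters,
# population right-justified in 10 characters.
cities <- c("Springfield", "Shelbyville")
pops   <- c(120000L, 45000L)
fwf_lines <- sprintf("%-22s%10d", cities, pops)

# A temporary file stands in for the real fixed-width data file.
tmp <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp)

# Same layout as the example above: 22 columns of text, 10 of numbers.
fwf_demo <- read.fwf(tmp, header = FALSE, widths = c(22, 10),
                     col.names = c("city", "population"),
                     colClasses = c("character", "numeric"),
                     strip.white = TRUE)
unlink(tmp)

# Inspect the structure of the resulting data frame.
str(fwf_demo)
```

Passing a file name instead of an open connection lets read.fwf() open and close the file itself, so no explicit close() is needed in this variant.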