Re: [R] good and bad ways to import fixed column data (rpy)

Wensui Liu Sun, 16 Aug 2009 16:39:37 -0700

Gabor made a good point.
Here is an example I copied from my blog.

##############################################
# READ FIXED-WIDTH DATA FILE WITH read.fwf() #
# ------------------------------------------ #
# EQUIVALENT SAS CODE:                       #
# filename data 'E:\sas\fixed.txt';          #
# data test;                                 #
#   infile data truncover;                   #
#   input @1 city $ 1 - 22 @23 population;   #
# run;                                       #
##############################################


# OPEN A CONNECTION TO THE DATA FILE
data <- file(description = "e:\\sas\\fixed.txt", open = "r")

# widths = c(...)     ==> SPECIFIES COLUMN WIDTHS
# col.names = c(...)  ==> GIVES COLUMN NAMES
# colClasses = c(...) ==> DEFINES COLUMN CLASSES
test <- read.fwf(data, header = FALSE, widths = c(22, 10),
                 col.names = c("city", "population"),
                 colClasses = c("character", "numeric"))

close(data)
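
For anyone without the original file, the same call can be tried on a
throwaway file. This is only a sketch: the two rows and the temporary file
are made up for illustration, and the layout (22 characters for city, 10 for
population) is taken from the code above. strip.white = TRUE is my addition
to drop the padding from the character column.

```r
# Made-up sample rows: city padded to 22 characters, then population in 10.
rows <- c(sprintf("%-22s%10d", "Alpha City", 1000L),
          sprintf("%-22s%10d", "Beta Town", 250L))

# Write them to a temporary file (hypothetical stand-in for fixed.txt).
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# read.fwf() also accepts a file name, so no explicit connection is needed.
test <- read.fwf(tmp, header = FALSE, widths = c(22, 10),
                 col.names = c("city", "population"),
                 colClasses = c("character", "numeric"),
                 strip.white = TRUE)

str(test)
unlink(tmp)
```

Passing the file name lets read.fwf() open and close the file itself, which
avoids the separate file()/close() calls.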

On Sun, Aug 16, 2009 at 6:36 PM, Gabor Grothendieck<ggrothendi...@gmail.com>
wrote:
> Check out ?read.fwf
>
> On Sun, Aug 16, 2009 at 4:49 PM, Ross Boylan<r...@biostat.ucsf.edu> wrote:
>> Recorded here so others may avoid my mistakes.
>>
>> I have a bunch of files containing fixed width data.  The R Data guide
>> suggests that one pre-process them with a script if they are large.
>> They were 50MB and up, and I needed to process another file that gave
>> the layout of the lines anyway.
>>
>> I tried rpy to not only preprocess but create the R data object in one
>> go.  It seemed like a good idea; it wasn't.  The core operation was to
>> build up a string for each line that looked like "data.frame(var1=val1,
>> var2=val2, [etc])" and then rbind this to the data.frame so far.  I did
>> this with r(mycommand string). Almost all the values were numeric.
>>
>> This was incredibly slow, being unable to complete after running
>> overnight.
>>
>> So, the lesson is, don't do that!
>>
>> I switched to preprocessing that created a csv file, and then read.csv
>> from R.  This worked in under a minute.  The result had dimension 150913
>> x 129.
>>
>> The good news in rpy was that I found objects persisted across calls to
>> the r object.
>>
>> Exactly why this was so slow I don't know.  The two obvious suspects are
>> the speed of rbind, which I think is pretty inefficient, and the overhead
>> of crossing the python/R boundary.
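
On the first suspect: growing a data frame with rbind() one row at a time
copies everything accumulated so far on each call, so the total work grows
roughly quadratically with the number of rows. Below is a small sketch in
pure R with made-up data. It says nothing about the rpy boundary overhead,
and n, the function names and the column names are arbitrary choices for
illustration.

```r
n <- 2000  # arbitrary; small enough to finish quickly

# Row by row: each rbind() copies the whole data frame built so far.
grow <- function(n) {
  df <- NULL
  for (i in seq_len(n)) {
    df <- rbind(df, data.frame(var1 = i, var2 = i * 2))
  }
  df
}

# All at once: build the column vectors first, then call data.frame() once.
once <- function(n) {
  i <- seq_len(n)
  data.frame(var1 = i, var2 = i * 2)
}

t_grow <- system.time(a <- grow(n))
t_once <- system.time(b <- once(n))

# Elapsed seconds for each approach; the values depend on the machine.
print(c(grow = t_grow[["elapsed"]], once = t_once[["elapsed"]]))
```

Both functions return the same values; only the way the rows are assembled
differs.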
>>
>> This was on Debian Lenny:
>> python-rpy                    1.0.3-2
>> Python 2.5.2
>> R 2.7.1
>>
>> rpy2 is not available in Lenny, though it is in development versions of
>> Debian.
>>
>> Ross Boylan
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



-- 
==============================
WenSui Liu
Blog   : statcompute.spaces.live.com
Tough Times Never Last. But Tough People Do.  - Robert Schuller
==============================


