I totally agree with Barry, although it's sometimes convenient to 
include data with analysis code for debugging and/or documentation purposes.

However, the example actually applies equally to separate data files. In 
fact, the example is from the U.S. Bureau of Labor Statistics at 
ftp://ftp.bls.gov/pub/time.series/sm/, which contains nothing but data 
and documentation files. At issue is not where the data come from, but 
rather how to parse relatively complex data organized inconsistently. 
SAS has a built-in ability to parse five different organizations of 
data: list (delimited), modified list, column, formatted, and mixed (see 
http://www.masil.org/sas/input.html). It seems R can parse such data, 
but only with considerable work by the user. It would be great to have a 
function/package that implements something as easy (hah!) and 
flexible as the SAS INPUT statement.
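
To give a sense of the work involved, here is a rough sketch in base R of 
one way to read lines like the sample quoted below. It is only an 
illustration under several assumptions, not a general solution: the 
second data line and its footnote "P" are made up, the names `raw`, `pos` 
and `parsed` are hypothetical, and because the quoted positions for 
industry and datatype run past the 20-character ID in the sample, the 
sketch assumes 13-18 and 19-20 for those two fields instead.

```r
# Two tab-delimited lines in the quoted layout.
# The second line and its footnote "P" are invented for illustration.
lines <- c("SMS11000000000000001\t1990\tM01\t688.0",
           "SMS11000000000000001\t1990\tM02\t690.5\tP")

# Step 1: "list" input. fill = TRUE pads lines that have no footnote.
raw <- read.table(text = lines, sep = "\t", header = FALSE, fill = TRUE,
                  colClasses = "character",
                  col.names = c("series_id", "year", "period",
                                "value", "footnote"))

# Step 2: "column" input on the series ID.
# Positions for industry and datatype are ASSUMED (see the note above).
pos <- data.frame(
  name  = c("survey", "seasonal", "state", "area",
            "supersector", "industry", "datatype"),
  first = c(1, 3, 4, 6, 11, 13, 19),
  last  = c(2, 3, 5, 10, 12, 18, 20)
)
parts <- Map(function(first, last) substr(raw$series_id, first, last),
             pos$first, pos$last)
names(parts) <- pos$name

# Character pieces become factors; year and value become numbers.
parsed <- data.frame(parts, stringsAsFactors = TRUE)
parsed$year     <- as.integer(raw$year)
parsed$period   <- factor(raw$period)
parsed$value    <- as.numeric(raw$value)
parsed$footnote <- factor(raw$footnote)

str(parsed)
```

Even for this one layout it takes two passes (delimited, then column) 
plus hand-written type conversions, which is the kind of effort a single 
INPUT statement avoids. For purely column-organized files, base R's 
read.fwf may be enough on its own.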

    Marsh

Barry Rowlingson wrote:
> On Mon, Dec 7, 2009 at 3:53 PM, Marshall Feldman <ma...@uri.edu> wrote:
>   
>> Regarding the various methods people have suggested, what if a typical
>> tab-delimited data line looks like:
>>
>>     SMS11000000000000001 1990 M01 688.0
>>
>> and the SAS INPUT statement is
>>
>>   INPUT survey $ 1-2 seasonal $ 3 state $ 4-5 area $ 6-10 supersector $
>> 11-12 @13 industry $8. datatype $ 21-22  year period $ value footnote $ ;
>>
>> Note that most data lines have no footnote item, as in the sample.
>>
>> Here (I think) we'd want all the character variables to be read as factors,
>> possibly "year" as a date, and "value" as numeric.
>>     
>
>  Actually I'm surprised that nobody has yet said what a clearly
> bonkers thing it is to mix up your data and your analysis code in a
> single file. Now suppose you have another set of data you want to
> analyse with the same code? Are you going to create a new file and
> paste the new data in? You've now got two copies of your analysis code
> - good luck keeping corrections to that code synchronised.
>
>  This just seems like horrendously bad practice, which is one reason
> it's kludgy in R. If it was good practice, someone would surely have
> written a way to do it neatly.
>
>  Keep your data in data files, and your functions in .R function
> files. You'll thank me later.
>
> Barry
>   



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
