Re: [Rd] Incorrect Import by Data for CSV File

peter dalgaard Mon, 25 Sep 2017 09:04:40 -0700

> On 25 Sep 2017, at 14:27 , Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:
> 
> On 25/09/2017 08:00, Dario Strbenac wrote:
>> Good day,
>> The data function can import a variety of file formats, one of them being 
>> C.S.V. 
> 
> That isn't its documented purpose.  It was the original way for packages to 
> provide datasets as needed (before lazy data was added).
> 
>> Problematically, all of the table columns are collapsed into a single data 
>> frame column. This occurs because "files ending .csv or .CSV are read using 
>> read.table(..., header = TRUE, sep = ";", as.is=FALSE)". I suggest that the 
>> semi-colon used as the column separator be changed to a comma.
> 
> We suggest you read the documentation ... the (non-English-locales) version 
> with a semicolon separator is one of four documented formats, and the 
> English-language one is not.  Even if it were desirable it would not be 
> possible to make a backwards-incompatible change after almost 20 years.
> 
> It really isn't clear why anyone would want to use anything other than the 
> second option (.rda) for data() unless other manipulations are needed (e.g. 
> to attach a package).  But that option was not part of the original 
> implementation.
>
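
For illustration, the collapse described in the quoted report can be reproduced without data() at all. This is only a sketch: the file contents are made up, and the read.table() call is the one given in the documentation text quoted above.

```r
# Toy comma-separated file (contents invented for illustration).
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age", "1,0,71", "2,1,64"), csv_file)

# The call documented for files ending in .csv: semicolon as separator.
semi <- read.table(csv_file, header = TRUE, sep = ";", as.is = FALSE)
dim(semi)    # every line ends up as a single field, so one column

# The same file read with a comma separator.
comma <- read.csv(csv_file)
dim(comma)   # three columns, as intended
```

A file that uses semicolons between fields would come through the first call with its columns intact.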


It can be handy to have raw ASCII data included in a package for people to see, 
but then you can use the .R mechanism to read the data. It is done for a couple 
of cases in the ISwR package, see e.g. the stroke.R and stroke.csv pair. This 
also allows you to fix up other things that you have no chance of specifying 
directly in the file:

stroke <-  read.csv2("stroke.csv", na.strings=".")
names(stroke) <- tolower(names(stroke))
stroke <-  within(stroke,{
    sex <- factor(sex,levels=0:1,labels=c("Female","Male"))
    dgn <- factor(dgn)
    coma <- factor(coma, levels=0:1, labels=c("No","Yes"))
    minf <- factor(minf, levels=0:1, labels=c("No","Yes"))
    diab <- factor(diab, levels=0:1, labels=c("No","Yes"))
    han <- factor(han, levels=0:1, labels=c("No","Yes"))
    died <- as.Date(died, format="%d.%m.%Y")
    end <- pmin(died, as.Date("1996-01-01"), na.rm=TRUE)
    dstr <- as.Date(dstr,format="%d.%m.%Y")
    obsmonths <- as.numeric(end-dstr, "days")/30.6
    obsmonths[obsmonths==0] <- 0.1
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    died[!dead] <- NA
    rm(end)
})
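
A self-contained sketch of the same mechanism, for trying it outside a package. The names toy.csv, toy.R and toy_data are hypothetical, and the sys.source() call only approximates what data() does with a .R file (source it with the working directory temporarily set to the file's directory), so treat it as an illustration rather than the actual implementation.

```r
# Stand-in for a package's data/ directory (hypothetical names throughout).
data_dir <- file.path(tempdir(), "toy_data")
dir.create(data_dir, showWarnings = FALSE)

# Raw data: semicolon-separated, with "." marking a missing value.
writeLines(c("ID;SEX;DIED", "1;0;12.03.1992", "2;1;."),
           file.path(data_dir, "toy.csv"))

# Companion script that reads the raw file and fixes up the columns.
writeLines(c(
  'toy <- read.csv2("toy.csv", na.strings = ".")',
  'names(toy) <- tolower(names(toy))',
  'toy <- within(toy, {',
  '    sex <- factor(sex, levels = 0:1, labels = c("Female", "Male"))',
  '    died <- as.Date(died, format = "%d.%m.%Y")',
  '})'
), file.path(data_dir, "toy.R"))

# Approximation of the .R mechanism: evaluate the script in a fresh
# environment, with the working directory switched to the script's directory
# so that the relative path "toy.csv" resolves.
env <- new.env()
sys.source(file.path(data_dir, "toy.R"), envir = env, chdir = TRUE)
str(env$toy)
```

In a real package the two files would sit together in data/, and data(toy) would run the script in the same way.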


-pd


> -- 
> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com

