Re: [R] The behaviour of read.csv().

Duncan Murdoch Sun, 05 Dec 2010 06:03:14 -0800

On 03/12/2010 7:08 AM, Duncan Murdoch wrote:

On 02/12/2010 9:59 PM, Rolf Turner wrote:


On 3/12/2010, at 3:48 PM, David Scott wrote:

   On 03/12/10 14:33, Duncan Murdoch wrote:


        <SNIP>

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
The comment in the NEWS file suggests it was in response to some strange
csv file coming out of Excel.
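
For illustration, here is a minimal sketch of what fill=TRUE does with a ragged file. The file contents are made up for the example, and the exact wording of the error message may differ between R versions.

```r
# Hypothetical ragged CSV: the second data row is one field short.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8"), tmp)

# read.csv() uses fill = TRUE by default, so the short row is padded with NA.
filled <- read.csv(tmp)
print(filled)

# With fill = FALSE the same file is rejected; capture the error message.
strict <- tryCatch(read.csv(tmp, fill = FALSE),
                   error = function(e) conditionMessage(e))
print(strict)

unlink(tmp)
```

The padding is convenient, but it also means a truncated row can slip through silently, so it is worth checking for unexpected NA values after reading.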

The real problem with the CSV format is that there really isn't a well
defined standard for it.  The first RFC about it was published in 2005,
and it doesn't claim to be authoritative.  Excel is kind of a standard,
but it does some very weird things.  (For example:  enter the string 01
into a field.  To keep the leading 0, you need to type it as '01.  Save
the file, read it back:  goodbye 0.  At least that's what a website I
was just on says about Excel, and what OpenOffice does.)
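
The same loss of leading zeros can happen on the R side, since read.csv() guesses column types by default. A small sketch with made-up data (the column names id and value are hypothetical), showing one way to keep the strings intact by specifying colClasses:

```r
# Hypothetical file with identifiers that carry leading zeros.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "01,10", "002,20"), tmp)

# Default type guessing turns the id column into integers: the zeros are gone.
default <- read.csv(tmp)
str(default$id)

# Declaring the column as character keeps the text exactly as written.
kept <- read.csv(tmp, colClasses = c("character", "integer"))
str(kept$id)

unlink(tmp)
```

This only helps if the zeros survived in the file itself; it cannot recover what a spreadsheet dropped before saving.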

I've been burned so many times by storing data in .csv files that I
just avoid them whenever I can.

Absolutely agree with this, Duncan. Playing around with .csv files is
like playing with some sort of unstable explosive. I also avoid them as
much as possible.


Where I work, everybody but me uses (yeuuccchhh!!!) Excel or SPSS.  If
we are to share data sets, *.csv files seem to be the most efficacious,
if not the only, way to go.


I was going to suggest using DIF rather than CSV.  It contains more
internal information about the file (including the type of each entry),
but has the disadvantage of being less readable, even though it is ASCII.

However, in putting together a little demo, I found a couple of bugs in
the R implementation of read.DIF, and it looks as though it ignores the
internal type information.  Sigh.

As of r53778, the bugs I noticed should be fixed. read.DIF now respects
the internal type information, so it will keep character strings like
"001" as type character (unless you ask it to change the type).


Duncan Murdoch


Duncan Murdoch


So far, we've had very few problems.  The one that started off this thread
is the only one I can think of that related to the *.csv format.

At least *.csv files have the virtue of being ASCII files, whence if things
go wrong it is at least possible to dig into them with a text editor and
figure out just what the problem is.
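
The digging can also be done from within R. A rough sketch, using a made-up file, that counts the fields on each line with count.fields() and prints the lines that disagree with the header:

```r
# Hypothetical problem file: one row is short, one row is long.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "6,7,8,9"), tmp)

# Number of comma-separated fields on each line of the file.
n <- count.fields(tmp, sep = ",", quote = "\"")
print(n)

# Lines whose field count differs from the header line.
# (This indexing assumes the file has no blank lines, which
# count.fields() skips by default.)
bad <- which(n != n[1])
suspect <- readLines(tmp)[bad]
print(suspect)

unlink(tmp)
```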

        cheers,

                Rolf


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
