On Dec 2, 2010, at 9:33 PM, Duncan Murdoch wrote:
On 02/12/2010 9:18 PM, David Winsemius wrote:
On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
snipped
I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
The comment in the NEWS file suggests it was in response to some
strange csv file coming out of Excel.
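For readers who have not met the option, here is a minimal sketch of what fill=TRUE changes in read.table(). The ragged input is made up for illustration (it is not the Excel file mentioned in NEWS), and the variable names are hypothetical.

```r
# Made-up ragged input: the first data row is one field short.
ragged <- "a b c
1 2
3 4 5"

# With the read.table() default (fill = FALSE) the short row is an error;
# capture the message instead of stopping.
strict <- tryCatch(
  read.table(text = ragged, header = TRUE),
  error = function(e) conditionMessage(e)
)

# With fill = TRUE the short row is padded with NA instead.
filled <- read.table(text = ragged, header = TRUE, fill = TRUE)
print(filled)
```

Note that read.csv() already sets fill = TRUE by default, so the difference only shows up with read.table().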
The real problem with the CSV format is that there really isn't a
well defined standard for it. The first RFC about it was published
in 2005, and it doesn't claim to be authoritative. Excel is kind of
a standard, but it does some very weird things. (For example:
enter the string 01 into a field. To keep the leading 0, you need
to type it as '01. Save the file, read it back: goodbye 0. At
least that's what a website I was just on says about Excel, and what
OpenOffice does.)
In both Excel and OO.org you can select a column (or any other
range) and set its format to text. (The default is numeric, not that
different from read.table()'s default behavior.) Once a format has
been set, you no longer need leading quotes. I just created a small
example in OO.org Calc, entering values with leading zeros and no
leading quotes, and this code runs as desired after copying the three
cells to the clipboard:
> read.table(pipe("pbpaste"), colClasses="character")
V1
1 01
2 004
3 0005
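pbpaste is the macOS clipboard utility, so the call above will not run on other systems. A sketch of the same idea that should run anywhere, assuming the three cell values are supplied inline through the text argument rather than from the clipboard (the variable names are illustrative):

```r
# The three cell values, supplied inline instead of via the clipboard.
cells <- "01\n004\n0005"

# colClasses = "character" turns off type conversion, so the zeros survive.
as_text <- read.table(text = cells, colClasses = "character")
print(as_text$V1)
```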
The same applies to date fields in both OO.org and Excel. In this
regard, it is simply a matter of understanding the defined
behavior of your software and how to manipulate it. This is no
different from learning R's classes, coercing them to your ends, and
dealing with other formatting issues.
You're right, I shouldn't have picked on Excel particularly here,
but it really is a bizarre format that says the default way to read
a file containing
"V1" # minor quibble. The V1 was added by read.table()
"01"
"004"
"0005"
is to assume that the column contains numeric values.
I'm a bit puzzled. Or maybe not. If you are criticizing the default
behavior of R's read.table then I do understand (but I have been taught
by my reading of the FM that one should expect "numeric" iff all of the
first <n> values _are_ coercible to "numeric" without NA generation).
Excel is offering text exactly in the instances where it has been
told that the cell format is "text".
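The conversion rule under discussion can be checked directly against the quoted file from Duncan's message. This is a minimal sketch, assuming the file contents are supplied inline through text=; the variable names are illustrative only.

```r
# The quoted file from the message, header included.
quoted <- '"V1"\n"01"\n"004"\n"0005"'

# Default behaviour: the quotes are stripped, then the column is
# type-converted, so the leading zeros are lost.
by_default <- read.csv(text = quoted)
print(class(by_default$V1))

# Declaring the column class keeps the strings intact.
declared <- read.csv(text = quoted, colClasses = "character")
print(declared$V1)
```

Quoting alone does not protect the values here; only an explicit colClasses (or a later conversion) does.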
(Yes, read.csv() makes this same assumption.) My main complaint is
with the format.
Meaning the defaults chosen for read.csv()?
--
David.
Duncan Murdoch
I've been burned so many times by storing data in .csv files, that I
just avoid them whenever I can.
No argument there. I know one physician whose weapon of choice is
Stata who always uses "|" as his separator, but that's perhaps because
he works entirely in Windows. I imagine that might not be the most
uncommon character in *NIXen.
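For what it is worth, reading a "|"-separated file in R only needs the sep argument. A small sketch with made-up records (the field names and values are hypothetical), showing that commas inside a field cause no trouble and that colClasses keeps leading zeros:

```r
# Made-up records whose text fields contain commas.
piped <- "id|name|zip
1|Smith, J|06107
2|Jones, A|02139"

# sep = "|" splits only on the pipe; colClasses keeps the zip codes as text.
patients <- read.table(text = piped, sep = "|", header = TRUE,
                       colClasses = c("integer", "character", "character"))
print(patients)
```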
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help