On Oct 5, 2010, at 8:41 PM, james hirschorn wrote:
Yes, your solution of setting quote="" would read the multi-word strings
incorrectly. A more complicated version of your solution should work:
first check which columns are identified as strings, and then apply your
solution to the remaining columns.
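A rough sketch of that idea follows. It is not the quote="" approach
itself: to cope with multi-word strings it swaps in a small regex
tokenizer that treats a double-quoted run as one field. The function
name read_quoted_numbers_as_factor is hypothetical, and the sketch
assumes a whitespace-separated file with a header line, no comment
lines, and no embedded or escaped quotes.

```r
# Hypothetical helper, not part of base R.
read_quoted_numbers_as_factor <- function(file) {
  # Normal pass: quotes are respected, so multi-word strings stay intact.
  dat <- read.table(file, header = TRUE)

  # Raw pass: data lines only (header dropped, blank lines skipped).
  body <- readLines(file)[-1]
  body <- body[nzchar(trimws(body))]

  # A token is either a double-quoted run (spaces allowed) or a run of
  # non-blank characters.
  tokens <- regmatches(body, gregexpr('"[^"]*"|[^[:space:]]+', body))
  stopifnot(all(lengths(tokens) == ncol(dat)))  # assumes a rectangular file
  raw <- do.call(rbind, tokens)

  # A column is "quoted" when every raw field starts with a double quote.
  quoted <- apply(raw, 2, function(col) all(startsWith(col, '"')))
  numeric_col <- vapply(dat, is.numeric, logical(1))

  # Only quoted columns that were read as numbers become factors;
  # genuine string columns are left as read.table produced them.
  for (j in which(quoted & numeric_col)) dat[[j]] <- factor(dat[[j]])
  dat
}
```

Because the file is read twice, file has to be a path rather than an
already-open connection.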
I'm a newbie at R, but it seems to me that there is a "logical
inconsistency" in R: write.table puts quotes around numbers when they
form a column of factors, but does not put quotes around a column of
integers.
Factors are internally represented as positive integers, but have a
separate "layer" of their levels and labels. What I suspect you are
seeing and calling "numbers" are the character-valued labels.
> write.table(data.frame(nums = -1:-5, facs = factor(-1:-5)), file = "", row.names = FALSE)
"nums" "facs"
-1 "-1"
-2 "-2"
-3 "-3"
-4 "-4"
-5 "-5"
That does not seem at all "logically inconsistent" to me.
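To make the two layers of a factor visible, a small illustration using
the same values as in the write.table call may help. It only uses base
R functions; the variable name f is arbitrary.

```r
f <- factor(-1:-5)

# The labels are character strings, which is why write.table quotes them.
levels(f)                     # "-5" "-4" "-3" "-2" "-1"

# The internal codes are positive integers indexing into the levels.
as.integer(f)                 # 5 4 3 2 1

# Going through the labels recovers the original numbers.
as.integer(as.character(f))   # -1 -2 -3 -4 -5
```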
--
David.
Since read.table is the "dual" of write.table, it seems that it should
treat quoted and unquoted columns differently, analogously to
write.table. However, there does not even seem to be an option to make
read.table behave that way.
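The asymmetry can be reproduced with a short round trip through a
temporary file. This is only an illustration with made-up variable
names (df, tmp, back, fixed); the last call shows the explicit
colClasses route that restores the factor.

```r
df <- data.frame(nums = -1:-5, facs = factor(-1:-5))
tmp <- tempfile(fileext = ".txt")

# write.table quotes the factor column but not the integer column.
write.table(df, file = tmp, row.names = FALSE)

# Reading back with defaults: the quotes are dropped before type
# conversion, so both columns come back as integer.
back <- read.table(tmp, header = TRUE)
sapply(back, class)    # nums: "integer", facs: "integer"

# Declaring the classes explicitly restores the factor.
fixed <- read.table(tmp, header = TRUE,
                    colClasses = c("integer", "factor"))
sapply(fixed, class)   # nums: "integer", facs: "factor"
```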
----- Original Message ----
From: peter dalgaard <pda...@gmail.com>
To: james hirschorn <j_hirsch...@yahoo.com>
Cc: r-help@r-project.org
Sent: Tue, October 5, 2010 7:25:52 AM
Subject: Re: [R] read columns of quoted numbers as factors
On Oct 4, 2010, at 18:39 , james hirschorn wrote:
Suppose I have a data file (possibly with a huge number of columns),
where the columns with factors are coded as "1", "2", "3", etc. The
default behavior of read.table is to convert these columns to integer
vectors.
Is there a way to get read.table to recognize that columns of quoted
numbers represent factors (while unquoted numbers are interpreted as
integers), without explicitly setting them with colClasses?
I don't think there's a simple way, because the modus operandi of
read.table is to read everything as character and then see whether it
can be converted to numeric, and at that point any quotes will have been
lost.
One possibility, somewhat dependent on the exact file format, would be
to temporarily set quote="", see which columns contain quote characters,
and, on a second pass, read those columns as factors, using a computed
colClasses argument. It will break down if you have space-separated
columns with quoted multi-word strings, though.
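The two-pass idea could be sketched roughly as below. The function
names is_quoted_number and read_quoted_as_factor are hypothetical, and
the sketch assumes a whitespace-separated file with a header line and
no quoted multi-word strings. It marks only quoted columns whose
contents are numbers, so that quoted text columns keep read.table's
default treatment.

```r
# Hypothetical helper: TRUE when every non-missing field is a number
# wrapped in double quotes.
is_quoted_number <- function(x) {
  x <- x[!is.na(x)]
  quoted <- grepl('^".*"$', x)
  inner <- gsub('^"|"$', "", x)
  length(x) > 0 && all(quoted) &&
    !anyNA(suppressWarnings(as.numeric(inner)))
}

# Hypothetical helper, not part of base R.
read_quoted_as_factor <- function(file, header = TRUE) {
  # Pass 1: with quote = "" the quote characters stay in the fields.
  raw <- read.table(file, header = header, quote = "",
                    colClasses = "character", comment.char = "")
  mark <- vapply(raw, is_quoted_number, logical(1))

  # Pass 2: NA leaves the type to read.table, "factor" forces a factor.
  # unname() matters: pass-1 column names are mangled by the quotes and
  # must not be matched against the real column names.
  classes <- unname(ifelse(mark, "factor", NA_character_))
  read.table(file, header = header, colClasses = classes)
}
```

As with any two-pass approach, file needs to be a path that can be
opened twice, and the whole file is parsed twice.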
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com
David Winsemius, MD
West Hartford, CT