Farley, Robert wrote:
Let's see if I understand this. Do I iterate through
x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
for each of the few hundred variables (x) in my data frame?
Yes, for all variables that are factors.
Best,
Uwe Ligges
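A minimal sketch of that per-variable loop, assuming a data frame laid out like the ToyData example quoted further down. The helper name add_na_level is made up for illustration; the factor() call inside it is the one from Bill's message.

```r
# Toy data frame with two factor columns and one numeric column.
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = rep(c("Red", "Green", "Blue"), 3),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  stringsAsFactors = TRUE
)

# Hypothetical helper: make NA an explicit level of a single factor.
add_na_level <- function(x) factor(x, levels = c(levels(x), NA), exclude = NULL)

# Find the factor columns and convert only those; Weight is left untouched.
is_fac <- vapply(ToyData, is.factor, logical(1))
ToyData[is_fac] <- lapply(ToyData[is_fac], add_na_level)

levels(ToyData$Data1)
```

Calling factor() on the whole data frame does not do this; the conversion has to be applied column by column, as lapply() does here.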
I tried to do this all at once and failed:
ToyData
Data1 Data2 Data3 Weight
101 Sam Red Banana 1.1
102 Sam Green Banana 2.1
103 Sam Blue Orange 2.1
104 Fred Red Orange 2.1
105 Fred Green Guava 2.1
106 Fred Blue Guava 2.1
107 <NA> Red Pear 50.1
108 <NA> Green Pear 50.1
109 <NA> Blue <NA> 1000.2
ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL,
na.action=na.pass))
Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
unused argument(s) (exclude = NULL, na.action = function (object, ...)
ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
ToyData
Data1 Data2 Data3 Weight
<NA> <NA> <NA> <NA>
Levels:
But it didn't work. Don't I need to do this separately for each variable?
Is there a way to get read.spss to insert "NA" levels for each variable when I create the data
frame? Is this because SPSS (and STATA) allow "NA" as an "undeclared level" and R does
not?
Will this be a problem with read.dta as well?
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, May 28, 2009 20:39
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
In R, factors don't save space over character vectors - only
one copy of any given string is kept in memory in either case.
Factors do let you order the levels in the way you want and
that is often important in presentations.
You can add NA to the list of levels of a factor by doing
x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
where 'x' represents each factor in your dataset. After
doing that is.na(x) will be all FALSE and you may not
want that for other situations.
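
A small illustration of that side effect, using a made-up four-element factor. addNA() is base R's shorthand for the same conversion; the rest follows the line above.

```r
x <- factor(c("Sam", "Fred", NA, "Sam"))
is.na(x)                     # the third element is reported as missing

# NA becomes one of the levels instead of a missing value.
x_na <- factor(x, levels = c(levels(x), NA), exclude = NULL)
levels(x_na)
is.na(x_na)                  # no element is reported as missing any more

# The missing entries can still be found through the character values.
is.na(as.character(x_na))

# addNA() produces the same set of levels.
identical(levels(addNA(x)), levels(x_na))
```

Code that relies on is.na(x) to find missing values would need to test is.na(as.character(x)) instead after this conversion.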
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Farley, Robert
Sent: Thursday, May 28, 2009 5:27 PM
To: R-help
Subject: Re: [R] Still can't find missing data
That seems to work for the toy data. How do I implement this
change with my real data, which are read from very large
Stata and SPSS files and keep the factor definitions? Won't
I be losing information (and creating a larger dataset) by
not using the factor levels?
How do I recover the factor values? I read my data file
(read.spss with use.value.labels = FALSE) and got this:
connector
Mode_orig_only 1 9
1 17.814338 0.000000
3 49.128982 0.000000
4 525.978899 0.000000
5 913.295370 0.000000
6 114.302764 0.000000
7 298.151438 0.000000
8 93.088049 0.000000
9 233.794168 0.000000
10 20.764539 0.000000
11 424.120506 0.000000
12 8.054528 0.000000
13 6.010790 0.000000
14 1832.748525 0.000000
15 10191.284139 0.000000
16 2099.771923 0.000000
17 1630.148576 0.000000
<NA> 0.000000 9491.013249
which does have the "NA" row, but not the factor labels. If
I read the file with use.value.labels=TRUE I can see what I'm
summarizing, but not the NAs. Can't I have both?
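
One possible way to get both, sketched on made-up stand-in vectors rather than a real read.spss() result: keep the numeric codes, attach the labels with factor(), then add NA as a level with addNA(). The codes, weights and three-entry label lookup below are illustrative assumptions, not the survey's actual coding.

```r
# Stand-ins for one column read with use.value.labels = FALSE.
codes  <- c(1, 3, 4, NA, 1, NA)
weight <- c(10, 20, 30, 40, 50, 60)

# Hypothetical code-to-label lookup (label = code).
value_labels <- c("Walked/Biked" = 1,
                  "Amtrak" = 3,
                  "Bus - Chartered bus or van" = 4)

# Attach the labels, then make NA an explicit level so it is tabulated too.
mode <- factor(codes, levels = unname(value_labels), labels = names(value_labels))
mode <- addNA(mode)

# Weighted totals per labelled level, including the <NA> group.
totals <- tapply(weight, mode, sum)
totals
```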
The top summary also omits (of course) every factor level
whose total is 0 in the variable summarized.
The same summary using factors:
                                                              connector
Mode_orig_only                                                OD Passenger  Connector
Walked/Biked                                                     17.814338   0.000000
I flew in from another a place/connected                          0.000000   0.000000
Amtrak                                                           49.128982   0.000000
Bus - Chartered bus or van                                      525.978899   0.000000
Bus - Hotel Courtesy van                                        913.295370   0.000000
Bus - MTA (Metro) or other public transit bus                   114.302764   0.000000
Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438   0.000000
Bus - Union Station Flyaway                                      93.088049   0.000000
Bus - Van Nuys Flyaway                                          233.794168   0.000000
Green line/light rail                                            20.764539   0.000000
Limousine/town car                                              424.120506   0.000000
Metrolink                                                         8.054528   0.000000
Motorcycle                                                        6.010790   0.000000
On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525   0.000000
Car/truck/van - Private                                       10191.284139   0.000000
Car/truck/van - Rental                                         2099.771923   0.000000
Taxi                                                           1630.148576   0.000000
..Refused                                                         0.000000   0.000000
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
Try reading it in with read.table's argument stringsAsFactors=FALSE.
I think the underlying problem is that exclude= is used only if
the classifying variables are not already factors. I haven't studied
the help file well enough to see whether that is what it is documented
to do, but it seems misleading.
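
A sketch of that suggestion on the toy data from this thread, typed inline so that no file is needed. It assumes xtabs() builds the factors itself when the classifying columns are character vectors, and so applies exclude=NULL to them. How exclude= is handled has varied between R versions, so this is worth verifying on your own installation.

```r
# Toy data as CSV text; the NA strings become missing values on reading.
csv <- "ID_Num,Data1,Data2,Data3,Weight
101,Sam,Red,Banana,1
102,Sam,Green,Banana,2
103,Sam,Blue,Orange,2
104,Fred,Red,Orange,2
105,Fred,Green,Guava,2
106,Fred,Blue,Guava,2
107,NA,Red,Pear,50
108,NA,Green,Pear,50
109,NA,Blue,NA,1000"

# Keep Data1..Data3 as character vectors instead of factors.
ToyChr <- read.table(text = csv, header = TRUE, sep = ",", na.strings = "NA",
                     row.names = "ID_Num", stringsAsFactors = FALSE)

# With character columns, exclude = NULL should keep NA as a row of the table.
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyChr,
             exclude = NULL, na.action = na.pass)
tab
sum(tab)   # compare with the total of the Weight column
```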
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Farley, Robert
Sent: Thursday, May 28, 2009 4:10 PM
To: R-help
Subject: Re: [R] Still can't find missing data
In this toy data, each of the tables should sum to 1111.
None of the tables shows NA columns or rows.
################################
ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", row.names="ID_Num")
ToyData
Data1 Data2 Data3 Weight
101 Sam Red Banana 1
102 Sam Green Banana 2
103 Sam Blue Orange 2
104 Fred Red Orange 2
105 Fred Green Guava 2
106 Fred Blue Guava 2
107 <NA> Red Pear 50
108 <NA> Green Pear 50
109 <NA> Blue <NA> 1000
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass, ToyData)
Data2
Data1 Blue Green Red
Fred 2 2 2
Sam 2 2 1
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
Data2
Data1 Blue Green Red
Fred 2 2 2
Sam 2 2 1
xtabs(Weight ~ Data1 + Data3, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
Data3
Data1 Banana Guava Orange Pear
Fred 0 4 2 0
Sam 3 0 2 0
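
For comparison, a sketch that should produce NA rows and columns on this toy data while keeping the columns as factors. It assumes R 3.4.0 or later, where xtabs() accepts an addNA argument; the data frame is retyped inline so the example is self-contained.

```r
# The toy data with factor columns, as read.table() produced it in this thread.
ToyFac <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = rep(c("Red", "Green", "Blue"), 3),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  stringsAsFactors = TRUE
)

# addNA = TRUE asks xtabs() to count NA as its own level where NAs occur.
tab13 <- xtabs(Weight ~ Data1 + Data3, data = ToyFac, addNA = TRUE)
tab13
sum(tab13)   # compare with the total of the Weight column
```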
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Dieter Menne
Sent: Thursday, May 28, 2009 05:46
To: r-help@r-project.org
Subject: Re: [R] Still can't find missing data
Farley, Robert wrote:
I can't get the syntax that will allow me to show NA values
(rows) in the xtabs.
lengthy non-reproducible example removed
If you want a reproducible answer, prepare a reproducible
result. And check that the syntax is
na.action=na.pass
Dieter
--
View this message in context:
http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.