That seems to work for the toy data. How do I implement this change with my real data, which are read from very large Stata and SPSS files and keep the factor definitions? Won't I be losing information (and creating a larger dataset) by not using the factor levels?
How do I recover the factor values? I read my datafile (read.spss using use.value.labels = FALSE) and got this:

              connector
Mode_orig_only             1            9
1                  17.814338     0.000000
3                  49.128982     0.000000
4                 525.978899     0.000000
5                 913.295370     0.000000
6                 114.302764     0.000000
7                 298.151438     0.000000
8                  93.088049     0.000000
9                 233.794168     0.000000
10                 20.764539     0.000000
11                424.120506     0.000000
12                  8.054528     0.000000
13                  6.010790     0.000000
14               1832.748525     0.000000
15              10191.284139     0.000000
16               2099.771923     0.000000
17               1630.148576     0.000000
<NA>                0.000000  9491.013249

which does have the "NA" row, but not the factor labels. If I read the file with use.value.labels=TRUE I can see what I'm summarizing, but not the NAs. Can't I have both? The top summary will also omit all 0 value factors (of course) in the variable summarized.

The same summary using factors:

                                                              connector
Mode_orig_only                                                OD Passenger  Connector
Walked/Biked                                                     17.814338   0.000000
I flew in from another a place/connected                          0.000000   0.000000
Amtrak                                                           49.128982   0.000000
Bus - Chartered bus or van                                      525.978899   0.000000
Bus - Hotel Courtesy van                                        913.295370   0.000000
Bus - MTA (Metro) or other public transit bus                   114.302764   0.000000
Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438   0.000000
Bus - Union Station Flyaway                                      93.088049   0.000000
Bus - Van Nuys Flyaway                                          233.794168   0.000000
Green line/light rail                                            20.764539   0.000000
Limousine/town car                                              424.120506   0.000000
Metrolink                                                         8.054528   0.000000
Motorcycle                                                        6.010790   0.000000
On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525   0.000000
Car/truck/van - Private                                       10191.284139   0.000000
Car/truck/van - Rental                                         2099.771923   0.000000
Taxi                                                           1630.148576   0.000000
..Refused                                                         0.000000   0.000000

Robert Farley
Metro
www.Metro.net

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

Try reading it in with read.table's argument stringsAsFactors=FALSE.
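A minimal sketch of that suggestion, using the toy data from the quoted message below. The data frame is typed in by hand here rather than read from Toy.csv (an assumption, so that the example is self-contained). The second half, with addNA(), is an alternative that was not proposed in this thread: it keeps the classifying columns as factors, so the labels survive, and makes NA an explicit level so that it is tabulated too.

```r
# Toy data from the quoted message, entered by hand (assumption: same values
# as Toy.csv). stringsAsFactors = FALSE keeps Data1..Data3 as character.
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = c("Red", "Green", "Blue", "Red", "Green", "Blue",
             "Red", "Green", "Blue"),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = as.character(101:109),
  stringsAsFactors = FALSE
)

# Character classifiers: xtabs() builds the factors itself, so exclude = NULL
# is honoured and NA gets its own row and column.
tab_chr <- xtabs(Weight ~ Data1 + Data3, data = ToyData,
                 exclude = NULL, na.action = na.pass)

# Alternative (not from this thread): keep factors, but add NA as a level.
ToyFac <- ToyData
ToyFac$Data1 <- addNA(factor(ToyFac$Data1))
ToyFac$Data3 <- addNA(factor(ToyFac$Data3))
tab_fac <- xtabs(Weight ~ Data1 + Data3, data = ToyFac)

print(tab_chr)
print(tab_fac)

# Both tables are expected to account for every weight in the data.
sum(tab_chr) == sum(ToyData$Weight)
sum(tab_fac) == sum(ToyData$Weight)
```

For the SPSS file, the same addNA() step could presumably be applied to the factor columns that read.spss returns with use.value.labels=TRUE, which would keep the value labels and still show the NA row; that has not been tried on the real data here.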
I think the underlying problem is that exclude= is used only if the classifying variables are not already factors. I haven't studied the help file well enough to see if that is what it is documented to do, but it seems misleading.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 4:10 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> In this toy data, each of the tables should sum to 1111
> None of the tables shows NA columns or rows.
>
> ################################
> > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="ID_Num")
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana      1
> 102   Sam Green Banana      2
> 103   Sam  Blue Orange      2
> 104  Fred   Red Orange      2
> 105  Fred Green  Guava      2
> 106  Fred  Blue  Guava      2
> 107  <NA>   Red   Pear     50
> 108  <NA> Green   Pear     50
> 109  <NA>  Blue   <NA>   1000
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data3, exclude=NULL, na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>       Data3
> Data1  Banana Guava Orange Pear
>   Fred      0     4      2    0
>   Sam       3     0      2    0
>
> Robert Farley
> Metro
> www.Metro.net
>
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Dieter Menne
> Sent: Thursday, May 28, 2009 05:46
> To: r-help@r-project.org
> Subject: Re: [R] Still can't find missing data
>
> Farley, Robert wrote:
> >
> > I can't get the syntax that will allow me to show NA values (rows) in the xtabs.
> >
> > lengthy non-reproducible example removed
> >
>
> If you want a reproducible answer, prepare a reproducible result. And check that the syntax is
>
> na.action=na.pass
>
> Dieter
>
> --
> View this message in context:
> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.