Thanks David,

After seeing the simplicity of your function versus the convoluted mess I worked up, I now understand why it's not necessary to have a package to find NA's (and, from what you said, it is already part of other packages such as Hmisc). I am at the 2 1/2 month mark as an R user and have loads to learn. Simpler is better.

Thanks, David, for your time. I will take the information you gave and put it to use in new situations.

Tyler

> CC: r-help@r-project.org
> From: dwinsem...@comcast.net
> To: tyler_rin...@hotmail.com
> Subject: Re: [R] Function for finding NA's
> Date: Sun, 3 Apr 2011 14:19:40 -0400
>
> On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
>
> > Quick question,
> >
> > I tried to find a function in available packages to find NA's for an
> > entire data set (or single variables) and report the row of missing
> > values (NA's for each column). I searched the typical routes
> > through the blogs and the help manuals for 15 minutes. Rather than
> > spend any more time searching I created my own function to do this
> > (probably in less time than it would have taken me to find the
> > function).
> >
> > Now I still have the same question: Is this function (NAhunter I
> > call it) already in existence? If so please direct me (because I'm
> > sure they've written better code more efficiently). I highly doubt
> > I'm the first person to want to find all the missing values in a
> > data set so I assume there is a function for it but I just didn't
> > spend enough time looking. If there is no existing function (big if
> > here), is this something people feel is worthwhile for me to put
> > into a package of some sort?
>
> I'm not sure that it would have occurred to people to include it in a
> package.
> Consider:
>
> getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
>
> > cities
>        long       lat         city pop
> 1 -58.38194 -34.59972 Buenos Aires  NA
> 2  14.25000  40.83333         <NA>  NA
> > getNa(cities)
> $long
> integer(0)
>
> $lat
> integer(0)
>
> $city
> [1] 2
>
> $pop
> [1] 1 2
>
> There are several packages with functions by the name `describe` that
> do most or all of the rest of what you have proposed. I happen to use
> Harrell's Hmisc, but the other versions should also be reviewed if you
> want to avoid re-inventing the wheel.
>
> --
> David.
>
> > Tyler
> >
> > Here's the code:
> >
> > NAhunter<-function(dataset)
> > {
> >   find.NA<-function(variable)
> >   {
> >     if(is.numeric(variable)){
> >       n<-length(variable)
> >       mean<-mean(variable, na.rm=T)
> >       median<-median(variable, na.rm=T)
> >       sd<-sd(variable, na.rm=T)
> >       NAs<-is.na(variable)
> >       total.NA<-sum(NAs)
> >       percent.missing<-total.NA/n
> >       descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
> >       rownames(descriptives)<-c(" ")
> >       Case.Number<-1:n
> >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> >       missing.value<-data.frame(Case.Number,Missing.Values)
> >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> >       list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> >     }
> >     else{
> >       n<-length(variable)
> >       NAs<-is.na(variable)
> >       total.NA<-sum(NAs)
> >       percent.missing<-total.NA/n
> >       descriptives<-data.frame(n,total.NA,percent.missing)
> >       rownames(descriptives)<-c(" ")
> >       Case.Number<-1:n
> >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> >       missing.value<-data.frame(Case.Number,Missing.Values)
> >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> >       list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> >     }
> >   }
> >   dataset<-data.frame(dataset)
> >   options(scipen=100)
> >   options(digits=2)
> >   lapply(dataset,find.NA)
> > }
>
> David Winsemius, MD
> West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
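
For anyone reading this thread in the archive: below is a minimal sketch, assuming base R only, of how David's `getNa` one-liner might be paired with the per-column counts that NAhunter reports. The helper name `na_summary` is hypothetical (it is not from the thread or from any package), and the `cities` data frame is rebuilt by hand from the printed output in David's reply. As in the original NAhunter code, `percent.missing` is a proportion between 0 and 1, not a percentage.

```r
# David's one-liner from the thread: row indices of the NAs in each column.
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))

# Hypothetical helper (name and layout are assumptions, not from the thread):
# one row per column, with the count and proportion of missing values.
na_summary <- function(dfrm) {
  dfrm <- as.data.frame(dfrm)
  n_missing <- vapply(dfrm, function(x) sum(is.na(x)), integer(1))
  data.frame(
    variable = names(dfrm),
    n = nrow(dfrm),
    total.NA = unname(n_missing),
    percent.missing = unname(n_missing) / nrow(dfrm),
    stringsAsFactors = FALSE
  )
}

# Example data rebuilt from the printed `cities` output in David's reply.
cities <- data.frame(
  long = c(-58.38194, 14.25),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

na_rows   <- getNa(cities)       # list of row indices, one element per column
na_counts <- na_summary(cities)  # data frame of counts and proportions
```

Keeping the two pieces separate means the row positions stay a plain list while the counts land in a data frame that can be sorted or filtered. For fuller descriptive statistics, the `describe` functions David mentions remain the better route.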