Thanks David,
 
After seeing how simple your function is compared with the convoluted mess I
worked up, I now understand why a package just for finding NAs isn't
necessary (and, from what you said, similar functionality is already part of
other packages such as Hmisc).
 
I am two and a half months into using R and have loads to learn. Simpler is
better. Thanks for your time, David; I will take the information you gave me
and put it to use in new situations.
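For my own notes, here is a base-R sketch in the same spirit as getNa(), assuming the input is a data frame. It relies on is.na() returning a logical matrix for a data frame, so colSums() gives the NA count per column and which(..., arr.ind = TRUE) gives the (row, column) position of every NA. The helper name countNA and the rebuilt cities data are my own, for illustration only:

```r
# Hypothetical helper (not from any package): count NAs per column
# and list the (row, column) position of every NA in a data frame.
countNA <- function(dfrm) {
  miss <- is.na(dfrm)                           # logical matrix, same shape as dfrm
  list(per.column = colSums(miss),              # number of NAs in each column
       positions  = which(miss, arr.ind = TRUE)) # one row per NA: "row", "col"
}

# Small data frame mirroring the 'cities' example in this thread
cities <- data.frame(long = c(-58.38194, 14.25),
                     lat  = c(-34.59972, 40.83333),
                     city = c("Buenos Aires", NA),
                     pop  = c(NA, NA))
res <- countNA(cities)
res$per.column
res$positions
```

Here res$per.column reports 0, 0, 1 and 2 NAs for long, lat, city and pop, which matches what getNa(cities) shows.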
 
Tyler
 
> CC: r-help@r-project.org
> From: dwinsem...@comcast.net
> To: tyler_rin...@hotmail.com
> Subject: Re: [R] Function for finding NA's
> Date: Sun, 3 Apr 2011 14:19:40 -0400
> 
> 
> On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
> 
> >
> > Quick question,
> >
> > I tried to find a function in available packages that finds NA's for an
> > entire data set (or single variables) and reports the row numbers of
> > the missing values in each column. I searched the typical routes
> > through the blogs and the help manuals for 15 minutes. Rather than
> > spend any more time searching, I created my own function to do this
> > (probably in less time than it would have taken me to find the
> > function).
> >
> > Now I still have the same question: does this function (NAhunter, as
> > I call it) already exist? If so, please direct me to it (I'm sure
> > others have written better, more efficient code). I highly doubt
> > I'm the first person to want to find all the missing values in a
> > data set, so I assume there is a function for it and I just didn't
> > spend enough time looking. If there is no existing function (big if
> > here), is this something people feel is worthwhile for me to put
> > into a package of some sort?
> 
> I'm not sure that it would have occurred to people to include it in a 
> package. Consider:
> 
> getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
> 
> > cities
>        long       lat         city pop
> 1 -58.38194 -34.59972 Buenos Aires  NA
> 2  14.25000  40.83333         <NA>  NA
> > getNa(cities)
> $long
> integer(0)
> 
> $lat
> integer(0)
> 
> $city
> [1] 2
> 
> $pop
> [1] 1 2
> 
> There are several packages with functions named `describe` that do
> most or all of the rest of what you have proposed. I happen to use
> Harrell's Hmisc, but the other versions should also be reviewed if you
> want to avoid re-inventing the wheel.
> -- 
> David.
> 
> >
> > Tyler
> >
> > Here's the code:
> >
> > NAhunter <- function(dataset)
> > {
> >   # Summarise one column: descriptives plus the row numbers of its NAs
> >   find.NA <- function(variable)
> >   {
> >     n <- length(variable)
> >     NAs <- is.na(variable)
> >     total.NA <- sum(NAs)
> >     percent.missing <- total.NA / n
> >     # Case (row) numbers of the missing values
> >     missing.cases <- which(NAs)
> >     if (is.numeric(variable)) {
> >       descriptives <- data.frame(n,
> >                                  mean = mean(variable, na.rm = TRUE),
> >                                  median = median(variable, na.rm = TRUE),
> >                                  sd = sd(variable, na.rm = TRUE),
> >                                  total.NA, percent.missing)
> >       type <- "NUMERIC DATA"
> >     } else {
> >       descriptives <- data.frame(n, total.NA, percent.missing)
> >       type <- "CATEGORICAL DATA"
> >     }
> >     rownames(descriptives) <- " "
> >     list(type, "DESCRIPTIVES" = t(descriptives),
> >          "CASE # OF MISSING VALUES" = missing.cases)
> >   }
> >   dataset <- data.frame(dataset)
> >   # Note: this changes the global print options for the rest of the session
> >   options(scipen = 100, digits = 2)
> >   lapply(dataset, find.NA)
> > }
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> West Hartford, CT
> 

