On Apr 3, 2011, at 3:46 PM, Tyler Rinker wrote:

Thanks David,

After seeing the simplicity of your function compared with the convoluted mess I worked up, I now understand why a package just to find NA's isn't necessary (and, from what you said, it is already part of other packages such as Hmisc).

I'm actually not aware of any of the `describe` variants that return the indices of NA's; for a real dataset such an object could be fairly large. It was the other descriptive functions that I said were probably already coded.
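If what is wanted is just the position of every NA, a minimal base-R sketch (the data frame `df` and the names `na_pos` and `na_table` are made up for illustration) is `which()` with `arr.ind = TRUE` applied to `is.na()`. It returns one (row, column) pair per missing cell, so its size grows with the number of NA's rather than with the size of the dataset.

```r
# Made-up example data: NA's in `city` (rows 2 and 3) and `pop` (row 1)
df <- data.frame(
  long = c(-58.38, 14.25, 2.35),
  lat  = c(-34.60, 40.83, 48.86),
  city = c("Buenos Aires", NA, NA),
  pop  = c(NA, 3000000, 2100000),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a logical matrix of the same shape;
# arr.ind = TRUE turns each TRUE cell into a (row, col) index pair
na_pos <- which(is.na(df), arr.ind = TRUE)

# Swap column numbers for column names to make the result readable
na_table <- data.frame(
  row    = na_pos[, "row"],
  column = names(df)[na_pos[, "col"]],
  stringsAsFactors = FALSE
)
print(na_table)
```

The pairs come out in column-major order (all NA's of the first column, then the next), which is how which() walks a matrix.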


I am at the 2 1/2-month mark as an R user and have loads to learn. Simpler is better. Thanks, David, for your time; I will take the information you gave me and put it to use in new situations.

You should also familiarize yourself with complete.cases() and the various functions that handle na.action parameters (linked from its help page). Note that complete.cases() returns a logical vector (not the cases themselves) and is designed for indexing matrices or data frames.
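A small illustration of that point, using a made-up data frame `df` (all object names here are hypothetical): complete.cases() yields one logical per row, which can then index the rows directly or be passed to which() to get row numbers. na.omit(), one of the na.action functions, drops the same rows.

```r
# Made-up data frame with scattered NA's
df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d"),
  stringsAsFactors = FALSE
)

# One logical per row: TRUE when the row has no NA in any column
cc <- complete.cases(df)
print(cc)                    # FALSE for rows 2 and 3

# Use the logical vector to index rows
df_complete   <- df[cc, ]    # rows with no missing values
df_incomplete <- df[!cc, ]   # rows with at least one NA

# which() converts the logical vector into row numbers
incomplete_rows <- which(!cc)

# na.omit() drops the same rows as df[cc, ]
df_omit <- na.omit(df)
```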


Tyler

> CC: r-help@r-project.org
> From: dwinsem...@comcast.net
> To: tyler_rin...@hotmail.com
> Subject: Re: [R] Function for finding NA's
> Date: Sun, 3 Apr 2011 14:19:40 -0400
>
>
> On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
>
> >
> > Quick question,
> >
> > I tried to find a function in available packages to find NA's for an
> > entire data set (or single variables) and report the row of missing
> > values (NA's for each column). I searched the typical routes
> > through the blogs and the help manuals for 15 minutes. Rather than
> > spend any more time searching I created my own function to do this
> > (probably in less time than it would have taken me to find the
> > function).
> >
> > Now I still have the same question: Is this function (NAhunter I
> > call it) already in existence? If so please direct me (because I'm
> > sure they've written better code more efficiently). I highly doubt
> > I'm the first person to want to find all the missing values in a
> > data set so I assume there is a function for it but I just didn't
> > spend enough time looking. If there is no existing function (big if
> > here), is this something people feel is worthwhile for me to put
> > into a package of some sort?
>
> I'm not sure that it would have occurred to people to include it in a
> package. Consider:
>
> getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
>
> > cities
> long lat city pop
> 1 -58.38194 -34.59972 Buenos Aires NA
> 2 14.25000 40.83333 <NA> NA
> > getNa(cities)
> $long
> integer(0)
>
> $lat
> integer(0)
>
> $city
> [1] 2
>
> $pop
> [1] 1 2
>
> There are several packages with functions by the name `describe` that
> do most or all of the rest of what you have proposed. I happen to use
> Harrell's Hmisc but the other versions should also be reviewed if you
> want to avoid re-inventing the wheel.
> --
> David.
>
> >
> > Tyler
> >
> > Here's the code:
> >
> > NAhunter <- function(dataset)
> > {
> >   # Summarise one column: descriptives plus the case numbers of its NA's
> >   find.NA <- function(variable)
> >   {
> >     n <- length(variable)
> >     NAs <- is.na(variable)
> >     total.NA <- sum(NAs)
> >     percent.missing <- total.NA/n
> >     if (is.numeric(variable)) {
> >       type <- "NUMERIC DATA"
> >       descriptives <- data.frame(n,
> >                                  mean = mean(variable, na.rm = TRUE),
> >                                  median = median(variable, na.rm = TRUE),
> >                                  sd = sd(variable, na.rm = TRUE),
> >                                  total.NA, percent.missing)
> >     } else {
> >       type <- "CATEGORICAL DATA"
> >       descriptives <- data.frame(n, total.NA, percent.missing)
> >     }
> >     rownames(descriptives) <- " "
> >     # which() returns the case (row) numbers of the missing values
> >     list(type,
> >          "DESCRIPTIVES" = t(descriptives),
> >          "CASE # OF MISSING VALUES" = which(NAs))
> >   }
> >   dataset <- data.frame(dataset)
> >   # Note: this changes the print options for the whole session
> >   options(scipen = 100, digits = 2)
> >   lapply(dataset, find.NA)
> > }
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
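
For the summary part of NAhunter, a vectorised sketch is possible. The `cities` data below are reconstructed from the printout earlier in the thread, and `na_summary` is a hypothetical name. It computes each column's NA count and proportion with colSums() and colMeans() on is.na(), alongside the getNa() one-liner from the thread. Means, medians and SDs are left to summary() or one of the `describe` functions.

```r
# Reconstructed from the `cities` printout in the thread
cities <- data.frame(
  long = c(-58.38194, 14.25000),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA, NA),
  stringsAsFactors = FALSE
)

# The one-liner from the thread: NA positions for each column
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))

# Per-column count and proportion of NA's (cf. NAhunter's total.NA
# and percent.missing), one row per column of the data
na_summary <- data.frame(
  n               = nrow(cities),
  total.NA        = colSums(is.na(cities)),
  percent.missing = colMeans(is.na(cities)),
  row.names       = names(cities)
)
print(na_summary)
print(getNa(cities))
```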

David Winsemius, MD
West Hartford, CT

