Re: [R] Function for finding NA's

David Winsemius Sun, 03 Apr 2011 11:21:16 -0700


On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:

Quick question,
I tried to find a function in available packages to find NA's for an entire data set (or single variables) and report the rows of missing values (NA's for each column). I searched the typical routes through the blogs and the help manuals for 15 minutes. Rather than spend any more time searching, I created my own function to do this (probably in less time than it would have taken me to find the function).
Now I still have the same question: Is this function (NAhunter, I call it) already in existence? If so, please direct me (because I'm sure they've written better, more efficient code). I highly doubt I'm the first person to want to find all the missing values in a data set, so I assume there is a function for it and I just didn't spend enough time looking. If there is no existing function (big if here), is this something people feel would be worthwhile for me to put into a package of some sort?
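For the row-level part of this request, base R already has the pieces. The sketch below is an editorial illustration, not from the thread, and the data frame `df` is made up for the example. `complete.cases()` flags rows with no missing values, so negating it and passing the result to `which()` lists the rows containing at least one NA; `rowSums(is.na())` counts the NAs in each row.

```r
# Hypothetical toy data frame (not from the thread)
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, NA),
                 stringsAsFactors = FALSE)

# Row numbers of rows that contain at least one NA in any column
rows.with.na <- which(!complete.cases(df))
print(rows.with.na)

# Number of NAs in each row
na.per.row <- rowSums(is.na(df))
print(na.per.row)
```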

I'm not sure that it would have occurred to people to include it in a package. Consider:


getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )

> cities
       long       lat         city pop
1 -58.38194 -34.59972 Buenos Aires  NA
2  14.25000  40.83333         <NA>  NA
> getNa(cities)
$long
integer(0)

$lat
integer(0)

$city
[1] 2

$pop
[1] 1 2
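To make the example above reproducible, the `cities` data frame can be rebuilt from the printed values (assuming `city` is a character column; in the original it may have been a factor). The last step uses `which(..., arr.ind = TRUE)` on the logical matrix returned by `is.na()` to list every missing cell as a (row, col) pair in one call, an alternative layout to the per-column list from `getNa()`.

```r
# Rebuild 'cities' from the printed output (values as displayed)
cities <- data.frame(long = c(-58.38194, 14.25000),
                     lat  = c(-34.59972, 40.83333),
                     city = c("Buenos Aires", NA),
                     pop  = c(NA, NA),
                     stringsAsFactors = FALSE)

# David's helper: row indices of the NAs in each column
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))

na.by.column <- getNa(cities)
str(na.by.column)

# Same information as a matrix of (row, col) positions of every NA cell
na.cells <- which(is.na(cities), arr.ind = TRUE)
print(na.cells)
```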

There are several packages with functions named `describe` that do most or all of the rest of what you have proposed. I happen to use Harrell's Hmisc, but the other versions should also be reviewed if you want to avoid reinventing the wheel.
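If only the missing-value counts are wanted rather than a full `describe()` summary, base R gives them with one call per statistic. A minimal sketch, using a hypothetical data frame `df` made up for illustration: `colSums(is.na())` counts the NAs per column and `colMeans(is.na())` gives the proportion missing.

```r
# Hypothetical example data (not from the thread)
df <- data.frame(age = c(34, NA, 51, NA, 29),
                 sex = factor(c("F", "M", NA, "F", "M")))

# Count and proportion of missing values in each column
na.count <- colSums(is.na(df))
na.prop  <- colMeans(is.na(df))
print(rbind(na.count, na.prop))
```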

--
David.


Tyler

Here's the code:

NAhunter <- function(dataset)
{
  # Summarise one column: descriptive statistics plus the case
  # (row) numbers of its missing values
  find.NA <- function(variable)
  {
    if (is.numeric(variable)) {
      n <- length(variable)
      mean <- mean(variable, na.rm = TRUE)
      median <- median(variable, na.rm = TRUE)
      sd <- sd(variable, na.rm = TRUE)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA/n   # a proportion between 0 and 1
      descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- seq_len(n)
      Missing.Values <- ifelse(NAs, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      # keep only the flagged rows; the label must match "Missing Value" exactly
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("NUMERIC DATA",
           "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    } else {
      # non-numeric columns (factor, character, logical): counts only
      n <- length(variable)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA/n
      descriptives <- data.frame(n, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- seq_len(n)
      Missing.Values <- ifelse(NAs, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("CATEGORICAL DATA",
           "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    }
  }
  dataset <- data.frame(dataset)
  # note: these change the print options for the whole session
  options(scipen = 100)
  options(digits = 2)
  lapply(dataset, find.NA)
}
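For comparison, a shorter sketch of the same per-column report. Since `is.na()` already returns a logical vector, `which()` gives the case numbers directly, without building the intermediate "Missing Value" data frame. The name `NAhunter2`, the list element names and the sample data are illustrative choices, not part of the original post; unlike the original, this version leaves the global `options()` untouched.

```r
# Hypothetical compact variant of NAhunter (illustrative, not Tyler's code)
NAhunter2 <- function(dataset) {
  dataset <- data.frame(dataset)
  lapply(dataset, function(variable) {
    NAs <- is.na(variable)
    # counts shared by every column type
    stats <- c(n = length(variable),
               total.NA = sum(NAs),
               percent.missing = mean(NAs))   # proportion missing
    if (is.numeric(variable)) {
      # descriptives of the non-missing values for numeric columns
      stats <- c(stats,
                 mean = mean(variable, na.rm = TRUE),
                 median = median(variable, na.rm = TRUE),
                 sd = sd(variable, na.rm = TRUE))
    }
    list(type = if (is.numeric(variable)) "NUMERIC DATA" else "CATEGORICAL DATA",
         descriptives = stats,
         missing.cases = which(NAs))
  })
}

# Usage on a small made-up data frame
d <- data.frame(score = c(10, NA, 7, 12),
                group = c("a", NA, "b", NA),
                stringsAsFactors = FALSE)
report <- NAhunter2(d)
str(report)
```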

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT
