Re: [R] Function for deleting variables with >=50% missing obs from a data frame

Rita Carreira Mon, 18 Apr 2011 14:49:44 -0700

Thanks for the suggestion, Daryl! I did have to include the exclamation point 
before mean; otherwise it selected the columns with the most missing 
observations. But it was really nice to see this flexibility in R. So my fix was:
dfQ <- dfQtemp[ , sapply(dfQtemp, function(x) !mean(is.na(x)) > .6)]
Thanks again!
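
For anyone finding this thread later, here is a minimal sketch of the same filter on a toy data frame. The data, the cutoff name max_missing, and the helper vectors frac_missing and keep_alt are made up for illustration and are not from my real data. It also shows why the exclamation point works without extra parentheses: in R, unary ! has lower precedence than >, so the comparison is evaluated first and then negated.

```r
# Hypothetical toy data; the values are invented for illustration only.
dfQtemp <- data.frame(
  Q1 = c(NA, NA, NA, 4, NA),   # 80% missing
  Q2 = c(1, NA, 3, 4, 5),      # 20% missing
  Q3 = c(NA, NA, 3, 4, 5)      # 40% missing
)

# Assumed cutoff: drop columns whose fraction of missing values exceeds this.
max_missing <- 0.6

# Fraction of missing values in each column.
frac_missing <- sapply(dfQtemp, function(x) mean(is.na(x)))

# Keep columns at or below the cutoff; drop = FALSE keeps a data frame
# even when only one column survives.
dfQ <- dfQtemp[, frac_missing <= max_missing, drop = FALSE]

# Unary ! binds more loosely than >, so !mean(is.na(x)) > .6 is parsed as
# !(mean(is.na(x)) > .6), which selects the same columns as above.
keep_alt <- sapply(dfQtemp, function(x) !mean(is.na(x)) > max_missing)
stopifnot(identical(names(dfQ), names(dfQtemp)[keep_alt]))
```

Writing the condition as frac_missing <= max_missing avoids the negation altogether, which may be easier to read when the cutoff changes from one data frame to the next.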
Rita
=====================================
"If you think education is expensive, try ignorance." --Derek Bok





> Date: Fri, 15 Apr 2011 15:08:29 -0700
> From: dar...@uw.edu
> To: ritacarre...@hotmail.com
> Subject: Re: [R] Function for deleting variables with >=50% missing obs from 
> a data frame
> 
> you could simply modify
> 
> !all(is.na(x))
> 
> to
> 
> mean(is.na(x)) > .6
> 
> or some such, or invert the logic if I have it backwards.
> 
> .6 is the fraction of missing values above which we omit the variable.
> 
> 
> 
> 
> On 4/15/11 3:00 PM, Rita Carreira wrote:
> > Hello R users!
> > I have several data frames where some of the variables have many missing 
> > observations. For example, Q1 in one of my data frames has over 66% of its 
> > observations missing. I have tried imputation with mice but it does not 
> > work for all the data frames and I get the following message or a similar 
> > message to this:
> >   iter imp variable
> >    1   1  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10  Q11  Q12  Q13  Q14  Q15  
> > Q19  Q36  Q47  Q52  Q79  Q80  Q94  Q97  Q104  Q108  Q122  Q131  Q134  P1  
> > P2  P3  P4  P5  P6
> > Error in solve.default(xtx + diag(pen)) :
> >    system is computationally singular: reciprocal condition number = 1.83044e-16
> > In addition: Warning messages:
> > 1: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced
> > ...
> > 7: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced
> > Note: warnings 2 to 6 suppressed by me.
> > I would like to try a different approach where I delete the variables that 
> > have more than 50% missing observations from the data frame (well, the 
> > actual percentage might change). I have already deleted from the data frame 
> > the variables that were all missing and for this I used the following code, 
> > which was kindly suggested by one of you:
> > ## Data frame after removing any blank columns:
> > dfQ <- dfQtemp[ , sapply(dfQtemp, function(x) !all(is.na(x)))]
> > Any ideas or suggestions for deleting variables with partially missing 
> > data?
> > Thanks and have a great weekend!
> > Rita
> > =====================================
> > "If you think education is expensive, try ignorance." --Derek Bok
> >
> >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
                                          
