Thanks for the suggestion, Daryl! I did have to include the exclamation point before mean; otherwise it selected the columns with the most missing observations instead of dropping them. But it was really nice to see this flexibility in R. So my fix, which keeps only the columns where no more than 60% of the observations are missing, was:

    dfQ <- dfQtemp[ , sapply(dfQtemp, function(x) !mean(is.na(x)) > .6)]

Thanks again!

Rita
=====================================
"If you think education is expensive, try ignorance." --Derek Bok

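For anyone adapting the one-liner above, here is a minimal sketch on a made-up data frame. The toy values and the names max_missing, frac_missing and keep are assumptions for illustration, not part of the original code. It relies on the fact that in R the unary ! binds more loosely than >, so !mean(is.na(x)) > .6 is read as !(mean(is.na(x)) > .6); writing the test with <= says the same thing without the negation.

```r
# Toy data frame (hypothetical); column names and values are made up.
dfQtemp <- data.frame(
  Q1 = c(NA, NA, NA, NA, 5),   # 80% missing
  Q2 = c(1, NA, 3, NA, 5),     # 40% missing
  Q3 = c(1, 2, 3, 4, 5)        # 0% missing
)

max_missing <- 0.6  # assumed cutoff: drop columns with more than 60% NA

# Fraction of missing values in each column.
frac_missing <- sapply(dfQtemp, function(x) mean(is.na(x)))

# Same test as !mean(is.na(x)) > .6, written without the negation.
keep <- frac_missing <= max_missing

# drop = FALSE keeps the result a data frame even if only one column survives.
dfQ <- dfQtemp[, keep, drop = FALSE]

print(names(dfQ))
```

On this toy data, Q1 is dropped and Q2 and Q3 are kept. Changing max_missing to 0.5 would give the 50% rule from the subject line.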
> Date: Fri, 15 Apr 2011 15:08:29 -0700
> From: dar...@uw.edu
> To: ritacarre...@hotmail.com
> Subject: Re: [R] Function for deleting variables with >=50% missing obs from a data frame
>
> you could simply modify
>
>     !all(is.na(x))
>
> to
>
>     mean(is.na(x)) > .6
>
> or some such, or invert the logic if I have it backwards.
>
> .6 was the fraction greater than which we omit the data.
>
> On 4/15/11 3:00 PM, Rita Carreira wrote:
> > Hello R users!
> >
> > I have several data frames where some of the variables have many missing
> > observations. For example, Q1 in one of my data frames has over 66% of its
> > observations missing. I have tried imputation with mice but it does not
> > work for all the data frames and I get the following message or a similar
> > message to this:
> >
> >   iter imp variable
> >   1   1  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10  Q11  Q12  Q13  Q14  Q15
> >   Q19  Q36  Q47  Q52  Q79  Q80  Q94  Q97  Q104  Q108  Q122  Q131  Q134  P1
> >   P2  P3  P4  P5  P6
> > Error in solve.default(xtx + diag(pen)) :
> >   system is computationally singular: reciprocal condition number =
> >   1.83044e-16
> > In addition: Warning messages:
> > 1: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced
> > ...
> > 7: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced
> >
> > Note: warnings 2 to 6 suppressed by me.
> >
> > I would like to try a different approach where I delete the variables that
> > have more than 50% missing observations from the data frame (well, the
> > actual percentage might change). I have already deleted from the data frame
> > the variables that were all missing and for this I used the following code,
> > which was kindly suggested by one of you:
> >
> >     ## Data frame after removing any blank columns:
> >     dfQ <- dfQtemp[ , sapply(dfQtemp, function(x) !all(is.na(x)))]
> >
> > Any ideas or suggestions for deleting variables with partially missing
> > data?
> >
> > Thanks and have a great weekend!
> > Rita
> > =====================================
> > "If you think education is expensive, try ignorance." --Derek Bok
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.