Thanks for your reply! This is what I was looking for. I'm now computing the fraction of NAs per row and per column:

  nas1 <- apply(data_matrix, 1, function(x) sum(is.na(x)) / ncol(data_matrix))  # per row
  nas2 <- apply(data_matrix, 2, function(x) sum(is.na(x)) / nrow(data_matrix))  # per column
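For what it's worth, here is a rough sketch of how these fractions could drive a trimming loop that drops the worst row or column one at a time. It is only an illustration under a few assumptions: the names na_fractions, trim_na and max_frac are made up for this example, ties go to the row, and the default threshold of 0 (stop only when no NA is left) is an arbitrary choice. rowMeans() and colMeans() on is.na() give the same per-row and per-column fractions as the apply() calls.

```r
# Fraction of NA values per row and per column of a matrix.
# is.na(m) is a logical matrix, so its row/column means are NA fractions.
na_fractions <- function(m) {
  list(rows = rowMeans(is.na(m)), cols = colMeans(is.na(m)))
}

# Hypothetical helper: repeatedly drop the single row or column with the
# highest NA fraction until no row or column exceeds max_frac.
# max_frac = 0 (an assumed default) means "keep going until no NA is left".
trim_na <- function(m, max_frac = 0) {
  # Stop before the matrix shrinks to a single row or column.
  while (nrow(m) > 1 && ncol(m) > 1) {
    f <- na_fractions(m)
    worst_row <- max(f$rows)
    worst_col <- max(f$cols)
    if (max(worst_row, worst_col) <= max_frac) break
    if (worst_row >= worst_col) {
      # Ties are resolved in favour of dropping a row (arbitrary choice).
      m <- m[-which.max(f$rows), , drop = FALSE]
    } else {
      m <- m[, -which.max(f$cols), drop = FALSE]
    }
  }
  m
}

# Small made-up example: column 2 is dropped first, then row 3.
example_matrix <- matrix(c( 1, NA,  3,
                            4, NA,  6,
                            7,  8, NA,
                           10, 11, 12), nrow = 4, byrow = TRUE)
trimmed <- trim_na(example_matrix)
dim(trimmed)  # 3 2
```

With a larger max_frac the loop stops earlier and keeps more data, at the price of leaving some NAs in place.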
The "significantly more" criterion isn't really helpful now that I look at the data. I'd better write a function that removes the row or column with the highest fraction of NAs, and repeat it as many times as it takes to get useful data. For example, I want to do heatmaps and dendrograms, but the data has too many NA values, so I get

  Error in hclustfun(distfun(x)) : NA/NaN/Inf in foreign function call (arg 11)

David Winsemius wrote:
>
> On Jul 4, 2009, at 9:22 PM, nyk wrote:
>
>> I have a data matrix containing quite a lot of missing values (NA). I know
>> how to remove all columns or rows containing NA values, but is there some
>> standard method for removing not all NA-containing rows/columns, but only
>> those which have significantly more NAs than others?
>
> You have not defined what you mean by "significantly more than the
> others", so perhaps all you want to know is how to count the NAs in a
> vector:
>
> > x=c(1,2,3,NA, 5,6,NA)
> > sum(is.na(x))
> [1] 2
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://www.nabble.com/NA-values-trimming-tp24339399p24347436.html