Dear all, I have obtained the limits for removing extreme values from each variable using the following function:

f <- function(x) {
  # Returns a 1 x 2 matrix: column 1 is Q1 + 1.5*IQR, column 2 is Q3 - 1.5*IQR
  quantile(x, c(0.25, 0.75), na.rm = TRUE) -
    matrix(IQR(x, na.rm = TRUE) * 1.5, nrow = 1) %*% c(-1, 1)
}

# Example:
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'), n, replace = TRUE))
x6 <- 1*(x5 == 'a' | x5 == 'c')
data1 <- cbind(x1, x2, x3, x4, x5, x6)  # numeric matrix; the factor x5 is stored as its integer codes
data2 <- data.frame(data1)
xyz   <- lapply(data2, f)               # one set of limits per column of the data frame

Now I can eliminate the rows (observations) that contain extreme values, one variable at a time, as below:

data2 <- subset(data2, x1 <= xyz$x1[, 1] & x1 >= xyz$x1[, 2])
data2 <- subset(data2, x2 <= xyz$x2[, 1] & x2 >= xyz$x2[, 2])
# ... and so on for the remaining variables.

But my data has many more variables (more than 120). Can anybody suggest an efficient way of eliminating the rows containing extreme values?

Thanks in advance!

Regards,
Ajit
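
For reference, here is a minimal sketch of one way the per-variable filtering could be applied to all columns in a single pass. It is only an illustration under stated assumptions: the helper names `fences` and `drop_extreme_rows` are hypothetical (not from any package), the limits are taken to be the conventional Tukey fences Q1 - 1.5*IQR and Q3 + 1.5*IQR, non-numeric columns are skipped, and missing values are treated as not extreme.

```r
# Hypothetical helper: lower and upper Tukey-style fences for one numeric vector.
# Assumes the conventional limits Q1 - k*IQR and Q3 + k*IQR.
fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  q + c(-1, 1) * k * (q[2] - q[1])
}

# Hypothetical helper: keep only the rows in which every numeric column
# lies inside its own fences. All limits are computed once, on the full data.
drop_extreme_rows <- function(df, k = 1.5) {
  num <- vapply(df, is.numeric, logical(1))
  inside <- lapply(df[num], function(x) {
    lim <- fences(x, k)
    # Assumption: an NA is not counted as an extreme value
    is.na(x) | (x >= lim[1] & x <= lim[2])
  })
  # Combine the per-column logical vectors with a row-wise AND
  keep <- Reduce(`&`, inside, rep(TRUE, nrow(df)))
  df[keep, , drop = FALSE]
}

# Small illustration: one extreme value in each numeric column
d <- data.frame(a = c(1:20, 1000),
                b = c(-500, 1:20),
                g = factor(rep("z", 21)))
cleaned <- drop_extreme_rows(d)
```

With the example data from the post, the call would be `drop_extreme_rows(data2)`. Because every column is tested against limits computed from the full data set, this should select the same rows as chaining one `subset()` call per variable against precomputed limits, without writing 120 of them by hand.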