Dear list, I want to label outliers in a ggplot box plot with the name of the subject for which outlying data were observed.
I have proceeded by creating a simple function to identify outliers:

    is_outlier <- function(x) {
      return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
    }

And then the 'safe.ifelse' workaround to get 'ifelse' to work properly with factors:

    safe.ifelse <- function(cond, yes, no) {
      class.y <- class(yes)
      if (class.y == "factor") {
        levels.y = levels(yes)
      }
      X <- ifelse(cond, yes, no)
      if (class.y == "factor") {
        X = as.factor(X)
        levels(X) = levels.y
      } else {
        class(X) <- class.y
      }
      return(X)
    }

From here, I have run the data through a dplyr pipeline to produce the plot (data at https://www.dropbox.com/s/2pcuuclxiqw1va1/data.csv?dl=0):

    library(dplyr)
    library(ggplot2)  # needed for ggplot()

    data <- subset(data, data$variable1 != 'NA')

    p1 <- data %>%
      group_by(season, location) %>%
      mutate(outlier = safe.ifelse(is_outlier(variable1), subject, as.numeric(NA))) %>%
      ggplot(aes(x = factor(season), y = variable1)) +
      geom_boxplot() +
      facet_wrap(~location, nrow = 2) +
      guides(fill = FALSE) +
      geom_text(aes(label = outlier), na.rm = TRUE, hjust = 1.5, size = 2.5)

While outliers are correctly identified, the labelling does not work as it should. Rather than subject-specific outlier labels, three levels of the 'subject' factor are printed repeatedly and erroneously (and seemingly at random). Labelling outliers by their numerical values (i.e. by changing 'subject' to 'variable1' in the 'safe.ifelse' call) does not cause problems.

I assume I am missing something obvious here. Perhaps someone could kindly indicate where I am going wrong?

Thanks,
Andreas
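
P.S. In case it helps to narrow things down, here is a minimal, self-contained sketch that seems to reproduce the symptom without my real data. The data frame 'toy' and its values are made up purely for illustration, and the character-based variant at the end is only a guess at a workaround; I have not confirmed that it is the root cause in the full pipeline.

```r
# Toy data, invented for illustration (not the data in the Dropbox file).
toy <- data.frame(
  subject   = factor(c("A", "B", "C", "D", "E", "F", "G")),
  variable1 = c(10, 11, 12, 13, 14, 15, 50)
)

# Same outlier rule as in the post: beyond 1.5 * IQR from the quartiles.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Same workaround as in the post.
safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if (class.y == "factor") levels.y <- levels(yes)
  X <- ifelse(cond, yes, no)
  if (class.y == "factor") {
    X <- as.factor(X)
    levels(X) <- levels.y
  } else {
    class(X) <- class.y
  }
  X
}

flag <- is_outlier(toy$variable1)   # only the last row (subject "G") is flagged

# ifelse() drops the factor class and returns the underlying integer codes.
codes <- ifelse(flag, toy$subject, NA)

# as.factor() on those codes keeps only the distinct codes as levels, so
# reassigning the original levels pairs them with the first few levels.
mislabelled <- as.character(safe.ifelse(flag, toy$subject, as.numeric(NA)))

# Possible workaround (an assumption on my part): go through character.
labelled <- ifelse(flag, as.character(toy$subject), NA_character_)

print(data.frame(toy, flag, codes, mislabelled, labelled))
```

In this toy case the flagged row belongs to subject "G", but 'mislabelled' shows "A", whereas 'labelled' shows "G". That looks like the same pattern as in my plot, where only the first few levels of 'subject' ever appear.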