Dear list,

I want to label outliers in a ggplot box plot with the name of the subject for 
which outlying data were observed.


I have proceeded by creating a simple function to identify outliers:

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) |
         x > quantile(x, 0.75) + 1.5 * IQR(x))
}
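
As a quick sanity check of this rule (the usual 1.5 * IQR fences), here is a small sketch on a made-up vector. The values are purely illustrative and do not come from my data; the function is repeated so the snippet runs on its own.

```r
# Same outlier rule as above, repeated so this snippet is self-contained.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) |
    x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Made-up values: six similar observations and one extreme one.
x <- c(10, 12, 11, 13, 12, 11, 40)

# Logical vector, TRUE where a value lies outside the fences.
flags <- is_outlier(x)
print(flags)

# Positions of the flagged values.
print(which(flags))
```

With these values only the last observation (40) falls outside the fences.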


I then use the 'safe.ifelse' workaround to get 'ifelse' to function properly with 
factors:

safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if (class.y == "factor") {
    levels.y <- levels(yes)
  }
  X <- ifelse(cond, yes, no)
  if (class.y == "factor") {
    X <- as.factor(X)
    levels(X) <- levels.y
  } else {
    class(X) <- class.y
  }
  return(X)
}


From here, I have run the data through a dplyr pipeline to produce the plot.

The data are at https://www.dropbox.com/s/2pcuuclxiqw1va1/data.csv?dl=0


library(dplyr)
library(ggplot2)
# drop rows where variable1 is missing
data<-subset(data,!is.na(variable1))
p1<-
  data %>%
  group_by(season,location) %>%
  # label is the subject for outliers, NA otherwise
  mutate(outlier=safe.ifelse(is_outlier(variable1),subject,as.numeric(NA))) %>%
  ggplot(aes(x=factor(season),y=variable1))+
  geom_boxplot()+
  facet_wrap(~location,nrow=2)+
  guides(fill=FALSE)+
  geom_text(aes(label=outlier),na.rm=TRUE,hjust=1.5,size=2.5)

While outliers are correctly identified, labelling does not work as it should. 
Rather than getting subject-specific outlier labels, three levels of the 
'subject' factor are printed repeatedly and erroneously (and seemingly 
randomly). Labelling outliers by their numerical values (i.e. by changing 
'subject' to 'variable1' in the 'safe.ifelse' call) does not cause problems.
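
The symptom can be reproduced without my data. Below is a minimal sketch on a made-up factor; 'subject' and 'flag' here are hypothetical stand-ins, not my real columns. It suggests that 'ifelse' returns the integer codes of the factor rather than its labels, and that assigning the original levels afterwards renames those codes by position. Converting to character first appears to avoid this, though I have not confirmed that it explains everything in the real pipeline.

```r
# Made-up example; 'subject' and 'flag' are hypothetical stand-ins.
subject <- factor(c("anna", "ben", "carl", "dave", "eve"))
flag    <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# ifelse() drops the factor attributes, so 'yes' contributes integer codes.
codes <- ifelse(flag, subject, NA)
print(codes)

# Rebuilding a factor from the codes and then assigning the original levels
# renames the remaining codes by position, so labels no longer match subjects.
relabelled <- as.factor(codes)
levels(relabelled) <- levels(subject)
print(as.character(relabelled))

# Possible workaround: convert to character before calling ifelse().
labels <- ifelse(flag, as.character(subject), NA_character_)
print(labels)
```

In this sketch the relabelled vector shows "anna" and "ben" where "carl" and "eve" were flagged, while the character version keeps the right names. Applied to the pipeline, the same idea would be to compute the label as ifelse(is_outlier(variable1), as.character(subject), NA_character_) inside 'mutate'.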

I assume I am missing something obvious here - perhaps someone could kindly 
indicate where I am going wrong?

Thanks,
Andreas

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.