On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
Dear all,
I have noticed some strange behaviour in the "jitter" function in R.
On the help page for jitter it is stated that
"The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is
the amount argument (if specified)."
and
"If amount is NULL (default), we set a <- factor * d/5 where d is the smallest
difference between adjacent unique (apart from fuzz) x values."
This works fine as long as there is no (very) large outlier:
jitter(c(1,2,10^4)) # desired behaviour
[1] 1.083243 1.851571 9999.942716
But for very large outliers the added noise suddenly 'jumps' to a much larger
scale:
jitter(c(1,2,10^5)) # bad behaviour
[1] -19535.649 9578.702 115693.854
# Noise should be of order (2-1)/5 = 0.2 but is of much larger order.
This probably does not matter much when jitter is used for plotting, but it can
cause problems when jitter is used to break ties.
I think this is kind of documented: "apart from fuzz" is what counts.
If you look at the code for jitter, you'll see this important line:
d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
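By the time you get here, z is the width of the range of the data, so
it's 99999 in your example. That makes the digits argument to round()
equal to 3 - 4 = -1, so the rounding is to the nearest 10 and changes your
values to 0, 0, 1e5. The smallest difference is therefore 1e5, and the
default amount is 1e5/5 = 20000.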
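For what it's worth, the steps can be retraced by hand. What follows is only a sketch that copies the relevant expressions for the default case (amount = NULL, factor = 1) rather than the full function; the names digits and a are chosen here for illustration and are not the ones used in the source.

```r
x <- c(1, 2, 10^5)

# Width of the data range; jitter() calls this z.
z <- diff(range(x))                        # 99999

# Number of decimal places kept by the rounding step
# ("digits" is an illustrative name, not from the source).
digits <- 3 - floor(log10(z))              # 3 - 4 = -1: round to the nearest 10

# 1 and 2 both collapse to 0, so only two distinct values remain.
xx <- unique(sort.int(round(x, digits)))   # 0 and 1e5
d <- min(diff(xx))                         # 1e5

# Default amount with factor = 1 ("a" as on the help page).
a <- 1 / 5 * d                             # 20000

# Passing amount explicitly skips the computation above,
# so the noise stays within +/- 0.2 of each value.
y <- jitter(x, amount = 0.2)
```

With c(1, 2, 10^4) the same steps give z = 9999, digits = 0, d = 1 and a = 0.2, which matches the first example. If the aim is to break ties, supplying amount yourself, as in the last line, avoids the dependence on the range of the data.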
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help