Folks:

Please note: There is *no* way to "jitter" the 3 values 1, 2, and 1e5 so that:

a) the jittered values differ from the original ones by a fraction of their original value; and
b) the plotting symbols for the jittered values will be distinguishable on a linear scale holding all 3 values.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Sep 24, 2020 at 8:39 AM Martin Keller-Ressel <martin.keller-res...@tu-dresden.de> wrote:

> Dear Duncan, Dear Rui,
>
> Thanks for the responses and for pointing out that it is the 'fuzz' part
> that is causing the problem. I agree that this is not a bug, but it could be
> undesirable/surprising behaviour, since it causes a large 'discontinuity'
> in the jitter function's output depending on the input data.
>
> I was (ab?)using the jitter function to break ties, where the desired
> behaviour would be to add noise just small enough to make all values
> unique. (Such a function can easily be hand coded, of course.)
>
> Best regards,
> Martin
>
> On 23.09.2020 at 22:25, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>
> > On 23/09/2020 4:03 p.m., Rui Barradas wrote:
> > > Hello,
> > >
> > > I believe that although Duncan's explanation is right, it does not
> > > explain the value of the digits argument. round makes the first 2
> > > numbers 0, but why?
> >
> > If there had been rounding error in their computation, you might see a
> > difference like 1e-15. You wouldn't want to use that for the scale of
> > jittering, so some rounding is needed.
> >
> > I think the documentation for the function is poor, but the intention was
> > probably to use the function in graphics (as the references did), and in
> > that case, any values too close together should be treated as equal and
> > jittering should separate them. The particular computation used says that
> > if the range is in [1, 10), values equal to 3 decimal places will be too
> > close and need separation.
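The "fuzz" rule Duncan describes can be traced by hand. Below is a simplified reconstruction, not the actual jitter() source. It assumes finite x with a non-zero range (the real function has extra fallbacks for other cases) and returns the half-width a = factor * d/5 that jitter() would use when amount = NULL. The name default_amount is hypothetical.

```r
# Hypothetical helper: simplified reconstruction of how jitter() picks its
# default noise half-width when amount = NULL. Assumes finite x whose range
# is non-zero; the real jitter() also handles zero ranges and other edge cases.
default_amount <- function(x, factor = 1) {
  z <- diff(range(x))                        # length of the data range
  digits <- 3 - floor(log10(z))              # rounding precision ("fuzz") set by z
  xx <- unique(sort.int(round(x, digits)))   # values that remain distinct after rounding
  d <- min(diff(xx))                         # smallest gap between them
  factor * d / 5                             # a <- factor * d/5, as in ?jitter
}

default_amount(c(1, 2, 10^4))  # digits = 0:  values stay 1, 2, 10000, so a = 0.2
default_amount(c(1, 2, 10^5))  # digits = -1: values become 0, 0, 1e5, so a = 20000
```

With a = 20000, noise drawn from runif(3, -a, a) is consistent with Martin's output of -19535.649, 9578.702 and 115693.854. Under the same sketch, Rui's c(0, 1, 10^4) case gives digits = -1 and a = 2000.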
> >
> > So I don't think this is a bug, but it might be a valid wishlist item:
> > document what "apart from fuzz" means, and perhaps allow it to be
> > controlled by the user.
> >
> > Duncan Murdoch
> >
> > > The function below prints the digits argument and
> > > then outputs d. The code is taken from jitter.
> > >
> > > f <- function(x){
> > >   z <- diff(r <- range(x[is.finite(x)]))
> > >   cat("digits:", 3 - floor(log10(z)), "\n")
> > >   diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
> > > }
> > >
> > > Now see what cat outputs for 'digits'.
> > >
> > > f(c(1,2,10^4))   # desired behaviour
> > > #digits: 0
> > > #[1] 1 9998
> > > f(c(0,1,10^4))   # bad behaviour
> > > #digits: -1
> > > #[1] 10000
> > > f(c(-1,0,10^4))  # bad behaviour
> > > #digits: -1
> > > #[1] 10000
> > > f(c(1,2,10^5))   # bad behaviour
> > > #digits: -1
> > > #[1] 1e+05
> > >
> > > And according to the documentation of ?round, negative digits are allowed:
> > >
> > >   Rounding to a negative number of digits means rounding to a power of
> > >   ten, so for example round(x, digits = -2) rounds to the nearest hundred.
> > >
> > > But in this case two of the numbers are closer to 0 than to 10, so
> > > unique keeps only 0 and the largest value, and then diff is big.
> > >
> > > round(c(1,2,10^4),0)    # desired behaviour
> > > #[1] 1 2 10000
> > > round(c(0,1,10^4),-1)   # bad behaviour
> > > #[1] 0 0 10000
> > > round(c(-1,0,10^4),-1)  # bad behaviour
> > > #[1] 0 0 10000
> > > round(c(1,2,10^5),-1)   # bad behaviour
> > > #[1] 0e+00 0e+00 1e+05
> > >
> > > Isn't it still a bug?
> > >
> > > Rui Barradas
> > >
> > > On 23/09/20 at 15:57, Duncan Murdoch wrote:
> > > > On 23/09/2020 6:32 a.m., Martin Keller-Ressel wrote:
> > > > > Dear all,
> > > > >
> > > > > I have noticed some strange behaviour in the "jitter" function in R.
> > > > >
> > > > > On the help page for jitter it is stated that
> > > > >
> > > > >   "The result, say r, is r <- x + runif(n, -a, a) where n <- length(x)
> > > > >   and a is the amount argument (if specified)."
> > > > >
> > > > > and
> > > > >
> > > > >   "If amount is NULL (default), we set a <- factor * d/5 where d is the
> > > > >   smallest difference between adjacent unique (apart from fuzz) x values."
> > > > >
> > > > > This works fine as long as there is no (very) large outlier:
> > > > >
> > > > > jitter(c(1,2,10^4)) # desired behaviour
> > > > > [1] 1.083243 1.851571 9999.942716
> > > > >
> > > > > But for very large outliers the added noise suddenly 'jumps' to a much
> > > > > larger scale:
> > > > >
> > > > > jitter(c(1,2,10^5)) # bad behaviour
> > > > > [1] -19535.649 9578.702 115693.854
> > > > > # Noise should be of order (2-1)/5 = 0.2 but is of much larger order.
> > > > >
> > > > > This probably does not matter much when jitter is used for plotting,
> > > > > but it can cause problems when jitter is used to break ties.
> > > >
> > > > I think this is kind of documented: "apart from fuzz" is what counts.
> > > > If you look at the code for jitter, you'll see this important line:
> > > >
> > > >   d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
> > > >
> > > > By the time you get here, z is the length of the range of the data, so
> > > > it's 99999 in your example. The rounding changes your values to
> > > > 0, 0, 1e5, so the smallest difference is 1e5.
> > > >
> > > > Duncan Murdoch
> > > >
> > > > ______________________________________________
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
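Martin notes that a tie-breaking version "can easily be hand coded". Here is one possible sketch. It assumes the only goal is to make tied values unique while keeping each perturbation small relative to the smallest real gap. It finds that gap without any rounding and passes it to jitter() through its amount argument; a non-NULL, non-zero amount is used directly as the half-width of the uniform noise. The name break_ties is hypothetical and not part of base R. This addresses tie-breaking only and does not get around Bert's point about plotting 1, 2 and 1e5 on one linear scale.

```r
# Hypothetical helper (not part of base R): break ties by jittering each value
# by at most factor/5 of the smallest gap between distinct finite values.
# Unlike jitter()'s default, no rounding ("fuzz") is applied when finding the gap.
break_ties <- function(x, factor = 1) {
  # factor must be positive: jitter() treats amount = 0 specially
  stopifnot(is.numeric(x), factor > 0)
  ux <- sort(unique(x[is.finite(x)]))
  # Fall back to a gap of 1 when there are fewer than two distinct values
  gap <- if (length(ux) > 1L) min(diff(ux)) else 1
  # With an explicit non-zero amount, jitter() adds runif(n, -amount, amount)
  jitter(x, amount = factor * gap / 5)
}

set.seed(42)
x <- c(1, 1, 2, 10^5)
y <- break_ties(x)
max(abs(y - x))    # at most 0.2, the gap between 1 and 2 divided by 5
anyDuplicated(y)   # 0: the tie between the two 1s is broken
```

Each value moves by at most factor * gap/5. Two distinct values are at least gap apart, so they can approach each other by at most 2 * factor * gap/5. For factor < 2.5 this is less than gap, so the ordering of distinct values is preserved.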