Yup, that is a bug, at least in the documentation. Probably a clearer example is

x <- seq(2.001, 2.999, length.out=999)
threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
plot(x, threes, type="l")
curve(10000*(x-2)/x, add=TRUE, col="red")

which is entirely consistent with what you'd expect from
floor(runif(10000, 0, y)) + 1, and as far as I can tell from the source, that
is what is happening internally. (Strict monotonicity is a bit of a red
herring; it is just a matter of having spaced the y so far apart that the
probability of an order reversal becomes negligible.)

So either we should do what the documentation says we do, or the documentation
should not say that we do what we do not actually do...

The suspect code is this snippet from do_sample:

    int n = (int) dn;
    .....
    if (replace || k < 2) {
        for (int i = 0; i < k; i++) iy[i] = (int)(R_unif_index(dn) + 1);
    } else {
        int *x = (int *)R_alloc(n, sizeof(int));
        for (int i = 0; i < n; i++) x[i] = i;
        for (int i = 0; i < k; i++) {
            int j = (int)(R_unif_index(n));
            iy[i] = x[j] + 1;
            x[j] = x[--n];
        }
    }

(notice arguments to R_unif_index)

-pd

> On 20 Sep 2018, at 01:53 , lmo via R-devel <r-devel@r-project.org> wrote:
>
> Although it seems to be pretty weird to enter a numeric vector of length one
> that is not an integer as the first argument to sample(), the results do not
> seem to match what is documented in the manual. In addition, the results
> below do not support the use of round rather than truncate in the
> documentation. Consider the code below.
> The first sentence in the details section says: "If x has length 1, is
> numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes
> place from 1:x."
> In the console:
> > 1:2.001
> [1] 1 2
> > 1:2.9
> [1] 1 2
>
> truncation:
> > trunc(2.9)
> [1] 2
>
> So, this seems to support the quote from previous emails: "Non-integer
> positive numerical values of n or x will be truncated to the next smallest
> integer, which has to be no larger than .Machine$integer.max."
> However, again in the console:
> > set.seed(123)
> > table(sample(2.001, 10000, replace=TRUE))
>
>    1    2    3
> 5052 4941    7
>
> So, neither rounding nor truncation is occurring. Next, define a sequence.
> > x <- seq(2.001, 2.51, length.out=20)
> Now, grab all of the threes from sample()-ing this sequence.
>
> > set.seed(123)
> > threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
>
> Check for NAs (I cheated here and found a nice seed).
> > any(is.na(threes))
> [1] FALSE
> Now, the (to me) disturbing result.
>
> > is.unsorted(threes)
> [1] FALSE
>
> or equivalently
>
> > all(diff(threes) > 0)
> [1] TRUE
>
> So the number of threes grows monotonically as 2.001 moves to 2.5. As I
> hinted above, the monotonic growth is not assured. My guess is that the
> growth is stochastic and relates to some "probability weighting" based on how
> close the element of x is to 3. Perhaps this has been brought up before, but
> it seems relevant to the current discussion.
> A potential aid to this issue would be something like
> if(length(x) == 1 && !all.equal(x, as.integer(x)))
>     warning("It is a bad idea to use vectors of length 1 in the x argument that are not integers.")
> Hope that helps,
> luke
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
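
For anyone who wants to check the floor(runif(10000, 0, y)) + 1 reading without depending on how a particular R version implements sample(), here is a minimal sketch. The helper name sample_like is hypothetical (it is not part of R), and the model it encodes is only the assumption stated in the reply above, not a copy of the internal code. Under that model a 3 turns up with probability (y - 2)/y, and 1 and 2 each with probability 1/y, which is where the red curve 10000*(x-2)/x comes from.

```r
# Hypothetical helper (not part of R): the assumed behaviour of the
# replace=TRUE branch for a non-integer, length-one x.
sample_like <- function(y, k) floor(runif(k, 0, y)) + 1

set.seed(1)
y <- 2.5      # any value strictly between 2 and 3
k <- 10000

sim <- sample_like(y, k)

# Expected counts under the assumed model:
#   P(1) = P(2) = 1 / y,  P(3) = (y - 2) / y
expected <- k * c(1, 1, y - 2) / y
observed <- tabulate(sim, nbins = 3)

# Side-by-side comparison; the two rows should agree up to sampling noise.
print(rbind(expected, observed))
```

Repeating this over a grid of y values and plotting observed[3] against y should trace out the same curve as in the example at the top, if the assumption holds.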