Re: [Rd] A different error in sample()

Emil Bode Thu, 20 Sep 2018 01:21:20 -0700

But do we treat this as an error in what sample() does, or in how the 
documentation describes it?
I think what is done now would be best described as "ceilinged", i.e. what 
ceiling() does. But is there an English word to describe this?
Or just use "converted to the next smallest integer"?


But then again, what happens is that the answer is ceilinged, not the input.
I guess the rationale is that multiplying by any integer and then dividing 
should give the same results:
ceiling(sample(n * x, size=1e6, replace = TRUE) / x) gives the same results for 
any integer n and x, and it's nice that this also holds for non-integer n.
The more important question is why people would use sample() with a 
non-integer x; I don’t see many use cases.
So I agree with Luke that a warning would be best, regardless of what the docs 
say.
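
The multiply-then-divide argument can be checked with a small sketch. It does not call sample() at all; it assumes, as this thread suggests, that a draw behaves like ceiling(u * n) for a uniform u, and the variable names are only illustrative. The same uniforms are used on both sides so the two results can be compared draw by draw.

```r
set.seed(1)
n <- 2.001        # non-integer "population size", as in the thread
x <- 5L           # integer multiplier
u <- runif(1e5)   # uniforms on (0, 1)

# Assumed model of the behaviour discussed here: the answer is ceilinged.
direct <- ceiling(u * n)

# Scale up by x first, ceiling, then divide by x and ceiling again.
rescaled <- ceiling(ceiling(u * n * x) / x)

mean(direct == rescaled)   # proportion of draws on which the two agree
table(direct)              # 3 appears, but only rarely
```

Under that assumption the two agree because ceiling(ceiling(y) / x) equals ceiling(y / x) for any positive integer x, whether or not n is an integer.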

Best regards, 
Emil Bode

    Although it seems to be pretty weird to enter a numeric vector of length 
one that is not an integer as the first argument to sample(), the results do 
not seem to match what is documented in the manual. In addition, the results 
below do not support the use of round rather than truncate in the 
documentation. Consider the code below.
    The first sentence in the details section says: "If x has length 1, is 
numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes 
place from 1:x."
    In the console:
    > 1:2.001
    [1] 1 2
    > 1:2.9
    [1] 1 2
    
    truncation:
    > trunc(2.9)
    [1] 2
    
    So, this seems to support the quote from previous emails: "Non-integer 
positive numerical values of n or x will be truncated to the next smallest 
integer, which has to be no larger than .Machine$integer.max."
    However, again in the console:
    > set.seed(123)
    > table(sample(2.001, 10000, replace=TRUE))
    
       1    2    3 
    5052 4941    7
    
    So, neither rounding nor truncation is occurring. Next, define a sequence.
    > x <- seq(2.001, 2.51, length.out=20)
    Now, grab all of the threes from sample()-ing this sequence.
    
    > set.seed(123)
    > threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
    
    Check for NAs (I cheated here and found a nice seed).
    > any(is.na(threes))
    [1] FALSE
    Now, the (to me) disturbing result.
    
    > is.unsorted(threes)
    [1] FALSE
    
    or, more strictly (no ties allowed),
    
    > all(diff(threes) > 0)
    [1] TRUE
    
    So the number of threes grows monotonically as x moves from 2.001 to 2.51. As I 
hinted above, the monotonic growth is not assured. My guess is that the growth 
is stochastic and relates to some "probability weighting" based on how close 
the element of x is to 3. Perhaps this has been brought up before, but it seems 
relevant to the current discussion.
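
One way to make the "probability weighting" guess concrete is the sketch below. It assumes a model that the thread only hints at, namely that a draw behaves like ceiling(u * y) for a uniform u, in which case a 3 appears with probability (y - 2) / y for 2 < y <= 3. The name expected_threes is illustrative.

```r
x <- seq(2.001, 2.51, length.out = 20)
size <- 10000

# Assumed model: P(draw == 3) = (y - 2) / y for each y in x.
expected_threes <- size * (x - 2) / x

round(head(expected_threes, 3), 1)   # expected counts for the first few y
all(diff(expected_threes) > 0)       # expected counts increase with y
```

Under this assumption the expected count for 2.001 is about 5, and the expected counts rise steadily with y. The observed counts are still random, so strict growth in any one run is not guaranteed.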
    A potential aid to this issue would be something like
    if (length(x) == 1 && !isTRUE(all.equal(x, as.integer(x))))
        warning("It is a bad idea to use vectors of length 1 in the x argument that are not integers.")
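
A self-contained sketch of such a check follows. The helpers check_sample_x() and warned() are hypothetical names, not part of base R. Because all.equal() returns a character description rather than FALSE when its arguments differ, the comparison is wrapped in isTRUE().

```r
# Hypothetical helper: warn when a length-one numeric x is not a whole
# number, then return x invisibly.
check_sample_x <- function(x) {
  if (length(x) == 1 && is.numeric(x) && !isTRUE(all.equal(x, round(x)))) {
    warning("It is a bad idea to use vectors of length 1 in the x argument that are not integers.")
  }
  invisible(x)
}

# Small utility for the demonstration: TRUE if evaluating expr warns.
warned <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

warned(check_sample_x(2.001))        # length one, not a whole number
warned(check_sample_x(3))            # length one, whole number
warned(check_sample_x(c(2.5, 3.5)))  # longer vector, not checked
```

Note that all.equal() compares with a numeric tolerance, so values extremely close to an integer would pass this check silently.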
    Hope that helps,
    luke
    
    ______________________________________________
    R-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-devel
    
