Re: [R] Simulating mid-points from a defined range

Bert Gunter Fri, 06 Jun 2025 05:26:17 -0700

Richard O'Keefe pointed out that I was wrong; the distributions are in fact
different.


One (now!) "obvious" reason why is that for the order statistics, the
distribution of the max is clearly skewed. For my cumsum method, the max is
clearly uniform (on [0,55] for the specific example).

Context matters!

Cheers,
Bert







On Thu, Jun 5, 2025 at 10:09 PM Bert Gunter <bgunter.4...@gmail.com> wrote:

> ... and here is a a simple 2-liner without a sort that I think is linear
> in time and space (but please correct if this is wrong):
>
> x <- cumsum(runif(10))
> x/x[10] * runif(1, 0, 55) + seq.int(0, 45,5)
>
> Question: Does this give the same distribution as Peter's method using the
> order statistics?  I believe yes, but someone more statistically competent
> than me needs to verify or correct this.
>
> Cheers,
> Bert
>
>
>
>
> On Thu, Jun 5, 2025 at 5:19 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
>>
>> Richard:
>>
>> "The "use an upper bound of 100 - (n+1)*5" and then "add back
>> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
>> are right but does nothing about the distribution.."
>>
>> If I understand you correctly, I think the above is wrong. Here is a
>> one-line version of Peter's code for the original problem:
>>
>> sort(runif(10, 0, 55)) + seq.int(0, 45, 5)
>>
>> AFAICS this is a bijection between the order statistics of runif(10, 0,
>> 55) and those of runif(10, 0, 100) with diffs >5. Please correct if I am
>> wrong or I have misunderstood.
>>
>> Your other remarks about space and time complexity are correct, of course.
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>> On Wed, Jun 4, 2025 at 11:04 PM Richard O'Keefe <rao...@gmail.com> wrote:
>>
>>> The Bentley and Saxe paper answers the questions
>>> (1) How can I draw a sample of n numbers in the range 0..1 such that
>>> the numbers are returned in increasing order WITHOUT sorting and so
>>> that all such sequences are equally likely in LINEAR time.  That's the
>>> code I showed in R.
>>> (2) How can I do this in a single pass?
>>> (3) How can I return the elements one at a time, generating each one
>>> on demand (O(1) space).
>>>
>>> The cumsum(runif(n))/n trick gets a sequence in linear time but the
>>> distribution is different.
>>> The sort(runif(n)) trick gets a sequence in O(n.log n) time but the
>>> distribution is different again.
>>>
>>> The "use an upper bound of 100 - (n+1)*5" and then "add back
>>> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
>>> are right but does nothing about the distribution..
>>>
>>> By the way, since you want numbers with 2 decimal places for some
>>> reason I don't understand,
>>> perhaps the simplest trick of all is
>>>
>>>    n <- 10 # how many numbers you want
>>>    L <- 0 # lower bound of the range
>>>    U <- 100 # upper bound of the range
>>>    G <- 5 # gap size
>>>    V <- U - G*(n+1) # reduced upper bound
>>>    x <- sort(sample((L*100):(V*100), size = n)) + cumsum(rep(G,
>>> times=n))/100
>>>
>>> I hope this is clear.
>>>
>>>
>>>
>>> On Thu, 5 Jun 2025 at 00:56, Brian Smith <briansmith199...@gmail.com>
>>> wrote:
>>> >
>>> > Okay, I think I found the reason. This is due to accumulation of nine
>>> > 5s in the cumsum. Thanks again for the elegant solution.
>>> >
>>> > But I wonder, if the solution is simple then what is the significance
>>> > of the Research paper by Bentley and Saxe naming “Generating sorted
>>> > lists of random numbers” which Richard mentioned?
>>> >
>>> > On Wed, 4 Jun 2025 at 17:54, Brian Smith <briansmith199...@gmail.com>
>>> wrote:
>>> > >
>>> > > Hi Peter,
>>> > >
>>> > > Could you please help me to understand what is the basis of choosing
>>> > > 55 in runif(10,0,55))?
>>> > >
>>> > > Thank you!
>>> > >
>>> > > On Wed, 4 Jun 2025 at 02:45, peter dalgaard <pda...@gmail.com>
>>> wrote:
>>> > > >
>>> > > > Can't you just generate 10 values in (0,55), sort them, generate
>>> the distances, add 5 and cumulate?
>>> > > >
>>> > > > > x <- sort(runif(10,0,55))
>>> > > > > d <- diff(x)+5
>>> > > > > cumsum(c(x[1],d))
>>> > > >  [1] 12.27815 21.21060 26.37856 36.03812 41.97237 57.02945 67.86113
>>> > > >  [8] 75.74085 81.28533 98.30792
>>> > > >
>>> > > >
>>> > > > > On 3 Jun 2025, at 09.21, Brian Smith <briansmith199...@gmail.com>
>>> wrote:
>>> > > > >
>>> > > > > Hi Richard,
>>> > > > >
>>> > > > > Thanks for your insight.
>>> > > > >
>>> > > > > As I mentioned in one of my earlier emails to the group, I
>>> imposed a
>>> > > > > constraint of accuracy up to two decimal places in order to
>>> obtain a
>>> > > > > finite set of possible values. For instance, if I were to round
>>> values
>>> > > > > to zero decimal places, the number of unique sequences that
>>> could be
>>> > > > > generated would be strictly finite and quite limited. Therefore,
>>> I
>>> > > > > chose a precision of two decimal places to allow for a larger but
>>> > > > > still finite number of possibilities.
>>> > > > >
>>> > > > >
>>> > > > > Now, my question is: how can this accuracy constraint be imposed
>>> effectively?
>>> > > > >
>>> > > > > Is the only practical method to generate samples, round each to
>>> two
>>> > > > > decimal places, and then check for duplicates to ensure
>>> uniqueness? If
>>> > > > > so, I’m concerned this might be inefficient, as many samples
>>> could be
>>> > > > > discarded, making the process time-consuming.
>>> > > > >
>>> > > > > Is there a better or more efficient way to directly enforce this
>>> > > > > constraint while generating the values?
>>> > > > >
>>> > > > > ________________________________
>>> > > > >
>>> > > > > Additionally, could you please elaborate on your suggestion
>>> regarding
>>> > > > > imposing minimum gap constraints by subtracting and then adding
>>> back
>>> > > > > certain gaps?
>>> > > > >
>>> > > > >
>>> > > > > For example, based on your earlier guidance, one possible
>>> sequence I
>>> > > > > obtained is:
>>> > > > >
>>> > > > >
>>> > > > > 10.07181, 14.49839, 14.74435, 18.75167, 42.70361, 55.79623,
>>> 63.40264,
>>> > > > > 68.62261, 92.49899, 98.29308
>>> > > > >
>>> > > > >
>>> > > > > Now, I’d like to post-process this sequence to enforce a minimum
>>> > > > > difference constraint of, say, 5 units between values (including
>>> both
>>> > > > > lower and upper bounds).
>>> > > > >
>>> > > > > What would be the appropriate way to modify the sequence to
>>> impose
>>> > > > > this kind of constraint?
>>> > > > >
>>> > > > >
>>> > > > > Many thanks for your time and insight.
>>> > > > >
>>> > > > > On Tue, 3 Jun 2025 at 10:42, Richard O'Keefe <rao...@gmail.com>
>>> wrote:
>>> > > > >>
>>> > > > >> PS I forgot about the weird gaps requirement.
>>> > > > >> What you do is subtract the gaps off and then add them back.  I
>>> hope that is clear.
>>> > > > >>
>>> > > > >> On Sun, 1 Jun 2025 at 6:52 AM, Brian Smith <
>>> briansmith199...@gmail.com> wrote:
>>> > > > >>>
>>> > > > >>> Hi,
>>> > > > >>>
>>> > > > >>> Let say I have a range [0, 100]
>>> > > > >>>
>>> > > > >>> Now I need to simulate 1000 10 mid-points within the range with
>>> > > > >>> accuracy upto second decimal number.
>>> > > > >>>
>>> > > > >>> Let say, one simulated set is
>>> > > > >>>
>>> > > > >>> X1, X2, ..., X10
>>> > > > >>>
>>> > > > >>> Ofcourrse
>>> > > > >>>
>>> > > > >>> X1 < X2 < ... <X10
>>> > > > >>>
>>> > > > >>> I have one more constraint that the difference between any 2
>>> > > > >>> consecutive mid-points shall be at-least 5.00.
>>> > > > >>>
>>> > > > >>> I wonder if there is any Statistical theory available to
>>> support this
>>> > > > >>> kind of simulation.
>>> > > > >>>
>>> > > > >>> Alternately, is there any way in R to implement this?
>>> > > > >>>
>>> > > > >>> ______________________________________________
>>> > > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
>>> see
>>> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > >>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> > > > >>> and provide commented, minimal, self-contained, reproducible
>>> code.
>>> > > > >
>>> > > > > ______________________________________________
>>> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
>>> see
>>> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > > PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> > > > > and provide commented, minimal, self-contained, reproducible
>>> code.
>>> > > >
>>> > > > --
>>> > > > Peter Dalgaard, Professor,
>>> > > > Center for Statistics, Copenhagen Business SchoolSolbjerg Plads 3,
>>> 2000 Frederiksberg, Denmark
>>> > > > Phone: (+45)38153501
>>> > > > Office: A 4.23
>>> > > > Email: pd....@cbs.dk  Priv: pda...@gmail.com
>>> > > >
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Simulating mid-points from a defined range

Reply via email to