Richard:

"The "use an upper bound of 100 - (n+1)*5" and then "add back
cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
are right but does nothing about the distribution."

If I understand you correctly, I think the above is wrong. Here is a
one-line version of Peter's code for the original problem:

sort(runif(10, 0, 55)) + seq.int(0, 45, 5)

AFAICS this is a bijection between the order statistics of runif(10, 0, 55)
and those of runif(10, 0, 100) with consecutive diffs > 5. Please correct me
if I am wrong or have misunderstood.
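
A quick empirical check of this mapping, sketched in R (set.seed is only
for reproducibility; n = 10, range [0, 100], and gap 5 are the thread's
example values):

```r
## Map sorted Uniform(0, 55) draws onto [0, 100] by adding the fixed
## offsets 0, 5, ..., 45, then verify gaps and range over replicates.
set.seed(1)
ok <- replicate(1000, {
  x <- sort(runif(10, 0, 55)) + seq.int(0, 45, 5)
  all(diff(x) > 5) && x[1] >= 0 && x[10] <= 100
})
stopifnot(all(ok))
```

Since the map x -> x + seq.int(0, 45, 5) is a translation (Jacobian 1) and
invertible, it carries the order statistics of runif(10, 0, 55) one-to-one
onto sorted points in [0, 100] with all consecutive gaps above 5.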

Your other remarks about space and time complexity are correct, of course.

Cheers,
Bert




On Wed, Jun 4, 2025 at 11:04 PM Richard O'Keefe <rao...@gmail.com> wrote:

> The Bentley and Saxe paper answers the questions
> (1) How can I draw a sample of n numbers in the range 0..1 such that
> the numbers are returned in increasing order WITHOUT sorting and so
> that all such sequences are equally likely in LINEAR time.  That's the
> code I showed in R.
> (2) How can I do this in a single pass?
> (3) How can I return the elements one at a time, generating each one
> on demand (O(1) space).
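
[For readers following along, the linear-time idea in (1) can be sketched
in R via the standard fact that the maximum of k iid Uniform(0,1) variables
is distributed as U^(1/k); this is an illustration in the spirit of the
paper, not its exact code:

```r
## One-pass generation of a sorted Uniform(0,1) sample in O(n) time.
## Work downward from the maximum: with i points left below the current
## upper value 'cur', those points are iid Uniform(0, cur), so their
## maximum is distributed as cur * U^(1/i).
sorted_runif <- function(n) {
  x <- numeric(n)
  cur <- 1
  for (i in n:1) {
    cur <- cur * runif(1)^(1 / i)  # next order statistic, from the top
    x[i] <- cur
  }
  x  # returned in increasing order
}
```

Because each value depends only on the previous one, the same recurrence
also yields the values one at a time on demand, which is the O(1)-space
version asked for in (2) and (3).]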
>
> The cumsum(runif(n))/n trick gets a sequence in linear time but the
> distribution is different.
> The sort(runif(n)) trick gets a sequence in O(n.log n) time but the
> distribution is different again.
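
[A small simulation supporting the distributional point (a sketch; the mean
of the minimum of n iid uniforms is 1/(n+1), while the cumsum trick's first
element is runif(1)/n, with mean 1/(2n)):

```r
## Compare the mean smallest value under the two schemes for n = 10.
n <- 10; B <- 20000
set.seed(2)
m_sort   <- mean(replicate(B, min(runif(n))))             # ~ 1/(n+1) = 0.0909
m_cumsum <- mean(replicate(B, (cumsum(runif(n)) / n)[1])) # ~ 1/(2n)  = 0.05
c(m_sort, m_cumsum)
```

The two means differ clearly, confirming that the cumsum construction does
not reproduce the uniform order statistics.]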
>
> The "use an upper bound of 100 - (n+1)*5" and then "add back
> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
> are right but does nothing about the distribution.
>
> By the way, since you want numbers with 2 decimal places for some
> reason I don't understand,
> perhaps the simplest trick of all is
>
>    n <- 10 # how many numbers you want
>    L <- 0 # lower bound of the range
>    U <- 100 # upper bound of the range
>    G <- 5 # gap size
>    V <- U - G*(n+1) # reduced upper bound
>    x <- sort(sample((L*100):(V*100), size = n))/100 +
>         cumsum(rep(G, times = n))
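
[A sanity check of this construction (a sketch; note the assumption that
the /100 scaling applies to the sampled integers, with the gaps added back
in the original units):

```r
## Sample n distinct two-decimal values in [L, V], then add back the
## cumulative gaps; check the gap-to-neighbours and gap-to-bounds claims.
n <- 10; L <- 0; U <- 100; G <- 5
V <- U - G * (n + 1)           # reduced upper bound
set.seed(3)
x <- sort(sample((L * 100):(V * 100), size = n)) / 100 +
  cumsum(rep(G, times = n))
stopifnot(all(diff(x) >= G),   # at least G between consecutive values
          x[1] >= L + G,       # at least G above the lower bound
          x[n] <= U - G)       # at least G below the upper bound
```

Sampling without replacement keeps the grid values distinct, so consecutive
differences here actually exceed G by at least 0.01.]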
>
> I hope this is clear.
>
>
>
> On Thu, 5 Jun 2025 at 00:56, Brian Smith <briansmith199...@gmail.com>
> wrote:
> >
> > Okay, I think I found the reason. This is due to accumulation of nine
> > 5s in the cumsum. Thanks again for the elegant solution.
> >
> > But I wonder: if the solution is this simple, what is the significance
> > of the research paper by Bentley and Saxe titled “Generating sorted
> > lists of random numbers” that Richard mentioned?
> >
> > On Wed, 4 Jun 2025 at 17:54, Brian Smith <briansmith199...@gmail.com>
> wrote:
> > >
> > > Hi Peter,
> > >
> > > Could you please help me understand the basis for choosing
> > > 55 in runif(10, 0, 55)?
> > >
> > > Thank you!
> > >
> > > On Wed, 4 Jun 2025 at 02:45, peter dalgaard <pda...@gmail.com> wrote:
> > > >
> > > > Can't you just generate 10 values in (0,55), sort them, generate the
> > > > distances, add 5, and cumulate?
> > > >
> > > > > x <- sort(runif(10,0,55))
> > > > > d <- diff(x)+5
> > > > > cumsum(c(x[1],d))
> > > >  [1] 12.27815 21.21060 26.37856 36.03812 41.97237 57.02945 67.86113
> > > >  [8] 75.74085 81.28533 98.30792
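
[Re-running this construction with a fixed, arbitrary seed and checking the
constraints, as a sketch:

```r
## Peter's construction: sorted draws on (0, 55), gaps widened by 5,
## then cumulated back into [0, 100].
set.seed(42)
x <- sort(runif(10, 0, 55))
d <- diff(x) + 5
y <- cumsum(c(x[1], d))
stopifnot(all(diff(y) > 5), y[1] >= 0, y[10] <= 100)
```

The identity y = x + seq(0, 45, by = 5) also shows why 55 = 100 - 9*5 is
the right reduced upper bound: the nine widened gaps add at most 45 units
to the largest value.]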
> > > >
> > > >
> > > > > On 3 Jun 2025, at 09.21, Brian Smith <briansmith199...@gmail.com>
> wrote:
> > > > >
> > > > > Hi Richard,
> > > > >
> > > > > Thanks for your insight.
> > > > >
> > > > > As I mentioned in one of my earlier emails to the group, I imposed a
> > > > > constraint of accuracy up to two decimal places in order to obtain a
> > > > > finite set of possible values. For instance, if I were to round values
> > > > > to zero decimal places, the number of unique sequences that could be
> > > > > generated would be strictly finite and quite limited. Therefore, I
> > > > > chose a precision of two decimal places to allow for a larger but
> > > > > still finite number of possibilities.
> > > > >
> > > > >
> > > > > Now, my question is: how can this accuracy constraint be imposed
> > > > > effectively?
> > > > >
> > > > > Is the only practical method to generate samples, round each to two
> > > > > decimal places, and then check for duplicates to ensure uniqueness?
> > > > > If so, I’m concerned this might be inefficient, as many samples could
> > > > > be discarded, making the process time-consuming.
> > > > >
> > > > > Is there a better or more efficient way to directly enforce this
> > > > > constraint while generating the values?
> > > > >
> > > > > ________________________________
> > > > >
> > > > > Additionally, could you please elaborate on your suggestion regarding
> > > > > imposing minimum gap constraints by subtracting and then adding back
> > > > > certain gaps?
> > > > >
> > > > >
> > > > > For example, based on your earlier guidance, one possible sequence I
> > > > > obtained is:
> > > > >
> > > > > 10.07181, 14.49839, 14.74435, 18.75167, 42.70361, 55.79623, 63.40264,
> > > > > 68.62261, 92.49899, 98.29308
> > > > >
> > > > >
> > > > > Now, I’d like to post-process this sequence to enforce a minimum
> > > > > difference constraint of, say, 5 units between values (including both
> > > > > lower and upper bounds).
> > > > >
> > > > > What would be the appropriate way to modify the sequence to impose
> > > > > this kind of constraint?
> > > > >
> > > > >
> > > > > Many thanks for your time and insight.
> > > > >
> > > > > On Tue, 3 Jun 2025 at 10:42, Richard O'Keefe <rao...@gmail.com>
> wrote:
> > > > >>
> > > > >> PS I forgot about the weird gaps requirement.
> > > > >> What you do is subtract the gaps off and then add them back. I hope
> > > > >> that is clear.
> > > > >>
> > > > >> On Sun, 1 Jun 2025 at 6:52 AM, Brian Smith <
> briansmith199...@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> Let say I have a range [0, 100]
> > > > >>>
> > > > >>> Now I need to simulate 1000 sets of 10 mid-points within the range,
> > > > >>> with accuracy up to the second decimal place.
> > > > >>>
> > > > >>> Let say, one simulated set is
> > > > >>>
> > > > >>> X1, X2, ..., X10
> > > > >>>
> > > > >>> Of course,
> > > > >>>
> > > > >>> X1 < X2 < ... < X10
> > > > >>>
> > > > >>> I have one more constraint: the difference between any two
> > > > >>> consecutive mid-points shall be at least 5.00.
> > > > >>>
> > > > >>> I wonder if there is any statistical theory available to support
> > > > >>> this kind of simulation.
> > > > >>>
> > > > >>> Alternatively, is there any way in R to implement this?
> > > > >>>
> > > > >>> ______________________________________________
> > > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >>> PLEASE do read the posting guide
> https://www.R-project.org/posting-guide.html
> > > > >>> and provide commented, minimal, self-contained, reproducible
> code.
> > > > >
> > > >
> > > > --
> > > > Peter Dalgaard, Professor,
> > > > Center for Statistics, Copenhagen Business School
> > > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > > > Phone: (+45)38153501
> > > > Office: A 4.23
> > > > Email: pd....@cbs.dk  Priv: pda...@gmail.com
> > > >
>
>


