Richard: "The "use an upper bound of 100 - (n+1)*5" and then "add back cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps are right but does nothing about the distribution.."
If I understand you correctly, I think the above is wrong. Here is a one-line version of Peter's code for the original problem:

sort(runif(10, 0, 55)) + seq.int(0, 45, 5)

AFAICS this is a bijection between the order statistics of runif(10, 0, 55) and those of runif(10, 0, 100) with diffs > 5. Please correct me if I am wrong or have misunderstood.

Your other remarks about space and time complexity are correct, of course.

Cheers,
Bert
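P.S. For what it's worth, here is a quick empirical check of the bijection claim above. This is only a rough sketch: the seed, the number of replications, the brute-force rejection sampler used as a reference, and the KS comparison of the smallest values are my own illustrative choices, not anything taken from Peter's or Richard's code.

set.seed(1)    # reproducibility of this check only
nrep <- 2000

## Method 1: the one-liner above (shift construction).
shifted <- replicate(nrep, sort(runif(10, 0, 55)) + seq.int(0, 45, 5))

## Every draw (column) should be increasing with consecutive gaps of at
## least 5 and should stay inside [0, 100].
ok_gaps <- function(x) all(diff(x) >= 5 - 1e-9)   # tiny tolerance for rounding
stopifnot(all(apply(shifted, 2, ok_gaps)),
          all(shifted >= 0), all(shifted <= 100))

## Method 2: brute-force rejection from sorted runif(10, 0, 100), keeping
## only draws whose consecutive gaps all exceed 5.  Unarguably the target
## distribution, but slow: the acceptance probability is 0.55^10, roughly
## 1 in 400, so this loop takes a few seconds.
draw_rejection <- function() {
  repeat {
    x <- sort(runif(10, 0, 100))
    if (all(diff(x) > 5)) return(x)
  }
}
rejected <- replicate(nrep, draw_rejection())

## If the claim is right, every marginal should match; compare, e.g., the
## smallest value under the two schemes.
ks.test(shifted[1, ], rejected[1, ])

The rejection sampler is of course useless as a practical method (one accepted draw per roughly 400 attempts), which is rather the point of the constructions being discussed.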
On Wed, Jun 4, 2025 at 11:04 PM Richard O'Keefe <rao...@gmail.com> wrote:
> The Bentley and Saxe paper answers the questions:
> (1) How can I draw a sample of n numbers in the range 0..1 such that
> the numbers are returned in increasing order WITHOUT sorting, so that
> all such sequences are equally likely, in LINEAR time? That's the
> code I showed in R.
> (2) How can I do this in a single pass?
> (3) How can I return the elements one at a time, generating each one
> on demand (O(1) space)?
>
> The cumsum(runif(n))/n trick gets a sequence in linear time, but the
> distribution is different.
> The sort(runif(n)) trick gets a sequence in O(n log n) time, but the
> distribution is different again.
>
> The "use an upper bound of 100 - (n+1)*5" and then "add back
> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps
> are right but does nothing about the distribution.
>
> By the way, since you want numbers with 2 decimal places for some
> reason I don't understand, perhaps the simplest trick of all is
>
> n <- 10            # how many numbers you want
> L <- 0             # lower bound of the range
> U <- 100           # upper bound of the range
> G <- 5             # gap size
> V <- U - G*(n+1)   # reduced upper bound
> x <- sort(sample((L*100):(V*100), size = n))/100 + cumsum(rep(G, times = n))
>
> I hope this is clear.
>
> On Thu, 5 Jun 2025 at 00:56, Brian Smith <briansmith199...@gmail.com> wrote:
> >
> > Okay, I think I found the reason. This is due to the accumulation of
> > nine 5s in the cumsum. Thanks again for the elegant solution.
> >
> > But I wonder: if the solution is this simple, then what is the
> > significance of the research paper by Bentley and Saxe, "Generating
> > sorted lists of random numbers", which Richard mentioned?
> >
> > On Wed, 4 Jun 2025 at 17:54, Brian Smith <briansmith199...@gmail.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > Could you please help me understand the basis for choosing 55 in
> > > runif(10, 0, 55)?
> > >
> > > Thank you!
> > >
> > > On Wed, 4 Jun 2025 at 02:45, peter dalgaard <pda...@gmail.com> wrote:
> > > >
> > > > Can't you just generate 10 values in (0, 55), sort them, generate
> > > > the distances, add 5 and cumulate?
> > > >
> > > > > x <- sort(runif(10, 0, 55))
> > > > > d <- diff(x) + 5
> > > > > cumsum(c(x[1], d))
> > > >  [1] 12.27815 21.21060 26.37856 36.03812 41.97237 57.02945 67.86113
> > > >  [8] 75.74085 81.28533 98.30792
> > > >
> > > > > On 3 Jun 2025, at 09.21, Brian Smith <briansmith199...@gmail.com> wrote:
> > > > >
> > > > > Hi Richard,
> > > > >
> > > > > Thanks for your insight.
> > > > >
> > > > > As I mentioned in one of my earlier emails to the group, I imposed
> > > > > a constraint of accuracy up to two decimal places in order to
> > > > > obtain a finite set of possible values. For instance, if I were to
> > > > > round values to zero decimal places, the number of unique
> > > > > sequences that could be generated would be strictly finite and
> > > > > quite limited. Therefore, I chose a precision of two decimal
> > > > > places to allow for a larger but still finite number of
> > > > > possibilities.
> > > > >
> > > > > Now, my question is: how can this accuracy constraint be imposed
> > > > > effectively?
> > > > >
> > > > > Is the only practical method to generate samples, round each to
> > > > > two decimal places, and then check for duplicates to ensure
> > > > > uniqueness? If so, I'm concerned this might be inefficient, as
> > > > > many samples could be discarded, making the process
> > > > > time-consuming.
> > > > >
> > > > > Is there a better or more efficient way to directly enforce this
> > > > > constraint while generating the values?
> > > > >
> > > > > ________________________________
> > > > >
> > > > > Additionally, could you please elaborate on your suggestion
> > > > > regarding imposing minimum gap constraints by subtracting and
> > > > > then adding back certain gaps?
> > > > >
> > > > > For example, based on your earlier guidance, one possible
> > > > > sequence I obtained is:
> > > > >
> > > > > 10.07181, 14.49839, 14.74435, 18.75167, 42.70361, 55.79623,
> > > > > 63.40264, 68.62261, 92.49899, 98.29308
> > > > >
> > > > > Now, I'd like to post-process this sequence to enforce a minimum
> > > > > difference constraint of, say, 5 units between values (including
> > > > > both lower and upper bounds).
> > > > >
> > > > > What would be the appropriate way to modify the sequence to
> > > > > impose this kind of constraint?
> > > > >
> > > > > Many thanks for your time and insight.
> > > > >
> > > > > On Tue, 3 Jun 2025 at 10:42, Richard O'Keefe <rao...@gmail.com> wrote:
> > > > >>
> > > > >> PS I forgot about the weird gaps requirement.
> > > > >> What you do is subtract the gaps off and then add them back.
> > > > >> I hope that is clear.
> > > > >>
> > > > >> On Sun, 1 Jun 2025 at 6:52 AM, Brian Smith <briansmith199...@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> Let's say I have a range [0, 100].
> > > > >>>
> > > > >>> Now I need to simulate 1000 sets of 10 mid-points within the
> > > > >>> range, with accuracy up to the second decimal place.
> > > > >>>
> > > > >>> Let's say one simulated set is
> > > > >>>
> > > > >>> X1, X2, ..., X10
> > > > >>>
> > > > >>> Of course
> > > > >>>
> > > > >>> X1 < X2 < ... < X10
> > > > >>>
> > > > >>> I have one more constraint: the difference between any 2
> > > > >>> consecutive mid-points shall be at least 5.00.
> > > > >>>
> > > > >>> I wonder if there is any statistical theory available to
> > > > >>> support this kind of simulation.
> > > > >>>
> > > > >>> Alternatively, is there any way in R to implement this?
> > > >
> > > > --
> > > > Peter Dalgaard, Professor,
> > > > Center for Statistics, Copenhagen Business School
> > > > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > > > Phone: (+45)38153501
> > > > Office: A 4.23
> > > > Email: pd....@cbs.dk   Priv: pda...@gmail.com
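P.P.S. On Brian's two questions further down the thread (how to get exact two-decimal values without a generate, round, and reject-duplicates loop, and how the "subtract the gaps off and then add them back" idea is applied): Richard's snippet above already does both at once, by sampling integers in hundredths and then adding back the cumulated gaps. The sketch below merely wraps that idea in a function so that all 1000 sets can be drawn in one call; the function name, the one-row-per-set layout, and the sanity checks are my own, purely for illustration.

rgap_sets <- function(nsets, n = 10, L = 0, U = 100, G = 5) {
  ## Reduced upper bound: leave room for a gap of G below the first value,
  ## above the last value, and between every consecutive pair.
  V <- U - G * (n + 1)
  ## Gaps to add back: G, 2*G, ..., n*G.
  shift <- cumsum(rep(G, n))
  ## Sample integers in hundredths, sort, rescale to two decimals, shift.
  one_set <- function()
    sort(sample((L * 100):(V * 100), size = n)) / 100 + shift
  t(replicate(nsets, one_set()))
}

set.seed(42)                 # reproducibility only
sets <- rgap_sets(1000)      # 1000 rows, each an increasing set of 10 values

## Sanity checks: consecutive gaps of at least 5, and a gap of at least 5
## to both endpoints of [0, 100] as well.
stopifnot(all(apply(sets, 1, diff) >= 5),
          all(sets >= 5), all(sets <= 95))

Because the integers are sampled without replacement, the ten values in each set are automatically distinct and the consecutive gaps come out at 5.01 or more, so no rounding, duplicate-checking, or rejection step is needed.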
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.