... and here is a a simple 2-liner without a sort that I think is linear in time and space (but please correct if this is wrong):
x <- cumsum(runif(10)) x/x[10] * runif(1, 0, 55) + seq.int(0, 45,5) Question: Does this give the same distribution as Peter's method using the order statistics? I believe yes, but someone more statistically competent than me needs to verify or correct this. Cheers, Bert On Thu, Jun 5, 2025 at 5:19 AM Bert Gunter <bgunter.4...@gmail.com> wrote: > > Richard: > > "The "use an upper bound of 100 - (n+1)*5" and then "add back > cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps > are right but does nothing about the distribution.." > > If I understand you correctly, I think the above is wrong. Here is a > one-line version of Peter's code for the original problem: > > sort(runif(10, 0, 55)) + seq.int(0, 45, 5) > > AFAICS this is a bijection between the order statistics of runif(10, 0, > 55) and those of runif(10, 0, 100) with diffs >5. Please correct if I am > wrong or I have misunderstood. > > Your other remarks about space and time complexity are correct, of course. > > Cheers, > Bert > > > > > On Wed, Jun 4, 2025 at 11:04 PM Richard O'Keefe <rao...@gmail.com> wrote: > >> The Bentley and Saxe paper answers the questions >> (1) How can I draw a sample of n numbers in the range 0..1 such that >> the numbers are returned in increasing order WITHOUT sorting and so >> that all such sequences are equally likely in LINEAR time. That's the >> code I showed in R. >> (2) How can I do this in a single pass? >> (3) How can I return the elements one at a time, generating each one >> on demand (O(1) space). >> >> The cumsum(runif(n))/n trick gets a sequence in linear time but the >> distribution is different. >> The sort(runif(n)) trick gets a sequence in O(n.log n) time but the >> distribution is different again. >> >> The "use an upper bound of 100 - (n+1)*5" and then "add back >> cumsum(rep(5,n)) at the end" (or equivalent) trick ensures the gaps >> are right but does nothing about the distribution.. >> >> By the way, since you want numbers with 2 decimal places for some >> reason I don't understand, >> perhaps the simplest trick of all is >> >> n <- 10 # how many numbers you want >> L <- 0 # lower bound of the range >> U <- 100 # upper bound of the range >> G <- 5 # gap size >> V <- U - G*(n+1) # reduced upper bound >> x <- sort(sample((L*100):(V*100), size = n)) + cumsum(rep(G, >> times=n))/100 >> >> I hope this is clear. >> >> >> >> On Thu, 5 Jun 2025 at 00:56, Brian Smith <briansmith199...@gmail.com> >> wrote: >> > >> > Okay, I think I found the reason. This is due to accumulation of nine >> > 5s in the cumsum. Thanks again for the elegant solution. >> > >> > But I wonder, if the solution is simple then what is the significance >> > of the Research paper by Bentley and Saxe naming “Generating sorted >> > lists of random numbers” which Richard mentioned? >> > >> > On Wed, 4 Jun 2025 at 17:54, Brian Smith <briansmith199...@gmail.com> >> wrote: >> > > >> > > Hi Peter, >> > > >> > > Could you please help me to understand what is the basis of choosing >> > > 55 in runif(10,0,55))? >> > > >> > > Thank you! >> > > >> > > On Wed, 4 Jun 2025 at 02:45, peter dalgaard <pda...@gmail.com> wrote: >> > > > >> > > > Can't you just generate 10 values in (0,55), sort them, generate >> the distances, add 5 and cumulate? >> > > > >> > > > > x <- sort(runif(10,0,55)) >> > > > > d <- diff(x)+5 >> > > > > cumsum(c(x[1],d)) >> > > > [1] 12.27815 21.21060 26.37856 36.03812 41.97237 57.02945 67.86113 >> > > > [8] 75.74085 81.28533 98.30792 >> > > > >> > > > >> > > > > On 3 Jun 2025, at 09.21, Brian Smith <briansmith199...@gmail.com> >> wrote: >> > > > > >> > > > > Hi Richard, >> > > > > >> > > > > Thanks for your insight. >> > > > > >> > > > > As I mentioned in one of my earlier emails to the group, I >> imposed a >> > > > > constraint of accuracy up to two decimal places in order to >> obtain a >> > > > > finite set of possible values. For instance, if I were to round >> values >> > > > > to zero decimal places, the number of unique sequences that could >> be >> > > > > generated would be strictly finite and quite limited. Therefore, I >> > > > > chose a precision of two decimal places to allow for a larger but >> > > > > still finite number of possibilities. >> > > > > >> > > > > >> > > > > Now, my question is: how can this accuracy constraint be imposed >> effectively? >> > > > > >> > > > > Is the only practical method to generate samples, round each to >> two >> > > > > decimal places, and then check for duplicates to ensure >> uniqueness? If >> > > > > so, I’m concerned this might be inefficient, as many samples >> could be >> > > > > discarded, making the process time-consuming. >> > > > > >> > > > > Is there a better or more efficient way to directly enforce this >> > > > > constraint while generating the values? >> > > > > >> > > > > ________________________________ >> > > > > >> > > > > Additionally, could you please elaborate on your suggestion >> regarding >> > > > > imposing minimum gap constraints by subtracting and then adding >> back >> > > > > certain gaps? >> > > > > >> > > > > >> > > > > For example, based on your earlier guidance, one possible >> sequence I >> > > > > obtained is: >> > > > > >> > > > > >> > > > > 10.07181, 14.49839, 14.74435, 18.75167, 42.70361, 55.79623, >> 63.40264, >> > > > > 68.62261, 92.49899, 98.29308 >> > > > > >> > > > > >> > > > > Now, I’d like to post-process this sequence to enforce a minimum >> > > > > difference constraint of, say, 5 units between values (including >> both >> > > > > lower and upper bounds). >> > > > > >> > > > > What would be the appropriate way to modify the sequence to impose >> > > > > this kind of constraint? >> > > > > >> > > > > >> > > > > Many thanks for your time and insight. >> > > > > >> > > > > On Tue, 3 Jun 2025 at 10:42, Richard O'Keefe <rao...@gmail.com> >> wrote: >> > > > >> >> > > > >> PS I forgot about the weird gaps requirement. >> > > > >> What you do is subtract the gaps off and then add them back. I >> hope that is clear. >> > > > >> >> > > > >> On Sun, 1 Jun 2025 at 6:52 AM, Brian Smith < >> briansmith199...@gmail.com> wrote: >> > > > >>> >> > > > >>> Hi, >> > > > >>> >> > > > >>> Let say I have a range [0, 100] >> > > > >>> >> > > > >>> Now I need to simulate 1000 10 mid-points within the range with >> > > > >>> accuracy upto second decimal number. >> > > > >>> >> > > > >>> Let say, one simulated set is >> > > > >>> >> > > > >>> X1, X2, ..., X10 >> > > > >>> >> > > > >>> Ofcourrse >> > > > >>> >> > > > >>> X1 < X2 < ... <X10 >> > > > >>> >> > > > >>> I have one more constraint that the difference between any 2 >> > > > >>> consecutive mid-points shall be at-least 5.00. >> > > > >>> >> > > > >>> I wonder if there is any Statistical theory available to >> support this >> > > > >>> kind of simulation. >> > > > >>> >> > > > >>> Alternately, is there any way in R to implement this? >> > > > >>> >> > > > >>> ______________________________________________ >> > > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, >> see >> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help >> > > > >>> PLEASE do read the posting guide >> https://www.R-project.org/posting-guide.html >> > > > >>> and provide commented, minimal, self-contained, reproducible >> code. >> > > > > >> > > > > ______________________________________________ >> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> > > > > https://stat.ethz.ch/mailman/listinfo/r-help >> > > > > PLEASE do read the posting guide >> https://www.R-project.org/posting-guide.html >> > > > > and provide commented, minimal, self-contained, reproducible code. >> > > > >> > > > -- >> > > > Peter Dalgaard, Professor, >> > > > Center for Statistics, Copenhagen Business SchoolSolbjerg Plads 3, >> 2000 Frederiksberg, Denmark >> > > > Phone: (+45)38153501 >> > > > Office: A 4.23 >> > > > Email: pd....@cbs.dk Priv: pda...@gmail.com >> > > > >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> https://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.